[00:00:04] I just logged out and went to log back in and it is telling me that I have attempted to log in too many times in the last hour [00:00:54] AaronSchulz: well... yes :) I can certainly arrange a window if that's what you prefer, but I don't see how it's different than any other deploy that you do [00:01:42] for timeline I'll just ping you, sometime in the morning (sf) [00:02:58] sure [00:04:01] so, I'm going to use http://www.mediawiki.org/wiki/Extension:DynamicSidebar and hide sidebar actions from users who don't have rights to use them [00:05:34] I'm going to expose openstack roles via mediawiki groups. if a user is a sysadmin in any project, they'll be added to the sysadmin group in mediawiki. same with netadmin. [00:09:26] it will be a relife to have math and timeline done [00:10:33] AaronSchulz: does your change for math also take care of writes to math.tmp? [00:11:35] PROBLEM - Puppet freshness on cp1022 is CRITICAL: Puppet has not run in the last 10 hours [00:20:53] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:23:37] Should Ben H. be removed from ? [00:24:13] I guess [00:24:17] His account is disabled [00:29:54] PROBLEM - Lucene on search1001 is CRITICAL: Connection refused [00:31:50] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 6.987 seconds [00:34:32] RECOVERY - Lucene on search1001 is OK: TCP OK - 0.027 second response time on port 8123 [00:37:32] PROBLEM - Puppet freshness on ms-be1006 is CRITICAL: Puppet has not run in the last 10 hours [00:37:32] PROBLEM - Puppet freshness on ms-be1005 is CRITICAL: Puppet has not run in the last 10 hours [00:37:32] PROBLEM - Puppet freshness on ms-be1009 is CRITICAL: Puppet has not run in the last 10 hours [00:37:32] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours [00:37:32] PROBLEM - Puppet freshness on singer is CRITICAL: Puppet has not run in the last 10 hours [00:37:33] PROBLEM - Puppet freshness on virt1001 is CRITICAL: Puppet has not run in the last 10 hours [00:37:33] PROBLEM - Puppet freshness on virt1002 is CRITICAL: Puppet has not run in the last 10 hours [00:37:34] PROBLEM - Puppet freshness on virt1004 is CRITICAL: Puppet has not run in the last 10 hours [00:37:34] PROBLEM - Puppet freshness on virt1003 is CRITICAL: Puppet has not run in the last 10 hours [00:39:20] apergos: math.tmp? [00:41:15] yeah, looks like a scratch area for the math extension [00:42:42] I don't see anything in the code about that [00:42:50] hm [00:42:56] ok, time to head out [00:43:03] /export/upload/math.tmp/ [00:45:39] New patchset: RobH; "removing ben from paging groups on nagios" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/22687 [00:46:21] last update seems to be Aug 1 [00:46:25] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/22687 [00:47:36] New patchset: RobH; "removing ben from paging groups on nagios" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/22687 [00:48:23] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/22687 [00:48:32] Change merged: RobH; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/22687 [00:59:40] Aug 19. 
sorry [01:00:07] with last directory mod Aug 22 [01:03:56] I would like to have a go at https://bugzilla.wikimedia.org/show_bug.cgi?id=31563 (make secure.wikimedia.org redirect to the correct place), but I'm not entirely sure what the secure.wm.o URL layout is. [01:04:56] There's the obvious /project/lang/ -> http://lang.project.org/ ones, but some special cases ruin it. [01:05:41] E.g. https://secure.wikimedia.org/wikipedia/mediawiki/wiki/MediaWiki should not redirect to https://mediawiki.wikimedia.org/wiki/MediaWiki [01:05:55] !log enabling DynamicSidebar extensions on labsconsole [01:06:04] Logged the message, Master [01:06:20] Krenair: we have two changes in for this already [01:06:28] lemme find them for you [01:06:56] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:06:57] here's one: https://gerrit.wikimedia.org/r/#/c/13429/ [01:07:16] ah [01:07:19] maybe they merged them [01:07:21] that's good [01:10:20] Oh okay, thank you Ryan_Lane [01:10:35] yw [01:13:10] Yeah, I think Roan abandoned his in favour of paravoids [01:13:29] I think I took his and fixed it up [01:13:37] I think so too [01:13:39] :P [01:13:43] Yeah that's the one [01:13:45] Much less code! ;) [01:13:47] Author: Faidon, Committer: Catrope [01:14:02] shall we deploy it? [01:14:12] Yeah how about we deploy that [01:14:13] Now? ;) [01:14:50] * apergos looks at the clock. not going to monitor that one :-D [01:16:35] who cares if we break secure.wm.o? [01:16:44] Serves them right for not having moved already [01:17:53] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.823 seconds [01:18:04] Yeah I don't think you need to worry about breaking something that's been deprecated for... [01:19:03] About 11 months now: https://blog.wikimedia.org/2011/10/03/native-https-support-enabled-for-all-wikimedia-foundation-wikis/ [01:20:14] who is using prototype again? as in ...test.prototype.wikimedia.org [01:20:35] PROBLEM - Puppet freshness on cp1028 is CRITICAL: Puppet has not run in the last 10 hours [01:20:49] My browser blocked http://blog.wikimedia.org/wp-content/uploads/2011/05/2008_fundraiser_square_button-en.png from loading on that page, it should use a protocol-relative URL :) [01:21:33] The skin is on github, I think.. [01:22:30] Krenair: log a bug and basile can fix it and get ops to deploy it [01:23:13] will do [01:26:50] https://bugzilla.wikimedia.org/show_bug.cgi?id=39986 Who is Basile? [01:28:03] Guillaume [01:28:55] Guillaume Paumier, Technical Communications Manager [01:29:02] https://wikimediafoundation.org/wiki/User:Guillom [01:30:48] If he has that irc name, he's offline [01:31:26] On which note, it's 2.30am [01:32:22] He is in this room, yeah - that's him :) [01:36:15] ok, thank you RD. wasn't aware he also used that name [01:41:11] * apergos looks at RoanKattouw [01:41:33] test.prototype.wikimedia.org? [01:41:55] ? [01:43:31] well I need to find out who is using it [01:43:58] but really I need someone to turn off "FundraisingPortal" over there [01:43:58] I don't know [01:44:10] That I can do [01:44:14] Is it causing a problem? [01:44:22] well it wants to read from ms7 and [01:44:32] it has nothing to do with the fundraiser current code any more [01:44:43] so it would be another nail in solaris's coffin [01:46:33] Wait [01:46:39] * apergos waits [01:46:45] How the hell is test.prototype reading from NFS?! 
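For context, the path mapping Krenair describes above can be sketched roughly as follows. This is purely illustrative PHP, not the actual redirect rules in the Gerrit change Ryan_Lane links; the special-case table is an example list (mediawiki, commons, meta), not an exhaustive one.

<?php
// Illustrative sketch only: map a secure.wikimedia.org path of the form
// /<project>/<lang>/<rest> to its canonical host, with a table for the
// special cases mentioned above. Not the merged change.
function mapSecureUrl( $path ) {
	// project family on secure.wikimedia.org => canonical second-level domain
	$families = array(
		'wikipedia'   => 'wikipedia.org',
		'wiktionary'  => 'wiktionary.org',
		'wikiquote'   => 'wikiquote.org',
		'wikibooks'   => 'wikibooks.org',
		'wikisource'  => 'wikisource.org',
		'wikinews'    => 'wikinews.org',
		'wikiversity' => 'wikiversity.org',
	);
	// "languages" that are really separate wikis and need their own target
	$specialCases = array(
		'wikipedia/mediawiki' => 'https://www.mediawiki.org',
		'wikipedia/commons'   => 'https://commons.wikimedia.org',
		'wikipedia/meta'      => 'https://meta.wikimedia.org',
	);

	$parts = explode( '/', ltrim( $path, '/' ), 3 );
	if ( count( $parts ) < 2 ) {
		return 'https://www.wikimedia.org/';
	}
	$project = $parts[0];
	$lang = $parts[1];
	$rest = isset( $parts[2] ) ? '/' . $parts[2] : '/';

	if ( isset( $specialCases["$project/$lang"] ) ) {
		return $specialCases["$project/$lang"] . $rest;
	}
	if ( isset( $families[$project] ) ) {
		return "https://$lang.{$families[$project]}$rest";
	}
	return 'https://www.wikimedia.org/';
}

// mapSecureUrl( '/wikipedia/en/wiki/Foo' )
//   => https://en.wikipedia.org/wiki/Foo          (the obvious case)
// mapSecureUrl( '/wikipedia/mediawiki/wiki/MediaWiki' )
//   => https://www.mediawiki.org/wiki/MediaWiki   (the special case above)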
[01:47:03] GET /portal/wikipedia/en/fundraiserportal.js 200 (http://test.prototype.wikimedia.org/wiki/Main_Page) [01:47:07] like that :-P [01:48:21] Ah OK [01:48:25] I'll turn it off in a minute [01:48:31] excellent [01:49:44] I am now going to bail [01:50:05] there will be more cruft squashing tomorrow. [01:50:10] have a nice evening folks [02:02:50] :O [02:02:52] Found a swap file by the name "wmf-config/.CommonSettings.php.swp" [02:02:54] owned by: pdhanda dated: Wed Jun 15 21:35:15 2011 [02:02:57] (on the prototype VM) [02:17:20] !log nagios crashed, fixing [02:17:32] Logged the message, RobH [02:20:32] New patchset: RobH; "fixing some nagios errors from my earlier commit" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/22694 [02:21:22] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/22694 [02:21:55] Change merged: RobH; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/22694 [02:29:27] hrmm, seems its been down longer than my change. [02:29:30] something else is fubar. [02:31:23] New patchset: RobH; "pager testing stanza needed for other nagios stuff" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/22696 [02:32:22] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/22696 [02:32:48] Change merged: RobH; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/22696 [02:33:53] PROBLEM - Puppet freshness on cp1023 is CRITICAL: Puppet has not run in the last 10 hours [02:33:53] PROBLEM - Puppet freshness on cp1027 is CRITICAL: Puppet has not run in the last 10 hours [02:33:53] PROBLEM - Puppet freshness on cp1026 is CRITICAL: Puppet has not run in the last 10 hours [02:33:53] PROBLEM - Puppet freshness on cp1024 is CRITICAL: Puppet has not run in the last 10 hours [02:33:53] PROBLEM - Puppet freshness on zhen is CRITICAL: Puppet has not run in the last 10 hours [02:39:08] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:40:29] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.088 seconds [02:41:23] RECOVERY - Puppet freshness on cp1027 is OK: puppet ran at Wed Sep 5 02:41:03 UTC 2012 [02:44:59] RECOVERY - Puppet freshness on cp1024 is OK: puppet ran at Wed Sep 5 02:44:35 UTC 2012 [02:46:29] RECOVERY - Varnish HTTP upload-frontend on cp1024 is OK: HTTP OK HTTP/1.1 200 OK - 641 bytes in 0.054 seconds [02:46:56] RECOVERY - Varnish traffic logger on cp1024 is OK: PROCS OK: 3 processes with command name varnishncsa [02:50:39] New patchset: Demon; "Minor typofix to Ia567fcbe: ended up renaming this to smtp_host all but here" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/22697 [02:51:30] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/22697 [02:54:27] RECOVERY - Puppet freshness on cp1026 is OK: puppet ran at Wed Sep 5 02:54:05 UTC 2012 [02:54:35] RECOVERY - Varnish traffic logger on cp1026 is OK: PROCS OK: 3 processes with command name varnishncsa [02:55:47] RECOVERY - Varnish HTTP upload-frontend on cp1026 is OK: HTTP OK HTTP/1.1 200 OK - 641 bytes in 0.053 seconds [03:06:53] RECOVERY - Puppet freshness on cp1023 is OK: puppet ran at Wed Sep 5 03:06:36 UTC 2012 [03:07:29] RECOVERY - Varnish HTTP upload-frontend on cp1023 is OK: HTTP OK HTTP/1.1 200 OK - 643 bytes in 0.054 seconds [03:07:47] RECOVERY - Varnish traffic logger on cp1023 is OK: PROCS OK: 3 processes with command name varnishncsa [03:07:57] RECOVERY - Puppet freshness on cp1028 is OK: puppet ran at Wed Sep 5 03:07:40 UTC 2012 [04:07:21] New patchset: Jeremyb; "change all $ircecho_server to use the chat CNAME" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/22698 [04:08:09] New patchset: Jeremyb; "make ircecho config sane (not just very long strings)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/8344 [04:08:58] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/22698 [04:08:58] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/8344 [04:10:18] !log running d/d8 container consistency check fr ms7 from bast1001 in screen session as ariel [04:10:26] Logged the message, Master [04:29:06] New patchset: Jeremyb; "change all $ircecho_server to use the chat CNAME" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/22698 [04:29:55] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/22698 [04:34:25] New patchset: Jeremyb; "make ircecho config sane (not just very long strings)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/8344 [04:35:14] New patchset: Jeremyb; "change all $ircecho_server to use the chat CNAME" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/22698 [04:36:04] wow gerrit's slow [04:36:04] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/8344 [04:36:04] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/22698 [04:37:06] * jeremyb still needs to do some work on 8120, bbl [04:38:56] PROBLEM - Puppet freshness on ms-be1007 is CRITICAL: Puppet has not run in the last 10 hours [04:38:56] PROBLEM - Puppet freshness on ms-be1011 is CRITICAL: Puppet has not run in the last 10 hours [04:38:56] PROBLEM - Puppet freshness on ms-be1010 is CRITICAL: Puppet has not run in the last 10 hours [04:44:56] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours [04:56:11] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:59:02] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.083 seconds [05:31:34] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:35:55] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 6.402 seconds [05:56:01] PROBLEM - Puppet freshness on cp1025 is CRITICAL: Puppet has not run in the last 10 hours [06:09:04] PROBLEM - Puppet freshness on magnesium is CRITICAL: Puppet has not run in the last 10 hours [06:09:04] PROBLEM - Puppet freshness on zinc is CRITICAL: Puppet has not run in the last 10 hours [06:11:37] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:20:37] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 1.356 seconds [06:55:11] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:07:29] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.569 seconds [07:24:17] PROBLEM - Host search32 is DOWN: PING CRITICAL - Packet loss = 100% [07:25:47] RECOVERY - Host search32 is UP: PING OK - Packet loss = 0%, RTA = 1.29 ms [07:31:34] morning [07:32:06] morning hashar [07:32:28] hey Tim!! :) [07:33:13] Jenkins is now running some extensions tests. I might setup Scribunto today :) [07:34:35] need to figure out a system to include the extensions it depends upon [07:35:17] I wrote a sampling profiler for lua [07:35:40] it seems almost too good to waste on lua, a few more lines of code and it would be a sampling profiler for PHP as well [07:35:48] is that something that will be build in a future lua version ? or dedicated to the ext you wrote? [07:36:10] it's in luasandbox, I will deploy it when I'm finished testing [07:39:07] did you get some volunteers to rewrote the heavy templates already ? ( {{cite}} comes to mind) [07:39:53] someone rewrote the citation templates, but the result was quite slow, which is why I wrote the profiler [07:40:13] it turns out to be about 70% template expansion, using frame:preprocess() [07:41:43] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:53:51] shall we deploy it? 
[07:54:01] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 3.061 seconds [07:54:17] (bad history recall, nevermind) [07:58:03] morning paravoid :) [07:58:34] hi [08:00:38] paravoid: I have been playing with fontconfig yesterday [08:01:08] the latex community use the Computer Modern font but Ubuntu lacks the font name they use [08:01:41] if you get any knowledge about fontconfig : https://gerrit.wikimedia.org/r/#/c/22533/ ;) [08:17:45] hashar: fc-match cmr10 matches cmr10.ttf from LyX (ttf-lyx) here [08:18:37] OHH [08:18:57] looks like that package provides all the fonts needed! [08:18:59] good find :) [08:19:01] there's also fonts-cmu (for precise) [08:19:25] that's Unicode & OT version of Computer Modern [08:20:23] OT = OpenType, a standard that (kind of) merges & supersedes TrueType and Type1 [08:21:19] fonts-cmu looks good [08:21:21] hm, how do you use cm-super with fontconfig? does that even work? [08:21:30] do you use cm-super-x11? [08:22:03] will try that one [08:22:10] fonts-cmu does not seem to provide cmr10 [08:22:58] cm-super-x11 neither [08:23:34] ttf-lyx's cmr10 is probably a raytraced version of the plain ol' CM [08:23:46] it probably doesn't have extended charsets and such [08:24:37] what do you need CM for? just math? [08:24:55] yeah for math expressions in SVG file [08:25:02] so librsvg can find them and [08:25:16] apparently the math guys uses cmr10 as a font name [08:25:35] ttf-lyx should suffice [08:27:03] trying out :) [08:27:10] will make the change very trivial [08:27:42] open("/usr/share/fonts/truetype/ttf-lyx/cmr10.ttf", O_RDONLY) = 3 [08:27:43] ;) [08:27:54] well, we knew that [08:27:58] check if the output's okay [08:28:34] particularly expressions with particularly rare characters [08:28:41] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:29:57] the bug opener created me a test file http://commons.wikimedia.org/wiki/File:Cm-super_test_font_rendering.svg :-D [08:35:24] Change abandoned: Hashar; "Apparently ttf-lyx provides the font we need :-)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/22533 [08:35:58] :-) [08:37:47] I am not sure it supports unicode though :/ [08:38:15] New patchset: Hashar; "(bug 38299) Computer Modern fonts for SVG rendering" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/22705 [08:38:26] it probably only supports latin1 + some extra characters as the original CM did [08:38:29] but do you need it? [08:38:48] I have no idea, I am not a math guy [08:39:07] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/22705 [08:39:25] I am pretty sure we will have some hebrew / chinese math person having the issue :) [08:39:57] you don't have to go that far [08:40:04] Greek isn't in Latin1 :-) [08:40:13] PROBLEM - Puppet freshness on mw74 is CRITICAL: Puppet has not run in the last 10 hours [08:40:17] that's why I asked if it's just for math [08:40:24] if it's for math, it shouldn't be a problem [08:40:39] can't you write your math using Greek ? :-) [08:40:46] though alpha, beta etc are probably included [08:41:09] I guess that a lot of Greek characters are included, yes [08:41:11] π, μ etc. [08:42:18] There is no ツ in font cmr10! [08:42:22] no more Japanese smileys :( [08:43:34] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.026 seconds [08:44:21] paravoid: will wait for bug reporter to reply. 
Thanks for pointing ttf-lyx :-)) [08:45:49] fonts are a mess unfortunately [08:46:44] I'm maintaining a few myself, as part of the Debian Fonts Task Force [08:47:43] New review: Nikerabbit; "So many patchets?" [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/22698 [08:53:18] PROBLEM - Puppet freshness on ms-be6 is CRITICAL: Puppet has not run in the last 10 hours [09:00:21] PROBLEM - Puppet freshness on formey is CRITICAL: Puppet has not run in the last 10 hours [09:01:24] PROBLEM - Puppet freshness on manganese is CRITICAL: Puppet has not run in the last 10 hours [09:13:51] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:23:47] ahh [09:23:58] I have discovered ant has an API and support inline javascript :-] [09:24:22] so my postpone rewriting my ant build script using ruby rake [09:28:42] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.024 seconds [10:00:17] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:12:53] PROBLEM - Puppet freshness on cp1022 is CRITICAL: Puppet has not run in the last 10 hours [10:13:56] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 1.019 seconds [10:38:50] PROBLEM - Puppet freshness on ms-be1005 is CRITICAL: Puppet has not run in the last 10 hours [10:38:50] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours [10:38:50] PROBLEM - Puppet freshness on ms-be1006 is CRITICAL: Puppet has not run in the last 10 hours [10:38:50] PROBLEM - Puppet freshness on virt1003 is CRITICAL: Puppet has not run in the last 10 hours [10:38:50] PROBLEM - Puppet freshness on ms-be1009 is CRITICAL: Puppet has not run in the last 10 hours [10:38:51] PROBLEM - Puppet freshness on singer is CRITICAL: Puppet has not run in the last 10 hours [10:38:51] PROBLEM - Puppet freshness on virt1002 is CRITICAL: Puppet has not run in the last 10 hours [10:38:52] PROBLEM - Puppet freshness on virt1001 is CRITICAL: Puppet has not run in the last 10 hours [10:38:52] PROBLEM - Puppet freshness on virt1004 is CRITICAL: Puppet has not run in the last 10 hours [10:48:53] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:56:14] RECOVERY - check_job_queue on spence is OK: JOBQUEUE OK - all job queues below 10,000 [10:57:35] RECOVERY - check_job_queue on neon is OK: JOBQUEUE OK - all job queues below 10,000 [11:01:11] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.057 seconds [11:33:33] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:45:33] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.360 seconds [12:19:27] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:34:27] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.027 seconds [12:34:45] PROBLEM - Puppet freshness on zhen is CRITICAL: Puppet has not run in the last 10 hours [13:06:32] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:18:32] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 2.359 seconds [13:36:42] New review: Jeremyb; "yeah ;(" [operations/puppet] (production) C: 0; - 
https://gerrit.wikimedia.org/r/22698 [13:45:26] * jeremyb hopes cmjohnson didn't really stay the night ;P [13:46:05] ahhh [13:46:52] paravoid: easy change for ya: https://gerrit.wikimedia.org/r/#/c/22698/ irc.freenode.net -> chat.freenode.net :-) [13:47:05] jeremyb: did you get your irecho changes merged in ? [13:47:33] hashar: see the dependencies for that one you just linked ;) [13:47:51] hehe [13:48:22] TomDaley: what's with the nick? was misreading as TomDelay most of yesterday [13:48:22] hopefully Ryan will merge it today!!! [13:48:45] jeremyb: Joke from yesterday, but I'm thinking I'll stick with it for awhile. [13:49:01] TomDaley: is there a 1-line version? [13:49:29] apparently tom is DeLay (capitalization) [13:50:03] No, it's definitely not DeLay [13:50:05] It's Daley :) [13:50:54] bath ant project does not have an IRC channel :/ [13:50:56] One line: Ironholds was trying to bribe me to crash the site, so I told him he'd have to bring me Tom Daley. [13:51:45] bath? [13:51:56] TomDaley: who is that? [13:53:03] !w Tom Daley [13:53:03] http://en.wikipedia.org/wiki/Tom [13:53:04] [[w:Tom Daley (diver)]] -- I'll leave you to google images as to why :p [13:53:07] pff [13:53:08] hashar: no one's complained about them before. other people do the same thing too. I don't add them for every new patchset, I'm selective [13:53:12] !w "Tom Daley" [13:53:12] http://en.wikipedia.org/wiki/"Tom [13:53:14] .. [13:53:29] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:53:34] jeremyb: yeah feel free to ignore my comment :-) [13:53:54] Oh man, our free images for his article kind of suck. None of them do him justice at all. [13:53:56] :( [13:54:41] hashar: also, is there some way to add a comment for a push as part of the same action as the push? i haven't figured out hwo [13:54:44] TomDaley: well time to take a picture of yourself and upload to commons as {{self}} ;- [13:55:07] Hahahahahahaha [13:55:12] I don't think anyone will believe me [13:55:14] na need to push first then comment in Gerrit [13:55:41] jeremyb: You can do it from the command line, at least if you'd like. [13:56:03] once pushed though, you could use something like gerrit approve --comment "something there" [13:56:16] approve is deprecated, use review. [13:56:28] * TomDaley whacks hashar with the gerrit manual [13:56:31] TomDaley: i mean not as a separate comment. because pushing adds a comment itself, i want to put it in that comment [13:56:56] * hashar ducks by simply not opening the manual URL [13:57:28] * jeremyb gets the manual printed on a set of bricks and delivers to hashar [13:57:56] * hashar waits for delivery [13:58:34] out for a few mins [13:58:49] damn you'll miss the delivery [14:05:29] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 5.633 seconds [14:11:58] jeremyb: Is there a gerrit book yet? [14:12:02] If there is, there should be [14:12:08] no idea [14:12:20] TomDaley: is what i want possible? [14:12:37] Not as far as I know, no. 
[14:15:14] ;( [14:39:32] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:39:41] PROBLEM - Puppet freshness on ms-be1007 is CRITICAL: Puppet has not run in the last 10 hours [14:39:41] PROBLEM - Puppet freshness on ms-be1011 is CRITICAL: Puppet has not run in the last 10 hours [14:39:41] PROBLEM - Puppet freshness on ms-be1010 is CRITICAL: Puppet has not run in the last 10 hours [14:45:41] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours [14:51:41] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 9.363 seconds [14:55:00] paravoid: i am expecting a new controller card for ms-be6 today. fedex usually doesn't arrive until 2(+/-) Local. Will you be around? [14:56:17] unlikely, maybe apergos? [15:01:08] !log search32 still borked ..going down for troubleshooting [15:01:16] Logged the message, Master [15:02:29] PROBLEM - MySQL Slave Delay on db33 is CRITICAL: CRIT replication delay 187 seconds [15:03:14] PROBLEM - MySQL Replication Heartbeat on db33 is CRITICAL: CRIT replication delay 200 seconds [15:03:32] PROBLEM - Host search32 is DOWN: PING CRITICAL - Packet loss = 100% [15:06:06] hi apergos [15:06:12] hello [15:06:19] do you have a minute to make a change to something on OTRS? [15:06:27] it's changing the news item on the main login page [15:06:51] on the OTRS server (williams) in /opt/otrs/Kernel/Output/HTML/Standard/Motd.dtl. - can you change that text to replace it with: [15:06:58] September 1, 2012: There is a Request for comment (RfC) on a Wikimedia Foundation Legal Fees Assistance Program to provide assistance to Wikimedia support roles that face legal action as a result of their work. This would include OTRS agents. Please comment at https://meta.wikimedia.org/wiki/Request_for_comment/Legal_Fees_Assistance_Program. [15:08:25] do I edit that in place or is it generated by something? [15:08:33] do we have version control for any of this stuff? [15:08:41] apergos: quilt [15:09:24] ok well I have no idea how that works [15:09:32] but if you or someone can talk me through it that would be fine [15:10:29] I think you put the quilt over your head, and then scream. Loudly. [15:10:36] Reedy++ [15:10:37] ok, doing that [15:10:50] pretty sure that won't get the change applied though [15:11:04] Reedy must go though a lot of quilts [15:11:29] I expect you can reuse em [15:12:14] RECOVERY - Host search32 is UP: PING OK - Packet loss = 0%, RTA = 1.02 ms [15:14:38] RECOVERY - MySQL Slave Delay on db33 is OK: OK replication delay 2 seconds [15:15:14] RECOVERY - MySQL Replication Heartbeat on db33 is OK: OK replication delay 0 seconds [15:26:06] New review: Umherirrender; "Thanks. I hope, you are not the one person, who can do things like this." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/21326 [15:26:23] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:27:16] * apergos looks at the scanset extension to see what would be needd for getting it into swift or if that even makes sense [15:27:32] heh [15:27:49] http://www.mediawiki.org/wiki/Extension:ScanSet [15:27:55] "this extension is undocumented." 
[15:28:22] at least it's less than 500 lines of code [15:28:30] yep [15:28:41] That and it was written by Tim [15:29:05] PROBLEM - check_job_queue on neon is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 9,999 jobs: , enwiki (14903) [15:29:50] PROBLEM - check_job_queue on spence is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 9,999 jobs: , enwiki (13885) [15:30:58] IIRC it's a pretty static dataset [15:31:51] well what I wonder is how anything gets saved into the scan directory [15:33:08] PROBLEM - ps1-b5-sdtpa-infeed-load-tower-A-phase-Z on ps1-b5-sdtpa is CRITICAL: ps1-b5-sdtpa-infeed-load-tower-A-phase-Z CRITICAL - *2563* [15:33:40] That's what I mean by static [15:33:42] Nothing is added to it [15:33:51] I think Tim just dumped all the files into a folder [15:33:57] ic [15:34:38] well that's easy to fix (and yes that's sure what it looks like from the code *and* from the date on the directory) [15:35:40] apergos, sorry I was AFK, not sure how to use quilt.. [15:35:47] ok [15:36:04] well it probably should wait for someone who's dealt with otrs updates [15:36:41] paravoid, have you used git-buildpackage to build packages before? [15:36:57] i am super close, and I thought I had it, but now that I've cloned elsewhere and am trying to build for lucid I'm having troubles [15:38:32] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 7.907 seconds [15:38:45] also Reedy can you do a rename on mediawiki.org for me please? [15:39:03] If I have to ;) [15:39:04] [[User:SPage (WMF)]] -> [[User:S Page (WMF)]] (add a space between S and P) [15:39:23] you can trust me that S requested it or if you /really/ want I'll find you a diff :P [15:40:03] done [15:40:07] Jeff_Green, do you have a minute to adjust the MOTD on OTRS (or teach apergos how to do it?) [15:40:07] thanks [15:40:29] Back in a bit [15:40:40] Thehelpfulone: ohi, I was just responding to your email about the RTF issue [15:40:55] heh [15:40:57] I have no idea how to do it, but I have a few minutes to help look for it [15:41:18] [16:06:53] on the OTRS server (williams) in /opt/otrs/Kernel/Output/HTML/Standard/Motd.dtl. - can you change that text to replace it with: [15:41:18] [16:07:00] September 1, 2012: There is a Request for comment (RfC) on a Wikimedia Foundation Legal Fees Assistance Program to provide assistance to Wikimedia support roles that face legal action as a result of their work. This would include OTRS agents. Please comment at https://meta.wikimedia.org/wiki/Request_for_comment/Legal_Fees_Assistance_Program. [15:41:20] re. RTF, my vote is that we figure out how to disable RTF and do that for now [15:41:23] Jeff_Green: The OTRS guy has given instructions [15:41:40] Jeff_Green: https://bugzilla.wikimedia.org/show_bug.cgi?id=22622#c28 [15:41:44] "Disable rich text via sysconfig" [15:41:48] Or how to fix it ;) [15:42:07] Reedy: ya I've been looking for that toggle in the code/conf tree for about 10 minutes :-) [15:42:12] grep is failing me [15:42:17] find, also failing me [15:42:23] next trying "stab" [15:42:25] Ooh [15:42:26] That's crap [15:43:38] RECOVERY - ps1-b5-sdtpa-infeed-load-tower-A-phase-Z on ps1-b5-sdtpa is OK: ps1-b5-sdtpa-infeed-load-tower-A-phase-Z OK - 2100 [15:43:50] motd is probably done via web interface . . . 
looking [15:44:32] RECOVERY - check_job_queue on neon is OK: JOBQUEUE OK - all job queues below 10,000 [15:45:08] ah btw Jeff_Green you are off the hook about the fundraising portal stuff on ms7, tomas tracked it down to some old code of trevor's and roan disabled it so it's all good. [15:45:19] Jeff_Green, I don't think it is, Tim said it's on the OTRS server (williams) in /opt/otrs/Kernel/Output/HTML/Standard/Motd.dtl [15:45:26] apergos: oh sweet. thanks [15:45:33] sure thing [15:45:35] Thehelpfulone: looking [15:45:41] thanks [15:45:44] RECOVERY - check_job_queue on spence is OK: JOBQUEUE OK - all job queues below 10,000 [15:47:41] PROBLEM - Host ms-be6 is DOWN: PING CRITICAL - Packet loss = 100% [15:53:36] Thehelpfulone: MOTD is updated [15:54:20] !log updated MOTD for OTRS. [15:54:28] Logged the message, Master [15:54:51] was there a ticket somewhere for the MOTD change? [15:56:36] Jeff_Green, great thanks, not quite, there's a mailing list thread on otrs-en-l [15:56:41] PROBLEM - Puppet freshness on cp1025 is CRITICAL: Puppet has not run in the last 10 hours [15:56:49] should I link you to it? [15:57:54] Jeff_Green, actually could you remove all the other stuff? it's old and we really want people to only read that [15:59:14] Thehelpfulone: sure [16:00:10] Thehelpfulone: done [16:00:19] thank you [16:00:22] np [16:01:32] hey mutante, would you have a sec to rename the wsor mailinglist? see rt ticket https://rt.wikimedia.org/Ticket/Display.html?id=3400 [16:01:45] oh, one more thing Jeff_Green, for the OTRS 3.1 server Reedy's outlined some stuff here, https://bugzilla.wikimedia.org/show_bug.cgi?id=22622#c29 is the plan to test it on labs (I see there's an OTRS project)? [16:02:33] drdee, what do you plan to rename it to, out of interest? [16:03:25] Thehelpfulone: still up in the air. there are ~2 major aspects of the upgrade: database modifications and frontend patches/testing. The former will probably happen on production hardware, and labs could be useful for the latter. [16:03:59] Thehelpfulone: from WSoR to wmfresearch [16:04:19] Thehelpfulone: most likely I will not be working on the OTRS project now that we're entering fundraising season. [16:04:37] yeah Reedy noted that, are there any other ops that would be able to do it? [16:04:44] drdee, is it still going to be a private list? [16:05:10] Thehelpfulone: yes, but I don't know who it'll be [16:05:50] is woosters going to the retreat today? [16:08:18] yes, I think so [16:08:27] yes [16:08:57] ok, so I guess we'll have to wait until he gets back to see who's likely to be available - for updating the MOTD could you put some instructions on wikitech so others (apergos for example) would know how to edit it? [16:09:44] PROBLEM - Puppet freshness on magnesium is CRITICAL: Puppet has not run in the last 10 hours [16:09:44] PROBLEM - Puppet freshness on zinc is CRITICAL: Puppet has not run in the last 10 hours [16:10:19] 403 errors trying to access the dumps [16:13:20] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:13:32] eh? [16:14:00] tryiing to download more than two at once? [16:14:08] Vito_away: [16:14:37] apergos: mmmh lemme check if I killed wget on my sever [16:14:44] (which shares my ip) [16:14:47] cause worksforme [16:15:40] now it works [16:15:44] ok [16:15:46] but wget was already killed [16:15:52] drdee, WSOR was a 2011 project right? [16:15:59] btw there's not what I needed [16:16:03] :D [16:16:07] ha! 
[16:26:41] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 5.388 seconds [16:27:08] PROBLEM - check_job_queue on neon is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 9,999 jobs: , euwiki (15651) [16:28:21] PROBLEM - check_job_queue on spence is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 9,999 jobs: , euwiki (22943) [16:31:41] New patchset: Pyoungmeister; "Revert "Revert "Using log4j to log Lucene results to udp2log.""" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/22733 [16:32:31] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/22733 [16:33:06] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/22733 [16:38:03] paravoid: I guess that Aaron might have been running a syncFileBackend.php which finally reached the Basilique_Saint-Pierre_et_Saint-Paul.jpg file and tossed it from ms7 some days after the swift deletion. [16:38:05] New patchset: Pyoungmeister; "Revert "Revert "Revert "Using log4j to log Lucene results to udp2log."""" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/22734 [16:38:10] Thehelpfulone: yes, wsor was a 2011 project [16:38:54] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/22734 [16:39:27] drdee, okay, so the wmfresearch list - is that going to be a private list? does the research team not already have a mailing list? I know the editor engagements ones do, as do HR... [16:40:26] Thehelpfulone, it is already a private list, and it's already being used, we jus want to have the name of the list reflect the activities [16:40:39] ah fair enough [16:44:21] Revert "Revert "Revert [16:48:16] Oh right the jQuery revert saga [16:49:02] No.. [16:49:16] I was meaning peters patchset above [16:49:22] "Revert "Revert "Revert "Using log4j to log Lucene results to udp2log."""" [16:50:14] ah ha it's a roan [16:51:09] * apergos wonders why ogghandler ships with a very recent copy of cortado but the one it points to on our servers *cough*on ms7 via nfs*cough* is so old  [16:53:32] Wait, Cortado is hosted on ms7 via NFS?! [16:53:56] no one reads this right? http://wikitech.wikimedia.org/view/Swift/Open_Issues_Aug_-_Sept_2012/Cruft_on_ms7 :-P :-P [16:54:08] even when I send a big frickin email about it :-P [16:54:15] Just delete it [16:54:16] :D [16:54:17] $wgCortadoJarFile = "$urlprotocol//upload.wikimedia.org/jars/cortado.jar"; [16:54:17] $wgCortadoJarFile = "$urlprotocol//upload.wikimedia.org/jars/cortado.jar"; [16:54:18] ZOMG [16:54:19] see that? [16:54:21] Yes, that's a regular HTTP-style URL [16:54:37] k whichever. http or nfs [16:54:43] it's still way fricking old [16:54:50] DELETE [16:54:50] DELETE [16:54:51] DELETE [16:54:51] and needs to be *somewhere else* [16:55:00] $urlprotocol was a behavior switch for when we migrated to protocol-relative URLs [16:55:23] gotcha [16:55:24] so... [16:55:41] what things scale oggs? the image scalers? [16:56:59] basically, what serves that js file? [16:57:47] Shouldn't bits probably be serving it? :/ [16:57:52] seeeems like [16:58:09] and cortado is I guess served up to the browser to run locally (?) [16:58:15] I suspect so.. 
[16:58:17] so it could live there too [16:58:23] I wonder if still using the ancient version there is creating any bugs we have open [16:58:34] yeah, I wonder the same thing but [16:58:45] I also wonder how much the behavior changes with the new version [16:59:05] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:59:06] I don't think we can sic AaronSchulz on this, it's not really swift :-P [16:59:31] I think I tested it locally when I upgraded the jar [16:59:39] (not such a major version jump though!) [16:59:45] oh you upgraded the jar, you touched it! [16:59:58] which means I can ask if you would pretty please look into making it live elsewhere...? [17:00:12] * apergos bats their eyelashes at Reedy and smiles sweetly [17:00:38] 0.5.1 to 0.6.0 [17:00:39] https://gerrit.wikimedia.org/r/gitweb?p=mediawiki/extensions/OggHandler.git;a=commit;h=f4707ded123e317d6a3f4cf4d80c62c8a51b4c37 [17:00:52] right [17:01:06] OH [17:01:07] but we're running on some version from 2009 which I think means 0.2 something [17:01:13] the applet .jar file must come from the same host [17:01:13] * as the uploaded media files or Java security rules will [17:01:13] * prevent the applet from loading them. [17:01:19] <^demon|lunch> gitweb makes my eyes bleed. [17:01:23] gggaaahhh [17:01:24] I guess that's why it's on upload [17:01:35] * Reedy cries [17:01:40] sonofa.... [17:01:55] so it must live in swift [17:01:59] :-/ [17:02:04] *SIGH* [17:02:08] in some random container or other [17:02:12] Yeah.... [17:02:18] jesus [17:02:22] Can we put a newer jar with it and test it on test/test2? [17:02:47] if you give me a file or a link and a name I can put it in the dir for you to test [17:02:53] a name to save it as, I mean [17:03:00] I can do that right now [17:03:21] https://gerrit.wikimedia.org/r/gitweb?p=mediawiki/extensions/OggHandler.git;a=blob;f=cortado-ovt-stripped-0.6.0.jar;h=6bd1fc91d05ac3042beffa069bd15f5379024be7;hb=f4707ded123e317d6a3f4cf4d80c62c8a51b4c37 [17:03:23] NOOO [17:03:31] noooooo? [17:03:36] stupid url [17:03:42] http://downloads.xiph.org/releases/cortado/cortado-ovt-stripped-0.6.0.jar [17:03:47] 19-Mar-2010 19:26 147K [17:04:50] in place with the name as downloaded [17:04:55] So... what domain is that .jar file going on? [17:05:14] if you add the appropriate config change in test/test2, you can pound on it all you like [17:05:25] Thanks. Will try it later [17:06:02] upload.wm.o, same as the current one... [17:06:21] Cool. thanks. [17:06:32] https://upload.wikimedia.org/jars/cortado-ovt-stripped-0.6.0.jar [17:07:21] AFK for dinner, back in a bit [17:09:20] bug triage now in #wikimedia-dev [17:12:08] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 6.350 seconds [17:20:05] PROBLEM - MySQL Slave Delay on storage3 is CRITICAL: CRIT replication delay 265 seconds [17:22:47] PROBLEM - Misc_Db_Lag on storage3 is CRITICAL: (Return code of 255 is out of bounds) [17:23:05] RECOVERY - MySQL Slave Delay on storage3 is OK: OK replication delay seconds [17:26:23] PROBLEM - mysqld processes on storage3 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld [17:26:23] PROBLEM - MySQL disk space on storage3 is CRITICAL: DISK CRITICAL - /a is not accessible: Success [17:28:06] robh: did you do anything on storage3? [17:28:20] apergos: are you around to help w/ ms-be6? 
[17:38:14] RECOVERY - Host ms-be6 is UP: PING OK - Packet loss = 0%, RTA = 0.39 ms [17:38:24] New patchset: Catrope; "(bug 20814) Enable CORS for the API" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/22740 [17:42:55] New patchset: Catrope; "(bug 20814) Enable CORS for the API" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/22740 [17:47:23] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:48:08] PROBLEM - Host ms-be6 is DOWN: PING CRITICAL - Packet loss = 100% [17:48:50] New patchset: Catrope; "(bug 20814) Enable CORS for the API" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/22740 [17:50:54] Change merged: Catrope; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/22740 [17:53:41] RECOVERY - Host ms-be6 is UP: PING OK - Packet loss = 0%, RTA = 0.22 ms [17:54:17] RECOVERY - swift-object-replicator on ms-be6 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-replicator [17:54:17] RECOVERY - swift-container-auditor on ms-be6 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [17:54:17] RECOVERY - swift-container-replicator on ms-be6 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-replicator [17:54:17] RECOVERY - swift-account-server on ms-be6 is OK: PROCS OK: 25 processes with regex args ^/usr/bin/python /usr/bin/swift-account-server [17:54:26] RECOVERY - SSH on ms-be6 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [17:54:26] RECOVERY - swift-object-auditor on ms-be6 is OK: PROCS OK: 2 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor [17:54:26] RECOVERY - swift-account-reaper on ms-be6 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-reaper [17:54:53] RECOVERY - swift-container-server on ms-be6 is OK: PROCS OK: 25 processes with regex args ^/usr/bin/python /usr/bin/swift-container-server [17:55:02] RECOVERY - swift-account-auditor on ms-be6 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-auditor [17:55:11] RECOVERY - swift-account-replicator on ms-be6 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-replicator [17:58:11] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 9.804 seconds [17:59:44] so sdf1, sdi1:, sdl1, sdm1, sn1 on ms-be6 were the ones that didn't mount before, on aug 31 [17:59:52] and they didn't mount this time either? what got replaced? [17:59:54] apergos: my thought is that the ssd's are the common factor in ms-be6 + [18:00:12] i replaced the controller card [18:00:13] I asked about that at some point (in the channel) but no one was around then that knew I guess [18:00:18] which hosts have em and which not, I mean [18:00:23] I see [18:00:26] man that sux [18:00:57] wondering if we removed the ssd's would they work correctly... but that would be a rebuild [18:01:26] if we're gonna do that (and I don't mind trying it with one box) let's check in with faidon and figure out which one to pull [18:01:30] and schedule it [18:01:42] at this point any of the ones out of the rings entirely is ok I guess [18:02:03] 6 would probably be the best to work on [18:02:14] it's out of the ring and has been down for awhile now [18:03:12] ah it's gone?
great [18:05:42] well in the meantime I guess, updating dell and doing the next thing [18:05:51] motherboard? *sigh* [18:05:53] if they sent one [18:06:00] yes they sent one [18:06:45] i am not 100% sure it's out of the ring... i was reading back on one of Faidon's emails and it is not entirely clear. so best to wait for him [18:08:40] cmjohnson1: nope, i logged in and ran megacli64 with no arguments [18:08:43] oh I checked the object ring and ms-be6 is gone from there [18:08:44] which just returns error code [18:09:01] megacli64 isn't going to give you anything worthwhile on these boxes [18:09:34] yeah it's not in the other two rings either [18:09:35] so apergos: do you wanna remove the ssd's and rebuild it... or should I go and change the mobo [18:09:51] I think you should do the motherboard first so you can report to dell [18:09:58] if that doesn't fix it [18:10:11] do we have spare drives to drop in instead of the ssds? [18:10:12] RECOVERY - Puppet freshness on ms-be6 is OK: puppet ran at Wed Sep 5 18:10:00 UTC 2012 [18:10:50] the ssd's are in addition to the 12 disks... we would lose the 2 [18:11:20] idk how ben confg'd them [18:15:17] I'd rather keep the disk partitions etc as similar as possible, just swapping the ssds for regular drives [18:15:30] which we could steal out of one of the other boxes that's down [18:15:36] i don't have a place to put them [18:15:48] ? [18:16:03] PROBLEM - swift-account-replicator on ms-be6 is CRITICAL: Connection refused by host [18:16:03] PROBLEM - swift-object-server on ms-be6 is CRITICAL: Connection refused by host [18:16:03] PROBLEM - swift-container-server on ms-be6 is CRITICAL: Connection refused by host [18:16:21] PROBLEM - SSH on ms-be6 is CRITICAL: Connection refused [18:16:21] PROBLEM - swift-account-server on ms-be6 is CRITICAL: Connection refused by host [18:16:21] PROBLEM - swift-object-updater on ms-be6 is CRITICAL: Connection refused by host [18:16:21] PROBLEM - swift-container-updater on ms-be6 is CRITICAL: Connection refused by host [18:16:38] lemme look at where the ssd's are and see if a normal disk will work... it's inside the server [18:16:44] ok. [18:16:48] PROBLEM - swift-object-auditor on ms-be6 is CRITICAL: Connection refused by host [18:16:52] but for right now I would say just try the motherboard next [18:17:06] PROBLEM - swift-object-replicator on ms-be6 is CRITICAL: Connection refused by host [18:17:06] PROBLEM - swift-container-replicator on ms-be6 is CRITICAL: Connection refused by host [18:17:06] PROBLEM - swift-account-reaper on ms-be6 is CRITICAL: Connection refused by host [18:17:06] PROBLEM - swift-container-auditor on ms-be6 is CRITICAL: Connection refused by host [18:17:10] and while you're in there you can see what the disk layout looks like too [18:17:33] PROBLEM - swift-account-auditor on ms-be6 is CRITICAL: Connection refused by host [18:17:33] PROBLEM - NTP on ms-be6 is CRITICAL: NTP CRITICAL: No response from NTP server [18:18:54] PROBLEM - Lucene on search1002 is CRITICAL: Connection refused [18:21:12] apergos: so a standard 2.5 disk will not fit... fyi: ms-be1-5 only have the 12 disk. We've never added the ssd's. [18:21:27] PROBLEM - Host ms-be6 is DOWN: PING CRITICAL - Packet loss = 100% [18:22:09] ok [18:22:12] that's fine then [18:22:35] that might mean we repartition these things and redo all the rings again, a little annoying but whatever [18:23:58] a bit annoying but may be the answer... we'll see [18:27:43] of course, it could be the backplane.
[18:27:49] (if mainboard swap doesn't fix) [18:27:57] just isolating all disk related bus items. [18:28:21] backplane is usually the last item to possibly go bad (when brand new). it has the least amount of actual logic [18:28:31] they die over time from folks reseating drives too hard or whatever, but meh [18:29:02] New review: Dzahn; "Yep, quote "Our main server rotation is chat.freenode.net."" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/22698 [18:32:42] RECOVERY - Lucene on search1002 is OK: TCP OK - 0.027 second response time on port 8123 [18:33:09] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:41:24] PROBLEM - Puppet freshness on mw74 is CRITICAL: Puppet has not run in the last 10 hours [18:45:38] New review: Dzahn; "http://packages.ubuntu.com/lucid/ttf-lyx" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/22705 [18:45:39] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/22705 [18:46:01] mutante: :-) [18:46:57] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.024 seconds [18:47:29] hashar: heh, and after commenting on the other one i noticed it was abandoned already:) [18:47:53] yup [18:47:59] we talked about it this morning with Faidon [18:48:05] and he pointed me to the ttf-lyx package [18:48:19] much easier to maintain a package over some fontconfig hacks :-D [18:48:50] oh yes [18:48:57] +1 [18:49:21] PROBLEM - Lucene on search1002 is CRITICAL: Connection refused [19:01:21] PROBLEM - Puppet freshness on formey is CRITICAL: Puppet has not run in the last 10 hours [19:02:24] PROBLEM - Puppet freshness on manganese is CRITICAL: Puppet has not run in the last 10 hours [19:02:47] I've got a fix for those 2 ^ [19:02:58] https://gerrit.wikimedia.org/r/#/c/22697/ [19:06:15] New patchset: Dzahn; "remove Ben from Nagios and Icinga cgi.cfg (web UI permissions)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/22749 [19:07:06] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/22749 [19:08:33] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/22749 [19:11:22] New patchset: Dzahn; "remove river and rich from Nagios and Icinga cgi.cfg" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/22750 [19:12:14] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/22750 [19:12:37] mutante: Could you take a look at https://gerrit.wikimedia.org/r/#/c/22697/ for me? It's a 2-line typofix and will fix puppet for manganese & formey. [19:14:01] Thanks mutante :) [19:15:23] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/22697 [19:16:03] TomDaley: alright, running puppet on formey [19:18:51] Change merged: Krinkle; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/16241 [19:19:12] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:20:23] ...and manganese [19:21:08] New patchset: Cmjohnson; "changing the dhcpd entry for ms-be6 to match the new motherboard." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/22775 [19:21:55] mutante: can you merge my change please [19:22:01] New review: gerrit2; "Lint check passed."
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/22775 [19:22:24] cmjohnson1: aha, so replaced the whole board,eh? nice [19:22:36] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/22775 [19:22:38] notpeter: yt? [19:22:48] RECOVERY - Lucene on search1002 is OK: TCP OK - 0.027 second response time on port 8123 [19:24:42] New review: Ori.livneh; "When you have a moment, can you ping me on IRC to discuss this change? (i'm ori-l)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/21054 [19:26:05] mutante: yep... hoping this works... thx for the merge [19:26:18] we hope it's nice [19:26:58] cmjohnson1: yea, we just talked about it here briefly. we'll have to reinstall [19:28:02] mutante and apergos: if you are going to reinstall... let's remove the ssd's [19:28:13] I don't want to do that yet [19:28:16] I want to do things one at a time [19:28:41] if it turns out to work with the new board and the ssds out we won't know which was the cause [19:29:07] good point [19:29:36] reinstall is not too timeconsuming so [19:30:32] TomDaley: Caching catalog for formey.wikimedia.org .. :) [19:30:48] out of curiosity, do you have any written plan for Precise upgrade ? [19:31:33] hashar: For gerrit boxen? [19:31:46] TomDaley: was more thinking about formey and gallium :-] [19:31:48] RECOVERY - Puppet freshness on formey is OK: puppet ran at Wed Sep 5 19:31:18 UTC 2012 [19:31:49] formey for the oxygen doc [19:31:59] though we could migrate doxygen to a precise box [19:32:04] formey can't upgrade til manganese does. [19:32:12] k [19:32:12] Yes, please move doxygen off formey [19:32:15] so I guess Gallium [19:32:29] and I could move the Doxygen doc generation to gallium [19:33:00] That'd be easy...just have jenkins generate the docs and we can have some doc.mediawiki.org vhost point to it or somesuch. [19:33:42] alright, formey and manganese are both done running puppet. [19:34:19] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.019 seconds [19:34:31] TomDaley: yeah that is the unwritten plan :-] [19:35:37] http://doc.mediawiki.org/mw/{core,extensions}/{1.19,1.20,master}/ [19:40:53] commons = dead [19:41:06] and it's back [19:41:31] We know, there were some technical problems, and they're fixed now [19:41:33] :P [19:41:35] mutante: besides updating the dhcpd file... do I need to update anything else with the new mac? [19:42:38] They're not fixed.. [19:42:54] Sidebar links on otrs-wiki are randomly going to random places, the skin isn't loading right on most loads [19:43:04] whoever in ops ^ [19:43:05] :) [19:43:32] like one goes to Wikispecies:Help........or Category:Top Level....random things that never existed. [19:43:51] !log deleting Ben as a Watchmouse contact [19:43:55] oh [19:44:00] Logged the message, Master [19:44:07] It's the Mediawiki.org sidebar but it's on the otrswiki [19:45:08] cmjohnson1: i can't think of anything else right now..guess not [19:45:25] mutante: thanks for the otrs query :) [19:45:52] ok..well i am unable to ssh into it... and i can't ping it [19:46:38] on mgmt what do you see, anything useful? [19:47:09] no ping is not much good [19:48:09] cmjohnson1: if you have not reinstalled it won't work [19:48:16] when the mainboard swapped, the nic data is invalid [19:48:18] you have not tried to pxe boot yet?
[19:48:44] robh: i updated the dhcpd file w/ new mac [19:49:07] apergos..no i did not try and force a pxe boot yet [19:49:39] well you will need to do that [19:49:52] no point in trying to boot with the old os, it's not going to come up [19:50:04] (wrong mac now) [19:51:49] ok, the puppet cert for ms-be6 is cleaned up so puppet should not have any issues now [19:52:23] ok [19:52:38] going to try and boot from network [19:52:55] RECOVERY - Host ms-be6 is UP: PING WARNING - Packet loss = 73%, RTA = 0.25 ms [19:53:10] ^ good news [19:53:20] this is pxe boot? [19:53:43] yes [19:53:49] ok [19:55:59] !log added new Nagios monitor to Watchmouse [19:56:09] Logged the message, Master [19:58:29] http://wikitech.wikimedia.org/edit/Talk:Swift/Open_Issues_Aug_-_Sept_2012/Cruft_on_ms7?redlink=1 betting pool: when will all stuff be off of ms7 [19:58:46] RECOVERY - Puppet freshness on manganese is OK: puppet ran at Wed Sep 5 19:58:33 UTC 2012 [19:59:47] apergos: what do you want to do next (ms-be6) [20:00:03] is the OS on it now? [20:00:06] cause that's next [20:00:18] reinstall to the point where you have to do puppet stuff [20:00:26] and then I'll do the next steps for that [20:00:32] k [20:06:04] I saw a Brion VIBBER picture float by in the wlm app, clicked on it, and commons reported [22da669e] 2012-09-05 19:59:54: Fatal exception of type TimestampException [20:06:16] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:06:25] RECOVERY - SSH on ms-be6 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0) [20:08:02] I get it on all his fine uploads. Maybe the time in the filename confuses commons? [20:09:04] http://commons.wikimedia.org/w/index.php?search=Chicago+Avenue+Water+Tower , note names like "File:Chicago Avenue Water Tower and Pumping Station (taken on 27Aug2012 17hrs36mins12secs).jpg" [20:10:28] wow that's pretty awesome [20:11:21] New patchset: Aaron Schulz; "Removed swift switch variable." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/22823 [20:11:25] !log rebooting storage3 [20:11:34] Logged the message, Master [20:12:06] Change merged: Aaron Schulz; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/22823 [20:12:38] spagewmf: 2012-09-05 20:12:26 mw52 commonswiki: [54aab87b] /wiki/File:Chicago_Avenue_Water_Tower_and_Pumping_Station_(taken_on_27Aug2012_17hrs34mins16secs).jpg Exception from line 130 of /usr/local/apache/common-local/php-1.20wmf11/includes/Timestamp.php: MWTimestamp::setTimestamp : Invalid timestamp - 1971:01:01 0:00:00 [20:12:40] New patchset: Demon; "(bug 37083) Setup periodic repack of git repositories" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/22824 [20:13:01] spagewmf: It doesn't like what's being read from the metadata [20:13:27] New review: gerrit2; "Change did not pass lint check. You will need to send an amended patchset for this (see: https://lab..." [operations/puppet] (production); V: -1 - https://gerrit.wikimedia.org/r/22824 [20:13:28] (Reedy, Brion in #wm-tech indicates he has a fix in gerrit for exif-time-bug). 
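For reference, a defensive way to handle EXIF date strings like the one in that exception might look like the sketch below. This is only an illustration of the failure mode, not the fix Brion has in Gerrit; the helper name and the choice to return null for bad metadata are assumptions.

<?php
// EXIF dates are supposed to look like "YYYY:MM:DD HH:MM:SS". The value in
// the exception above ("1971:01:01 0:00:00", single-digit hour) doesn't quite
// match that shape, which is the kind of input a strict timestamp parser will
// reject. A caller can validate/normalize first and skip the date instead of
// letting an exception become a fatal.
function exifDateToTsOrNull( $exifDate ) {
	if ( !preg_match(
		'/^(\d{4}):(\d{2}):(\d{2}) (\d{1,2}):(\d{2}):(\d{2})$/',
		trim( $exifDate ),
		$m
	) ) {
		return null; // unparseable metadata: ignore it, don't fatal
	}
	list( , $y, $mo, $d, $h, $i, $s ) = $m;
	if ( !checkdate( (int)$mo, (int)$d, (int)$y ) || $h > 23 || $i > 59 || $s > 60 ) {
		return null;
	}
	// Return a MediaWiki-style TS_MW (YYYYMMDDHHMMSS) string
	return sprintf( '%04d%02d%02d%02d%02d%02d', $y, $mo, $d, $h, $i, $s );
}

// exifDateToTsOrNull( '2012:08:27 17:36:12' ) => '20120827173612'
// exifDateToTsOrNull( '1971:01:01 0:00:00' )  => '19710101000000'
// exifDateToTsOrNull( 'not a date' )          => null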
[20:13:52] woah nice [20:13:58] let me merge [20:14:04] PROBLEM - Host storage3 is DOWN: PING CRITICAL - Packet loss = 100% [20:14:13] PROBLEM - Puppet freshness on cp1022 is CRITICAL: Puppet has not run in the last 10 hours [20:14:14] New patchset: Demon; "(bug 37083) Setup periodic repack of git repositories" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/22824 [20:14:15] Happening quite regularily [20:15:02] New review: gerrit2; "Change did not pass lint check. You will need to send an amended patchset for this (see: https://lab..." [operations/puppet] (production); V: -1 - https://gerrit.wikimedia.org/r/22824 [20:18:56] New patchset: Demon; "(bug 37083) Setup periodic repack of git repositories" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/22824 [20:19:41] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/22824 [20:19:55] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.029 seconds [20:20:12] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/22824 [20:21:20] what is mid-xxx.jogv.jpg anyways? [20:21:47] apergos: Thumbnail of the middle frame of a video I think [20:21:59] Or the n-th frame or whatever [20:22:08] huh [20:26:59] so on test2 it uses the name of the pixel in the thumb name and on the regular wikis it uses mid- plus the filename [20:27:13] * apergos goes to look at the extension [20:29:49] New patchset: Demon; "Fixup for Ie79f4dee: Command should be in single quotes" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/22827 [20:30:24] * apergos looks for OggHandler on test2... in Special:Version [20:30:47] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/22827 [20:30:54] hello [20:31:04] i have a qz [20:31:07] and fails to find it [20:31:11] Reedy: ^^ [20:31:23] question regarding wikimedia.org email [20:32:07] Ah [20:32:23] apergos: apparently it's only enabled where TimedMediaHandler isn't.. [20:32:27] :-D [20:34:05] Wow.. [20:34:07] And that has [20:34:07] window.cortadoDomainLocations = { [20:34:07] 'upload.wikimedia.org' : 'http://upload.wikimedia.org/jars/cortado.jar' [20:34:07] }; [20:34:50] bbaaahhh [20:36:35] who ships an extension with that crap hardcoded in there?? [20:36:39] we do :-/ [20:37:07] Ok, now here... JS on commons seems broken [20:37:10] logged in and out [20:37:14] still WFM [20:37:29] Though, it is listing it against that upload.wm.o as a host makes some sense, rather than for eveywhere [20:37:47] Reedy: Have you cleared your cache and logged out? 
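(The hardcoded cortadoDomainLocations snippet quoted above is the kind of thing that could simply be made protocol-relative instead of baking in http://. A hypothetical rewrite, purely for illustration and not any actual patch, would look like this.)

```javascript
// Hypothetical, illustrative rewrite of the mapping quoted above: a protocol-relative
// URL is fetched over whatever scheme (http or https) the page itself was served on,
// so the hardcoded http:// no longer breaks HTTPS views.
window.cortadoDomainLocations = {
    'upload.wikimedia.org': '//upload.wikimedia.org/jars/cortado.jar'
};
```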
[20:38:38] Its still WFM in incognito mode [20:39:03] damn [20:40:10] PROBLEM - Puppet freshness on ms-be1005 is CRITICAL: Puppet has not run in the last 10 hours [20:40:10] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours [20:40:10] PROBLEM - Puppet freshness on ms-be1009 is CRITICAL: Puppet has not run in the last 10 hours [20:40:10] PROBLEM - Puppet freshness on ms-be1006 is CRITICAL: Puppet has not run in the last 10 hours [20:40:10] PROBLEM - Puppet freshness on singer is CRITICAL: Puppet has not run in the last 10 hours [20:40:11] PROBLEM - Puppet freshness on virt1002 is CRITICAL: Puppet has not run in the last 10 hours [20:40:11] PROBLEM - Puppet freshness on virt1001 is CRITICAL: Puppet has not run in the last 10 hours [20:40:12] PROBLEM - Puppet freshness on virt1004 is CRITICAL: Puppet has not run in the last 10 hours [20:40:12] PROBLEM - Puppet freshness on virt1003 is CRITICAL: Puppet has not run in the last 10 hours [20:40:16] asking in #wikimedia-commons then [20:40:18] apergos: TMH could explain why thumb links are brokened on test/test2 though :D [20:40:30] I am using commonsfrom chrome with cache cleared before going there [20:40:32] looks fine to me [20:40:38] ( hoo ) [20:40:44] Reedy: yeah I was thinking that :-D [20:40:52] mhm, for me it's broken in FF (logged in) and Midori (logged out) [20:40:56] both say mw is undefined [20:41:02] in console? [20:41:06] oh and logged out. did not try to log in [20:41:12] Reedy: Yes [20:41:59] apergos: 0.6.0 seems fine on commons now... Want to clear out the other old ones, leave the 0.6.0 for the time being, and copy it over cortado.jar too? [20:42:11] I note most are root:root.. [20:42:16] Now commons is fully broke [20:42:17] Error: ERR_INVALID_REQ, errno [No Error] at Wed, 05 Sep 2012 20:41:49 GMT [20:42:19] I don't want to clear out the old ones, I don't actually get any space back that way [20:42:22] so might as well leave em [20:42:38] Request: GET http://commons.wikimedia.org/wiki/Main_Page, from 89.13.194.1 via amssq33.esams.wikimedia.org (squid/2.7.STABLE9) to () [20:42:39] Error: ERR_INVALID_REQ, errno [No Error] at Wed, 05 Sep 2012 20:42:33 GMT [20:42:48] but I am happy to move the cortado.jar outa the way and cp -a the 0.6.0 into place [20:43:03] Oh duh [20:43:05] as soon as we determine commons is not broke [20:43:06] A whole 2.8M!!!! [20:43:32] Works again... meh [20:43:35] great [20:43:40] moving... [20:43:43] yay :) [20:44:43] apergos: I'm only getting it on action=view [20:44:46] edit etc. works fine [20:45:06] done [20:45:21] solaris has no cp -a, stoopid thing [20:45:40] so for about 3 seconds there anyone trying to use those extensions would have gotten fail [20:50:17] * hoo is now going to start a Firefox over ssh with X-Forward :D [20:52:07] so next will be moving that intoswift I think and doing any rewrite where it's needed [20:53:22] PROBLEM - Lucene on search1002 is CRITICAL: Connection refused [20:53:22] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:53:39] for some reason the remote firefox takes over the local one... probably due to matching user names? :/ [20:54:16] New patchset: Ottomata; "HttpHandler.java - avoiding NullPointerExceptions when search request is null." 
[operations/debs/lucene-search-2] (master) - https://gerrit.wikimedia.org/r/22829 [20:54:37] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/22827 [20:57:07] PROBLEM - Host ms-be6 is DOWN: PING CRITICAL - Packet loss = 100% [20:58:04] firefox --sync -no-remote seems to do the trick [20:58:48] (10:58:01 PM) DeannaT2: hoo, when the watch-unwatch star is something with javascript yes [20:58:48] (10:58:36 PM) darkweasel: hoo: i also have those problems [20:59:17] I think that is broken from jquery changes [20:59:22] PROBLEM - ps1-d1-sdtpa-infeed-load-tower-A-phase-Y on ps1-d1-sdtpa is CRITICAL: ps1-d1-sdtpa-infeed-load-tower-A-phase-Y CRITICAL - *2663* [20:59:27] Krinkle told me off about it [20:59:43] ? [20:59:50] Reedy: Seems more like a load order glitch [20:59:52] watch/unwatch star [21:00:07] Reedy: It's all broke [21:00:08] yes, that's because it uses notification which among other modules depends on jq 1.8 [21:00:16] mw is used before defined [21:00:42] We need to go back to jq 1.8 and deal with the regressions [21:00:54] (in that order imho) [21:01:17] Krinkle: Do you get errors on commons yourself? [21:01:50] Firefox though ssh is unusable slow :( [21:02:04] Yes, I get it on every wiki all the time [21:02:26] Uncaught TypeErrors from mediawiki modules and user gadgets all using jq 1.8 [21:02:34] 1.8.1 had more issues than 1.8, right? [21:02:42] I get "TypeError: 'undefined' is not an object (evaluating 'mw.centralNotice.initialize')" on common main page [21:02:50] How is it that everything suddenly depends on 1.8? We've had 1.8 in core for like a few weeks now? [21:02:51] RECOVERY - Host ms-be6 is UP: PING OK - Packet loss = 0%, RTA = 0.27 ms [21:02:59] God knows [21:03:09] Reedy: RoanKattouw People adapt fast [21:03:17] Siebrand complained about Narayam slowness in 1.8.1, not sure about 1.8.0 [21:03:25] Not that fast [21:03:31] Both 1.8 and 1.8.1 are slow for narayam [21:03:31] Wikipedia hates change [21:03:33] But I'm always more of a fan of reverting than fixing [21:03:36] * Reedy looks for pitch forks# [21:03:38] and there's an upstream bug in jquery about that [21:03:40] cmjohnson1: saw the mail, does this mean you are to the puppet cert piece? [21:03:51] Revert the changes that make things 1.8-dependent and wait for jQuery to get its shit together [21:03:59] but I cant' say I care enough about any module or extension to justify uncaught exceptions on every page in core modules [21:04:01] Unfortunately when there's no JS people around, the easiest option is reverting [21:04:10] just finalizing now [21:04:28] heya, anyone listening able to answer gerrit/github questions? [21:04:36] We can hardly revert all those changes, can we? [21:04:47] MediaWiki 1.5! [21:04:48] not sure what the best decision would be for my stuff, need some input [21:05:00] Yes, the JS errors are a problem and should be fixed. But I think it's much better to do that by making stuff not depend on 1.8 rather than going back to a broken jQuery version [21:05:05] Reedy: 1.20wmf9 should probably do :D [21:05:22] ok, I'll wait :-) [21:06:03] apergos: all yours [21:06:11] great [21:06:25] PROBLEM - SSH on ms-be6 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:06:29] ottomata: depends what your question is.. We're thin on the ground it seems.. 
[21:07:00] welp, i'm updating udp-filter with some requested features (CIDR ip address filter,ing + IP address anonynization) [21:07:01] RECOVERY - ps1-d1-sdtpa-infeed-load-tower-A-phase-Y on ps1-d1-sdtpa is OK: ps1-d1-sdtpa-infeed-load-tower-A-phase-Y OK - 2375 [21:07:06] i'm using two 3rd party libs to do each of these [21:07:10] libcidr and libanon [21:07:18] neither of these libs had debian packaging [21:07:22] so I added that [21:07:26] Krinkle: Do you have an idea how many commits depend on jQuery 1.8? Can we just revert those all? [21:07:29] I also had to make a couple of modifications to the source [21:07:46] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.030 seconds [21:07:47] i'm torn [21:07:54] No, because that would take hours and would be even worse. Because some of those commits fix things that have changed in PHP. [21:08:04] i would prefer to host these on github, since they are authored by other people outside of the org [21:08:08] and then others have introduced a new JS API end point that is in use in some extensions now [21:08:09] etc. [21:08:11] and i've been in touch with the authros about my work [21:08:18] It would probably be better to just fix the 1.8 dependencies, right? [21:08:21] Krinkle: Thought so... so we have to get back to 1.8 [21:08:23] but, I will be building debs of these that will be installed on production systems [21:08:29] No, I don't want to go back to 1.8 [21:08:45] and I know that ops won't trust github for that [21:08:47] RECOVERY - SSH on ms-be6 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0) [21:08:55] Why not? We will have to update eventually, obviously. [21:08:57] currently I have libcidr on github and libanon in gerrit :p [21:09:03] And right now there is major features broken [21:09:09] Finding all the dependencies sounds hard, especially cause we don't have real error message [21:09:14] uncaught exceptions, causing other modules to not excecute etc. [21:09:16] ottomata, why don't you setup a mirror on github for each library, and keep our stuff in gerrit? [21:09:27] can I do that without admin access? [21:09:39] There is 2 bug reports with jq 1.8 [21:09:50] $ git log origin/wmf/1.20wmf9..origin/wmf/1.20wmf11 --pretty=oneline resources | grep -v Merge | grep -v Revert | wc -l [21:09:52] 21 [21:09:54] Not /too/ bad [21:09:58] one about slow keyboard events in Narayam. That is an upstream issue that may be fixable by optimizations in Narayam [21:10:06] Yes, but it's still broken [21:10:12] We can fix the reference errors relatively easily [21:10:21] ottomata: maybe ^demon can help with setting that up [21:10:25] We don't even know if it's *possible* for us to fix the Narayam problem [21:10:44] the other one is about a slow selector causing Firefox to hang (in an opt-in featuer, right click edit) - this one was fixed in https://gerrit.wikimedia.org/r/#/c/22618/ and speeds it up 100x faster in either 1.7 or 1.8 [21:10:49] What about hacking jQuery up for now? [21:10:52] Another way of looking at it is this: jQuery 1.8.0 and 1.8.1 are themselves broken. 1.7.2 isn't broken, it's just that our code doesn't like 1.7.2 right now [21:11:43] And then another is: We have to move to 1.8 anyway and there probably was a reason to use new functionality [21:11:57] We have to move to 1.8 eventually, we can do so when it's actually stable [21:12:10] The new functionality is stuff that can probably be done using 1.7 stuff as well [21:12:39] 1.8.x isn't broken. 
There is two performance regressions. The one in keyboard events is not really noticeable (only when typing very fast) and even then it catches up eventually, no hanging. The other one is because of a slowdown in the has() selector, however that was because of incorrect usage of that selector on our part, and that has been fixed. [21:13:03] .. on our part. [21:13:19] Are you saying the Narayam slowness is completely fixed now? [21:13:20] so that leaves one slowness bug that will likely be fixed soon in either jquery or Narayam. [21:13:39] What is the remaining slowness bug? You said there's a ticket against jQuery for it? [21:14:02] RECOVERY - swift-container-server on ms-be6 is OK: PROCS OK: 25 processes with regex args ^/usr/bin/python /usr/bin/swift-container-server [21:14:02] RECOVERY - swift-object-server on ms-be6 is OK: PROCS OK: 25 processes with regex args ^/usr/bin/python /usr/bin/swift-object-server [21:14:20] RECOVERY - swift-container-updater on ms-be6 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-updater [21:14:20] RECOVERY - swift-object-updater on ms-be6 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-updater [21:14:38] RECOVERY - swift-account-auditor on ms-be6 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-auditor [21:14:38] RECOVERY - swift-object-auditor on ms-be6 is OK: PROCS OK: 2 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor [21:14:38] RECOVERY - swift-container-auditor on ms-be6 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [21:15:05] RECOVERY - swift-account-reaper on ms-be6 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-reaper [21:15:05] RECOVERY - swift-account-server on ms-be6 is OK: PROCS OK: 25 processes with regex args ^/usr/bin/python /usr/bin/swift-account-server [21:15:14] RECOVERY - swift-container-replicator on ms-be6 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-replicator [21:15:25] Krinkle: I'm not that into jQuery sources... is it possible to just use the code from 1.7.2 for these functions and overwrite the current ones... or is that going to break badly? :P [21:15:27] okay, again. There is 2 bug reports in our bug tracker as a result of 1.8.1. Slow keyboard handling in Narayam. And Firefox "slow script warnings" for rightClickEdit feature (which is opt-in). The latter was fixed on our part in https://gerrit.wikimedia.org/r/#/c/22618/. [21:15:27] The Narayam bug is not resolved yet, but it is not a breakage, just a slow down. Not something I think is worth the trouble of reverting lots of stuff for. [21:15:41] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 196 seconds [21:16:08] RECOVERY - swift-object-replicator on ms-be6 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-replicator [21:16:22] so maotherboard produces same errors. I'll add to the ticket when I get back [21:16:23] Where is the jQ ticket for the Narayam slowness? 
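(A minimal sketch of the selector pattern being debated here, for illustration only; this is not the actual Narayam code or the gerrit fix, and the handler name is made up. Binding with .live() on a jQuery-only pseudo-class like :text makes jQuery re-run its own selector engine on every event, which is where the 1.8.x slowdown shows up; delegating with CSS-native selectors and filtering inside the handler avoids that.)

```javascript
// Assumes jQuery is loaded; onKeyDown stands in for the real input-method handler.
function onKeyDown( e ) {
    // ... keystroke remapping would happen here ...
}

// The slow pattern on jQuery 1.8.x: .live() plus the jQuery-only ':text' pseudo-class,
// which has to be re-evaluated by jQuery's selector engine for every keystroke.
$( 'input:text, input[type=search], textarea' ).live( 'keydown', onKeyDown );

// A cheaper equivalent: delegate from the document with CSS-native selectors only,
// then do any remaining type filtering inside the handler itself.
$( document ).on( 'keydown', 'input, textarea', function ( e ) {
    if ( this.nodeName === 'INPUT' && this.type !== 'text' && this.type !== 'search' ) {
        return; // ignore checkboxes, buttons, etc.
    }
    onKeyDown.call( this, e );
} );
```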
[21:17:12] sdm1 sdl1 sdi1 sdn1 fail to mount, same old same old [21:17:52] RoanKattouw: don't know, let me check [21:18:45] WTF, $.parseHTML isn't even documented on the jquery web site [21:19:24] Not yet, it is a public API though [21:20:08] 1.8.1 issues tracking: https://bugzilla.wikimedia.org/show_bug.cgi?id=39972 [21:21:25] Re comment 2 and 3: yes, we should figure out what's broken/slow and why and fix it, but in the meantime we can't have broken and/or slow code on the live site [21:21:34] Stuff either works well or it gets reverted until it does [21:22:13] RoanKattouw: Well, that's probably to much work... can't you just push 1.8.1 again? Atm commons is broken badly [21:22:33] I can clean up $.parseHTML usages fairly easily, there's only 2 [21:22:33] We can, but then it makes i18n stuff unuseable [21:22:36] hoo: What's broken on Commons? [21:22:49] RoanKattouw: Site JS is fully broken atm [21:22:58] it fails due to calls to net defined mw [21:23:06] What's the error msg? [21:23:17] TypeError: mw.util is undefined [21:23:17] http://bits.wikimedia.org/commons.wikimedia.org/load.php?debug=false&lang=en&modules=site&only=scripts&skin=vector&* [21:23:17] Line 1 [21:23:18] RoanKattouw: there's more than just parseHTML [21:23:20] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 26 seconds [21:23:24] RoanKattouw: bug report is this: http://bugs.jquery.com/ticket/12436 [21:23:25] TypeError: mw.centralNotice is undefined [21:23:25] http://bits.wikimedia.org/commons.wikimedia.org/load.php?debug=false&lang=en&modules=site&only=scripts&skin=vector&* [21:23:25] Line 1 [21:23:29] hoo: What do you have to do to trigger that? I don't see it [21:23:44] RoanKattouw: Any page, except ?action=edit on commons triggers that [21:23:46] And if mw.util is undefined something's really terribly wrong, that's not just a jQuery incompatibility I don't think [21:23:54] logged out and clear your cache maybe [21:23:58] RoanKattouw: It turns out the Narayam issue is actually the same, it uses live() with a non-CSS selector (':text') [21:24:39] hoo: Not happening for me, logged out, went to Commons:Village_pump, cleared cache [21:25:06] Strange... but we got many reports [21:25:19] what about getting commons back to wmf10? Might work [21:25:54] Great, so jQuery 1.8.0 has broken selector behavior and 1.8.1 has slow selector behavior [21:26:22] New patchset: Jgreen; "adding netapp /vol/fr_archive mount for locke" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/22830 [21:26:29] Krinkle: Is there a way Narayam could work around the slowness? I suppose that could work too [21:26:40] RoanKattouw: Yes, I'm fixing it right now [21:26:43] Awesome [21:26:47] RoanKattouw: You should see it, it is in sane [21:26:48] 5 $.narayam.addInputs( 'input:text, input[type=search], textarea, div[contenteditable=true]' ); [21:27:03] I wrote at least part of that [21:27:04] : https://gerrit.wikimedia.org/r/gitweb?p=mediawiki/extensions/Narayam.git;a=blob;f=resources/ext.narayam.core/ext.narayam.core.js;h=d870dcf41c9b426d9cd06e5746873b1119dc1a98;hb=HEAD#l351 [21:27:07] that worked? :P [21:27:07] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/22830 [21:27:09] 4 live ones [21:27:17] Change merged: Jgreen; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/22830 [21:27:25] Although I don't think I wrote the pseudoselectors, and I was young and inexperienced when I wrote that code [21:28:22] "We were young and wild and free" [21:28:29] (Bryan Adams - Heaven) [21:28:35] THanks [21:28:47] I was singing the rest of the song in my head to get to the chorus and figure out what it was [21:28:55] hehe, me too [21:29:22] "There was only you and me" [21:30:23] RECOVERY - NTP on ms-be6 is OK: NTP OK: Offset 0.0007096529007 secs [21:35:02] New review: Ottomata; "Hey Jeff," [operations/puppet] (production) - https://gerrit.wikimedia.org/r/22830 [21:39:43] It's impressive, I actually know all the lyrics from there to the chorus, I don't usually remember lyrics that well [21:40:53] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:44:56] New review: Jgreen; "sigh.reverting." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/22830 [21:45:30] New patchset: Jgreen; "Revert "adding netapp /vol/fr_archive mount for locke"" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/22834 [21:46:17] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/22834 [21:52:08] Change merged: Jgreen; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/22834 [21:53:11] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 4.571 seconds [22:09:06] !log pulling in change 22690 to labsconsole [22:09:16] Logged the message, Master [22:11:37] TomDaley: if a gerrit change has a dependency and then you approve one of them and it tries to merge but can't due to the dependency not being approved yet, it seems to keep retrying that every 15 or 30 minutes, creating more and more review messages [22:12:10] ugh, bad timing [22:12:31] !log shutdown ms-be6, prep for puppet changes for reinstall w/o ssds [22:12:40] Logged the message, Master [22:13:35] PROBLEM - Host ms-be6 is DOWN: PING CRITICAL - Packet loss = 100% [22:16:24] New patchset: RobH; "changing ms-be6 to NOT use SSDs for testing" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/22836 [22:17:15] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/22836 [22:18:56] Change merged: ArielGlenn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/22836 [22:27:05] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:31:04] !log blog software is outdated by a few versions, updating. [22:31:13] Logged the message, RobH [22:34:17] !log blog update successful, all plugins also updated. 
[22:34:27] Logged the message, RobH [22:35:56] PROBLEM - Puppet freshness on zhen is CRITICAL: Puppet has not run in the last 10 hours [22:36:15] !log renaming mailing list "wsor" to "wmfresearch" [22:36:24] Logged the message, Master [22:39:14] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 3.307 seconds [22:40:57] apergos: hi [22:41:08] hello [22:41:31] so ms-be6 has gone through new controller, then new motherboard, still has the same old errors [22:41:36] sigh [22:41:46] next up is to yank thw two ssds and see after a reinstall what that looks like [22:41:54] chris will do that for us [22:42:00] New patchset: Dzahn; "add exim mail alias and lighttpd url redirect for renamed wsor list" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/22842 [22:42:02] but he's out for a few hours [22:42:33] in the meantime I've been working on moving the cruft list forward, feel free to take a look at it today [22:42:45] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/22842 [22:42:45] another possibility is that the controller is so broken that gets confused by a single disk fail [22:42:46] New review: Dzahn; "for RT-3400" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/22842 [22:42:47] that's pretty much the short swift update [22:42:47] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/22842 [22:45:08] grumble grumble at Dell [22:45:10] there are plnty of unpleasant possibilities left [22:45:16] just gotta knock em out one at a time [22:45:49] mutante, can you do the affcom one too whilst you're at it? Philippe should have okayed it [22:46:29] the box has a sas/sata expander [22:47:59] mutante, there's also https://bugzilla.wikimedia.org/show_bug.cgi?id=38291 for Wikimediaru-l --> Wikimedia-RU-Internal -- I'm not sure if there's an RT ticket for that [22:57:02] RECOVERY - Host ms-be6 is UP: PING OK - Packet loss = 0%, RTA = 0.23 ms [23:00:11] PROBLEM - swift-object-replicator on ms-be6 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [23:00:20] PROBLEM - swift-account-reaper on ms-be6 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [23:00:20] PROBLEM - swift-container-replicator on ms-be6 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [23:00:47] PROBLEM - SSH on ms-be6 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:00:47] PROBLEM - swift-object-server on ms-be6 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [23:00:56] PROBLEM - swift-object-updater on ms-be6 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [23:00:56] PROBLEM - swift-container-server on ms-be6 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [23:01:05] PROBLEM - swift-container-auditor on ms-be6 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [23:01:23] PROBLEM - swift-container-updater on ms-be6 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [23:01:32] PROBLEM - swift-account-auditor on ms-be6 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [23:01:41] PROBLEM - swift-account-server on ms-be6 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [23:01:41] PROBLEM - swift-object-auditor on ms-be6 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [23:04:11] Thehelpfulone: there is no RT for that one yet. 
Could you create one please [23:05:17] Thehelpfulone: also, fyi, if you want Philippe or someone else to add an approval, they can just mail to the ticket number, they dont really need access via web ui [23:06:12] !log stopping puppet on brewster for local hack testing on ms-be6 [23:06:21] Logged the message, RobH [23:10:49] mutante, sure I'll do it for the RU list - and in what cases do you need an approval (and by who?) [23:12:11] Thehelpfulone: always send stuff through Philippe [23:13:24] Thehelpfulone: I'll do the affcom one next [23:14:35] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:15:21] mutante, thanks, I've just sent the wikimediaru one across [23:15:38] I can see why you'd want Philippe confirmation for mailing list archive stuff - but for renames too? [23:16:05] PROBLEM - Host ms-be6 is DOWN: PING CRITICAL - Packet loss = 100% [23:20:08] Thehelpfulone: your new ticket number for reference is 3518 [23:20:57] thanks, I wish RT would send me an automated response with that! [23:20:59] Thehelpfulone: yes, for any changes to lists please talk first to a) existing list admins and b) Philippe (consider him mailman "owner" / point of contact), then create tickets [23:21:36] Thehelpfulone: i agree, you should absolutely get that confirmation mail with the number [23:21:56] !log done with local hacks on brewster, resuming puppet [23:22:05] Logged the message, RobH [23:22:22] mutante, okay, but I'll continue to create new lists myself - I presume you [ops] don't need to know that? [23:22:30] renames of mailing lists mean list server downtime. [23:22:36] Thehelpfulone: also they are perfect if they link to some page with community consensus about it or similar [23:22:57] Thehelpfulone: what lists you create without ops involvement are dependent on what policies they told you to follow when they gave you the list creation password =] [23:24:56] RobH, sure, just confirming :) [23:25:19] Thehelpfulone: if in doubt, ask Philippe [23:25:21] mutante, re the confirmation email - who do I talk to about getting that sorted out? CT? I know RT is not a "high priority" for ops at the moment, but that can't be too difficult to turn on [23:25:28] that's what I usually do ;) [23:25:32] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.586 seconds [23:26:21] Thehelpfulone: please create another ticket [23:27:31] Thehelpfulone: an another issue, the one i just renamed (wsor) was not in list overview. So you may not have it in your table. Maybe you feel like asking the admins if this needs to be hidden? [23:28:08] New patchset: Catrope; "Set $wgDisplayTitle = false; on bewikimedia per request on IRC" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/22847 [23:28:14] mutante, it's a private wmf-only research discussion list, so I think it'd be okay to remain hidden (is that right drdee?) [23:28:36] Thehelpfulone: sometime i thought maybe there should be a meta-list , a "list about list changes" that would announce all that stuff (new lists, renamings, admin changes,etc) [23:28:39] Change merged: Catrope; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/22847 [23:29:13] Thehelpfulone: i don't have a strong opinion but i just want to point out our policy says to not hide lists unless really necessary [23:29:23] mutante, I created the ticket for the send me an auto-response, but when you rename a list, can you hide the old list from the list directory? 
I think that would be better than someone then clicking on it and being redirected to a new list [23:30:12] yeah, but from what I've heard we've already got quite a few hidden lists - I don't know what all of those lists are so I wouldn't be able to do a review, but I think someone told me that the ones that are already hidden are probably hidden for good reasons [23:30:33] Thehelpfulone: there are quite a few that are not [23:30:50] and it causes these issues like you can't even have them in your overview table [23:30:58] mutante, re your meta-list - perhaps mailman@? I was asking mark about it as it's referenced in the new list creation email - apparently it's not been used since 2006 [23:31:34] not hidden for good reasons you mean? [23:32:18] New patchset: Catrope; "(bug 29137) Resurrect liquidthreads_labswikimedia" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/22848 [23:33:05] yea, there is a difference between "probably nobody is interested in this" vs. "not even the existence of this list can be confirmed" [23:33:35] Thehelpfulone: i think it's due to people just using the term "private" but meaning different things [23:33:48] Change merged: Catrope; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/22848 [23:34:01] showing that a list exists does not mean you also have to allow everybody to subscribe to it [23:34:07] and our policy talks about just that [23:34:21] I think people think that if a list exists more random people will start wanting to subscribe to it [23:35:04] and in some cases, I think some mailing lists allow posts by non-members - I'm thinking of a particular WMF one that could get spammed quite easily [23:35:16] that might be true. but people should discuss the policy then and change it [23:35:21] "The lists directory shall be complete: everyone should be able to know that a list exists, even if without description or maybe without being able to open its info page" [23:37:42] i'd rather change the rules than saying "but others have done it before".. it's a cycle [23:40:49] mutante, did you see philippe's strong opposition just below that? [23:47:07] Thehelpfulone: yes, and i keep pointing out that discussion should be continued and list admins should add to it [23:47:29] Thehelpfulone: i agree though to Cary's "List are often left as "non-published" simply to prevent the email spam harvesting" [23:48:30] Thehelpfulone: and btw, list admins are usually not aware of the other anti-spam options they have and that spamd alrady scores the mails for them, they just need to filter by these headers [23:51:22] mutante, agreed, maybe we can just make it part of the default settings for new lists? [23:51:29] Thehelpfulone: what do you usually do when asked to create a list? Like from my experience you will hardly ever get a full list of the privacy options the list admin prefers. Do you ask them about every single setting or what do you default to? [23:52:15] I usually leave most of the default settings for public lists - other than adding a terse phrase and description for the list (I think archiving is on by default) [23:52:45] for the private lists it's the stuff above + making the archives private, choosing confirm and approve for subscription options and hiding from the list directory [23:52:55] Thehelpfulone: but if they just say "private list", it can mean a lot of different things (advertise list, allow subscriptions, have archives public/private .. and so on .. 
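(To make the header filter mentioned above concrete: in Exim-style setups the parenthesised part of the X-Spam-Score header is a bar of '+' characters, roughly one per point of score, so requiring four or more of them discards mail scored around 4.0 and up. A quick illustration with invented header values, assuming that format:)

```javascript
// Illustration of the header_filter_rules pattern quoted above; header values are invented.
var rule = /X-Spam-Score: \d{1,2}\.\d \(\+{4,}\)/;

console.log( rule.test( 'X-Spam-Score: 1.3 (+)' ) );      // false - only one '+', message kept
console.log( rule.test( 'X-Spam-Score: 6.8 (++++++)' ) ); // true  - six '+', would be discarded
```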
[23:53:33] for private I usually dig in a bit more - like I did with the wikimediaru-l list on https://bugzilla.wikimedia.org/show_bug.cgi?id=38291 [23:53:45] i kept sending looong mails pointing out all the options and then leaving it to list admins to change them, of course they can anyways.. but ideally i would imagine something like an HTML/wiki form people are supposed to fill out [23:54:30] but with regards to spam score, what are your thoughts on adding a spam filter to default mailman settings? one of the ones that is used on one of the lists I administrate is X-Spam-Score: \d{1,2}\.\d \(\+{4,}\) to discard those messages [23:55:00] maybe that needs tweaking a bit (I'm not even sure it's working though) [23:55:41] i would like to make sure list admins are aware of the settings and decide for themselves. you _can_ add a default for their convenience, but on the other hand i feel that if you add a default it is very likely that it is never looked at again after creation. (and then later you might get tickets asking about it or to change it) [23:57:20] i am not sure there is one setting that is just good for all lists [23:57:42] sure - I think list creation as a whole has slowed down, so it might be more a case of educating the old list admins? [23:59:25] yes, indeed. it seems people are annoyed by spam but don't realize the service to score the mails is even there