[01:56:41] https://en.wikipedia.org/wiki/Wikipedia:Village_pump_%28technical%29#Announcing_new_Module:_Charts
[01:59:17] They managed to do that even without image extensions
[01:59:29] I wonder when we will have 3D rendering using CSS
[02:02:59] vvv: I'm wondering when someone will rewrite EasyTimeline in Lua modules.
[02:04:11] https://en.wikipedia.org/wiki/Module_talk:Chart
[02:04:12] Hmm.
[06:12:30] hi! is there a tool to search user contributions for a word? thanks.
[06:12:44] i don't think so
[06:12:59] you could make one. or maybe there's one i just don't know about
[06:13:32] * jeremyb_ was complaining earlier that you were editing (a bot in #wikimedia-stewards mentioned you) but not on IRC at the time
[06:13:48] me?
[06:13:52] yeah
[06:14:05] stab me there properly then :)
[06:14:13] that was after i accidentally stumbled on https://bugzilla.mozilla.org/show_bug.cgi?id=559785
[06:14:36] ah yeah, a few years old thing
[06:22:07] Gryllida: edit summary or text?
[06:22:15] text
[06:22:21] not that i know of
[06:22:25] edit summary could be good too :)
[06:23:56] https://toolserver.org/~snottywong/commentsearch.html
[06:24:14] ta
[09:45:24] Gerrit UI has been unreachable for a few minutes for me.
[09:45:40] Was confirmed by a user from India.
[09:46:22] yep; seems to be down here in SF too
[09:54:29] Gerrit up again
[13:47:48] There is much more SPAM in the OTRS than normal; can somebody check if the spam-filter is still working?
[14:01:55] DaBPunkt: the message headers should contain some minimal clue about it
[14:02:16] Nemo_bis: good point
[14:18:05] a new mail in the junk-folder contained a SPAM-line so I guess the filter is running. Sorry for the trouble
[14:24:39] DaBPunkt: what ticket #s?
[15:19:45] jeremyb_: 2013040110007496 for example. We find it strange that the from-address is AWL
[15:20:22] AWL?
[15:20:44] > Ticket moved into Queue "Junk" (3) from Queue "permissions" (27).
[15:20:48] auto-white-listed
[15:21:31] idk much about how AWL works
[15:21:38] any other particular examples?
[15:21:46] or should i just look at the Junk queue?
[15:23:53] Jeff_Green: do we have any better idea how mchenry/williams interact with each other now?
[15:24:20] in particular i'm wondering how some messages get spam report headers and not others
[15:24:21] yep. afaik they're back to interacting the way they were intended to, i.e.
[15:24:25] ah
[15:24:38] ah, well that can actually get a little complicated
[15:25:10] we may not be tagging all messages--iirc we use spamassassin and there's a configurable threshold below which it doesn't tag
[15:25:27] we may also be exempting some mail from checks
[15:25:39] i can look into specific cases if you want
[15:25:51] 2013031310000445
[15:26:05] can you bounce me a message with full headers intact?
[15:27:22] i think it adds a few headers...
[15:27:25] let's see
[15:29:07] ugh
[15:29:11] soo....
[15:29:17] it landed in my spambucket
[15:29:22] i want to move it to my inbox
[15:29:33] but i don't want to train it as not spam
[15:29:35] ...
[15:29:38] :-(
[15:31:39] Jeff_Green: got it?
[15:32:21] seems relatively intact
[15:33:09] looking
[15:33:57] got it but the original headers are gone. which mail client are you using?
[15:34:17] did you get 2 copies?
[15:34:20] check your spambucket
[15:34:35] oh good pt. checking gmail
[15:38:43] ok i found it
[15:38:54] only 47K messages in my spam folder :-)
[15:39:00] hah
[15:39:09] oh I lied. 42.2K
[15:39:19] don't you have an OTRS account? :)
[15:39:47] i wonder how you get so much. probably from aliases
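(Re the header inspection discussed above: a minimal sketch of checking what, if anything, SpamAssassin stamped on a bounced message, assuming it is saved locally as message.eml; the filename is hypothetical.)

```sh
# Show any SpamAssassin verdict headers already on the message.
grep -i '^X-Spam-' message.eml

# Re-score the saved message locally in test mode; this prints the
# full report and the rules that fired, without delivering anything.
spamassassin -t < message.eml | less
```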
[15:39:53] i have 282 msgs in spam atm
[15:40:07] i think it's mostly from root@ etc yeah
[15:52:44] jeremyb_: spelunkin, jfyi
[15:53:53] k
[15:55:31] hrm. got anything newer? logs don't go back this far
[15:55:59] sure
[15:56:02] jeremyb_: i've got basically 3/25 onwards to work with
[15:56:24] that's from after then already
[15:56:49] 13 Mar 2013 01:02:56 +0000
[15:56:50] oh, nvm
[15:57:05] transposed digits
[15:57:24] can you log in or do i need to rebounce?
[15:57:57] to otrs? i can
[15:58:00] yeah
[15:58:13] https://ticket.wikimedia.org/otrs/index.pl?Action=AgentTicketPlain&TicketID=6919447&ArticleID=8150450
[15:58:36] Subject: AUMF Hunger Strike Called to Demand Repeal of 2001 Authorization for Use of Military Force PL 107-40
[15:58:39] gah
[15:58:43] Date: Thu, 28 Mar 2013 12:17:53 -0400
[15:58:47] same sender
[15:59:12] yep
[16:14:29] jeremyb_: i'm not 100% sure I'm understanding the exim config on mchenry, but I see two things that might result in it not tagging messages. there's an exemption from spamd checks for some recipient addresses, and there's a threshold below which it won't tag, which appears to be 1
[16:15:00] Jeff_Green: can we try to replicate that exim in labs?
[16:15:33] either someone with access does it or maybe you can just sanitize a little and give a tarball
[16:16:08] jeremyb_: we can but it's a bit of a project to get it tweaked to be useful
[16:16:44] also I can confirm my understanding with ma.rk
[16:16:48] Jeff_Green: also, as long as i have your attention... can you do a `mysqldump -d` on the whole db and send it to me? (-d means leave out the data)
[16:17:21] i can never remember which parts are ma.rk vs. Tim
[16:17:43] jeremyb_: the more interesting part of the spam filtering happens on williams. I believe OTRS has a plugin of sorts to talk to spamd on the local host
[16:17:59] hrmm, ok
[16:18:21] that's something we'll talk about with martin as part of setting up the upgrade
[16:18:47] https://wikitech.wikimedia.org/wiki/OTRS
[16:21:21] jeremyb_: i think that mchenry at best does only a high-threshold filter, and williams has the filter which is tuned by feedback-training from OTRS users
[16:21:59] if feedback is happening at all...
[16:22:57] the doc says it's supposed to be training from the Junk queue
[16:23:38] but also filtering to the Junk queue. that seems odd to me--ordinarily you'd train by spam+ham folders and filter to a separate junk folder
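(The train-from-folders pattern described here, sketched with SpamAssassin's own tooling; the folder paths are hypothetical.)

```sh
# Feed the Bayes database from curated spam and ham folders.
sa-learn --spam /var/spool/otrs/train-spam
sa-learn --ham  /var/spool/otrs/train-ham

# Report how many spam/ham messages the Bayes db has learned so far.
sa-learn --dump magic
```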
[16:23:51] b/c bayesian filters need some positive feedback too :-)
[16:24:57] ugh, eventually i'll get this right (the wikitech page)
[16:25:12] \o/
[16:25:56] hrmmmm, yeah
[16:26:11] it could just filter out the stuff that started in junk though
[16:26:21] but that doesn't help with ham
[16:26:27] true
[16:26:58] imo this is all a logical part of the project to upgrade
[16:27:15] there may be changes to how all this works within OTRS too, I don't know
[16:27:54] sure
[16:28:31] * jeremyb_ just wonders how hard it would be to replicate the relevant exim bits from /etc
[16:28:45] i guess mchenry's probably entirely in puppet except maybe aliases
[16:28:50] imo it's not that valuable
[16:28:51] williams is not though
[16:29:04] mchenry's not intended to do the filtering for OTRS
[16:29:17] right
[16:29:56] anyway, i guess a lot depends on how fast the upgrade is
[16:32:39] i wonder...the spamassassin db's are quite large for otrs
[16:33:14] if it were my personal account that was filtering less than well, I would blow them away and let it start fresh
[16:33:54] it's using auto-whitelist, and I've found that lacking in the past
[16:35:08] i don't know much about spamassassin
[16:36:08] * jeremyb_ would tweak stuff in labs and test and rinse/repeat if he could get something relevant set up
[16:40:20] jeremyb_: it's pretty straightforward to set up and test spamassassin, the trickier part is probably finding a way to simulate a realistic mail flow
[16:40:44] Jeff_Green: i'm not as worried about mail flow...
[16:40:59] I don't mean volume, I mean content
[16:41:03] i know
[16:41:06] k
[16:41:14] trusted_networks 208.80.152.0/22 91.198.174.0/24 203.212.189.192/26
[16:41:24] mchenry.wikimedia.org 208.80.152.186
[16:41:30] Jeff_Green: i think some of my queues are probably not an NDA issue to have on labs (or at least could have on my own machine)
[16:41:37] like the NYC chapter queue
[16:41:55] i could copy down a few days of mail and replay it
[16:41:59] i wonder...should mchenry be in trusted_networks...
[16:42:01] b/c it is
[16:42:06] just thinking out loud
[16:42:09] a chapter queue that requires no confidentiality? O_o
[16:42:30] jeremyb_: you'd have to truncate the headers to make it look just like mail that's coming in live
[16:42:52] Jeff_Green: ok
[16:43:18] Nemo_bis: i could make a crypted partition locally and throw away the key when i'm done...
[16:43:31] jeremyb_: as far as you know, did spam filtering *ever* work well for this OTRS setup?
[16:43:52] Jeff_Green: i have no clue. i do recall vaguely that junk used to get emptied magically
[16:43:57] but don't quote me
[16:44:09] orly. interesting
[16:44:35] imo that's one of those things that should auto purge messages >14 days or something
[16:45:23] yes, there's been a period when spamassassin worked
[16:45:26] Nemo_bis: the point was i wasn't talking about info-* or permissions or a really sensitive queue...
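(The throw-away-the-key scratch volume jeremyb_ mentions is straightforward with plain dm-crypt and a key read straight from /dev/urandom; a rough sketch, with the file name, size, and mount point all hypothetical.)

```sh
# Back the volume with a sparse file on a loop device.
truncate -s 10G mailscratch.img
LOOPDEV=$(losetup --show -f mailscratch.img)

# Plain dm-crypt keyed from /dev/urandom: nobody ever knows the key,
# so closing the mapping makes the contents unrecoverable.
cryptsetup open --type plain --key-file /dev/urandom "$LOOPDEV" mailscratch
mkfs.ext4 /dev/mapper/mailscratch
mount /dev/mapper/mailscratch /mnt/mailscratch

# ... copy down and replay the mail here ...

# "Throw away the key": tear it all down.
umount /mnt/mailscratch
cryptsetup close mailscratch
losetup -d "$LOOPDEV"
rm mailscratch.img
```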
[16:45:36] I remember hearing tales from OTRS volunteers about it
[16:45:50] jeremyb_: I've no idea how a chapter queue is less sensitive
[16:46:07] WMIT's queue is so sensitive that we don't even put it on WMF servers :p
[17:14:55] jeremyb_: (01:14:31 PM) Jeff_Green: !log moved aside williams:/opt/otrs-home/.spamassassin/auto-whitelist and restarted spamd to test whether the bloated AWL is resulting in poor filtering
[17:15:12] jeremyb_: if it's less than an improvement, I can move it back and restart
[17:16:11] RD: ^
[17:16:36] i've seen cases where untrained auto-whitelisting went awry and essentially whitelisted the world
[17:16:54] how does auto whitelisting work?
[17:17:12] jeremyb_: in my experience...badly :-P
[17:17:13] sec.
[17:17:31] http://wiki.apache.org/spamassassin/AutoWhitelist
[17:18:56] greg-g: I doubt https://bugzilla.wikimedia.org/show_bug.cgi?id=36195 is what you're looking for
[17:18:59] so, i guess we don't know if it's really training though? or do we? and it's almost certainly not doing ham
[17:19:52] Nemo_bis: what I'm looking for? I am doing a review of bugs with the "platformeng" keyword, and that one is one I haven't looked at, and I was trying to figure it out :)
[17:20:19] jeremyb_: right. I believe it was initially set up to train but I haven't gotten far enough into the otrs config to determine whether it's been somehow broken
[17:20:53] greg-g: it's about stale entries in pages like Special:WantedPages
[17:21:13] jeremyb_: it's generated a new auto-whitelist file as expected now that some mail has been checked
[17:23:52] Nemo_bis: only that? is that what the dfn switch is for? I'm just curious because "refreshlinks" is a part of the wikidata cronjob that happens
[17:24:16] refreshlinks is not refreshlinks.php
[17:24:39] I'm assuming that refreshlinks.php is for more than just one namespace?
[17:25:18] sorry, my "refreshlinks" quoted above isn't an exact quote, more of a descriptor in my head
[18:24:19] mark_: you may want to change your /nick too. there's another mark so that could get confusing
[18:24:22] 01 18:17:50 < mark_> pageloads are really slow in CH (Switzerland) right now
[18:24:26] 01 18:17:59 < mark_> at least on the citycable network
[18:24:28] 01 18:18:07 < mark_> is anybody else experiencing this?
[18:24:57] is there a specific page that's slow?
[18:25:11] looks like it's mostly things loading from bits
[18:25:21] hang on, have to do something dinner related
[18:25:26] will change my nick shortly
[18:25:33] k
[18:25:40] are you on ipv6? can you try with and without SSL? (HTTP and HTTPS). also try other browsers
[18:27:13] (or open your console and find which bits URL in particular is slow. is that one always slow?)
[18:53:53] OK, so I've verified that it's a problem at my provider
[18:54:53] Swisscom loads all WMF sites fine. it's either at cablecom or citycable
[18:55:15] traceroute suggests cablecom
[18:55:25] in any case, nothing you or we can do
[18:55:30] sorry to bug..
[19:06:26] WV_mark: hey, might be able to do some routing switching?
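(The kind of evidence being gathered above, whether bits is slow and where along the path the slowness starts, can be collected with standard tools; a sketch, with the bits URL only as an example.)

```sh
# Time each phase of a fetch; a big gap between connect and first byte
# points beyond the local network.
curl -o /dev/null -s -w 'dns=%{time_namelookup} connect=%{time_connect} ttfb=%{time_starttransfer} total=%{time_total}\n' \
    https://bits.wikimedia.org/en.wikipedia.org/load.php

# Per-hop latency and loss, to see which network the slowdown starts in.
mtr --report --report-cycles 20 bits.wikimedia.org
```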
[19:06:38] on our side
[19:06:46] though calling your provider would also help
[19:06:58] well, in fact it seems to be clearing up now
[19:07:10] I think others have already complained to them
[19:07:38] cool
[19:07:39] :)
[19:07:59] thanks again, and once again, I hope I'm not making a pest of myself
[19:08:19] nope, is all good
[19:08:41] i'm in another meeting so not monitoring irc enough, but i do like to know whenever routing issues happen
[19:09:49] thanks LeslieCarr
[19:12:58] Updated it pre-emptively for next time ;)
[19:13:00] not merged yet
[19:57:25] Jeff_Green: is there some way to read the AWL DB or monitor what's going into it?
[19:57:32] * jeremyb_ is about to spam a msg...
[20:20:39] boom?
[20:20:49] ?
[20:20:49] jeremyb_: afaik pretty much everything goes into it, and there's some scorekeeping that goes on. I'm not sure about monitoring it, I just dumped strings from it and looked at that
[20:21:10] Krenair, I get
[20:21:11] Sorry! This site is experiencing technical difficulties.
[20:21:11] Try waiting a few minutes and reloading.
[20:21:11] (Cannot contact the database server: Unknown error (10.64.16.6))
[20:21:57] Seems back up maybe
[20:21:58] MartijnH: url?
[20:22:07] jeremyb_, no longer
[20:22:11] it was en.wiki main page
[20:22:20] jeremyb_: gwicke experienced a problem on https://en.wikipedia.org/wiki/Castle_Grayskull - seems it may have been sitewide
[20:22:26] and a user talk on en.wiki
[20:22:45] k
[20:22:51] it's working for me again though, before I set off too much panic
[20:22:55] * jeremyb_ wonders if that's a parsoid guinea pig
[20:22:55] jeremyb_: how do you plan to spam it?
[20:22:56] :)
[20:23:09] Jeff_Green: i just meant to click the spam button on a ticket
[20:23:17] Jeff_Green: you could diff before/after or something
[20:23:26] Jeff_Green: but then there's the cron job too
[20:23:43] oh. i *think* awl is based on the mail stream not sa-learn
[20:23:59] * jeremyb_ is so confused
[20:24:06] anyway, have to do other stuff for a bit
[20:24:06] i mean, i think it's keeping track of how it scores messages from repeat senders in real time
[20:24:20] huh
[20:37:37] jeremyb_: It's not a parsoid guinea pig to my knowledge, I was just showing gwicke an article and he reported an issue
[20:37:44] heh
[21:28:51] hello
[21:29:35] anomie ?
[21:30:05] hexasoft ?
[21:30:14] is there anything "special" in the way scribunto-doc-page-header content is interpreted?
[21:31:04] in this content I added a call to our "module documentation" module (which prints what's needed when called from a module or a module doc page)
[21:31:21] but in the particular case of the doc page, categories are ignored
[21:32:02] all the rest is handled fine (the text, headers and links the module outputs) but no categories
[21:32:27] I even replaced the module output by a single [[category:XXX]] without any effect.
[21:32:30] hexasoft- It's added as a message, not as part of the page parse, so categories and such won't get considered
[21:33:20] anomie: ok, I guessed something like that
[21:33:36] thanks. we will have to add the category in the doc itself.
[21:49:34] binasher: from the mysql documentation it seems like I can create a trigger on a replicated table only on a slave (and have it update a non-replicated table on that slave); is this correct?
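(What mwalker is asking about, a slave-side trigger on a replicated table feeding a table that exists only on that slave, would look roughly like this; all table and column names are hypothetical. One caveat: in MySQL, slave-side triggers fire under statement-based replication but not when the change arrives as row-based binlog events, so this sketch assumes statement-based replication.)

```sh
mysql fundraising <<'SQL'
-- 'donations' is the replicated table; 'local_summary' exists only on
-- this slave (per the discussion below it could even be a FEDERATED
-- table that actually lives on samarium).
CREATE TRIGGER summarize_donation
AFTER INSERT ON donations
FOR EACH ROW
  INSERT INTO local_summary (day, total)
  VALUES (DATE(NEW.received), NEW.amount)
  ON DUPLICATE KEY UPDATE total = total + NEW.amount;
SQL
```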
[21:50:15] mwalker: are you asking for some project that doesn't involve the prod wiki databases?
sorta; I'm asking about the fundraising databases
[21:51:21] and exploring my replication options to samarium, which will only hold sanitized data
[21:52:17] so my current idea is to use db1025 (which is slaved from db1008) with some triggers that update a federated table mounted on db1025 from samarium
[21:54:21] (I'm thinking federated because data must always be pushed from the frack; so I can't directly slave samarium to a db in the frack)
[21:54:51] where is that written?
[21:55:12] probably not anywhere; but it's the mantra jeff and I have been attempting to follow
[21:55:30] ok, so it's not PCI or anything
[21:56:18] no; not formally -- but it has ramifications for PCI; we have to ensure that everything that leaves the PCI scope does not have CC data in it; and it's easier to do that pushing data than having it pulled
[21:56:52] right. i'm just thinking if it can pull from a place that's known to already be clean...
[21:57:54] sure; but replication works by having the slave connect to the master and read the master's binlog; which'll have all the transactions including the ones we don't want to replicate
[21:58:25] so if samarium gets owned they have a nice access port into the frack
[21:58:32] unless there's an intermediate that makes a redacted log
[21:58:39] and you pull only the redacted log
[21:58:51] sure; but that involves another server
[21:58:59] and that seems silly to me
[22:00:37] mwalker: so on db1025, you'd have something like a before insert trigger on fundraising.dollardollarbills that does an update on samarium.table where that's a federated table that is actually on samarium?
[22:01:22] yep yep
[22:21:15] mwalker: sorry, got distracted. but yep, that sounds like it should work but with one big caveat
[22:21:24] dun dun dun!
[22:21:41] actually, if you use an AFTER trigger, it might not be a problem.. hmm.
[22:22:14] but in my experience, if a query executed by a trigger fails, the calling transaction is rolled back
[22:22:31] so if the db on samarium was down, db1025 replication would break
[22:24:34] that sounds like what you would want to happen actually in most cases
[22:24:51] any other thoughts on how to accomplish the same thing without potentially breaking replication?
[22:25:42] mwalker: indeed, it at least ensures that the db on samarium shouldn't ever get out of sync with triggered updates
[22:25:58] but how important is db1025? is it queried directly by fundraising systems? just backup?
[22:26:05] it's just backup
[22:26:12] we do some analytics off of it as well
[22:26:21] but so long as we know replication has died we should be good
[22:26:39] Jeff_Green: if replication on db1025 were to go down, what else besides ^ would happen
[22:30:35] mwalker: could you come up with a procedure to remove the trigger on db1025, then later bring samarium back up to date and re-add the trigger?
[22:31:22] mysqldump, import to a PCI box, delete some stuff, dump again, import to samarium?
[22:31:35] ya; that's about what would happen
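(The catch-up procedure sketched at 22:31 could look like this; host, database, and table names are hypothetical, and the scrub step stands in for whatever sanitization the data actually needs.)

```sh
# 1. Dump the table from inside the PCI scope.
mysqldump fundraising summary_table > summary.sql

# 2. Load it onto a staging box that is still in scope, then scrub it.
mysql -h staging scrub_db < summary.sql
# hypothetical scrub step; the real one depends on the data
mysql -h staging scrub_db -e 'UPDATE summary_table SET donor_note = NULL'

# 3. Dump the sanitized copy and push it out to samarium.
mysqldump -h staging scrub_db summary_table | mysql -h samarium frsummary
```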
[22:44:35] mwalker: how large is the table on samarium? small like a row per project with running totals?
[22:46:20] more like a row per project/language/country per hour or day or something with totals, averages
[22:46:32] so; a couple tens of thousands of rows I guess
[22:46:38] fairly small
[22:47:00] haven't totally thought out what data we're going to replicate
[22:48:54] mwalker: you could just update a local table on db1025 via triggers and use pt-table-sync (http://www.percona.com/doc/percona-toolkit/2.1/pt-table-sync.html - percona toolkit is on all of our db servers) running via cron either directly on db1025 or a PCI-compliant server to efficiently update samarium with changes from that table every 10 minutes or whatever
[22:50:29] see the algorithms section of the pt-table-sync doc, you can ensure the table schema will allow efficient syncing of just changes
[22:50:47] yep; I was thinking about that too -- and it might make sense for things that we really have to massage to make them safe
[22:51:46] why do you need to mix clean and dirty in the same table? :-)
[22:52:14] partly because of civi; but mostly it has to do with time-based inference
[22:52:32] like; here in the US we can aggregate every 5 minutes and you'd never be able to determine a distinct transaction
[22:52:42] but in smaller countries that becomes a big deal
[22:53:02] we don't want 3rd parties to be able to identify how much a particular individual gave
[22:54:03] hmmmmmmms
[22:55:11] still. you could have extra table(s) to summarize and those extra table(s) could be populated by cron and could live on all DBs (including master)
[22:55:40] anyway, maybe not worth it
[22:56:53] it's all good thoughts
[23:12:03] gn8 folks
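(For reference, binasher's pt-table-sync-via-cron suggestion would be roughly one crontab line like the following; the DSNs and names are hypothetical, and the "algorithms" section of the linked doc covers how to keep the diffing cheap.)

```sh
# Every 10 minutes, push only the changed rows of the trigger-maintained
# local table from db1025 out to samarium.
*/10 * * * * pt-table-sync --execute h=db1025,D=frsummary,t=summary_table h=samarium
```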