[00:03:56] that is correct. Thanks again, Krenair
[10:06:52] lexein: sorry, I don't see any relation between the two
[10:10:31] Well, with IP-locked bot accounts (meaning a username and password are locked to an IP address), and no edits possible without matching IP address AND credentials, then no stray bots could ever edit pages without being immediately detected. The blocked RotlinkBot could never have been deployed via proxies to swarm edit.
[10:11:07] detected -> denied access
[10:12:37] nah, creating accounts is easy
[10:12:46] Making regular Wikipedia editor accounts blocked from API edits helps. Human registered editors can edit from any IP address, but cannot perform API operations.
[10:13:01] that bot is very useful, let's see if the code is available somewhere
[10:13:16] ?
[10:13:45] Registered bot accounts would be by requested permission only, and would be IP-address locked.
[10:14:08] Bot accounts cannot perform web (non-API) edits.
[10:14:16] (in my proposed scheme)
[10:14:34] you seem to be very confused about bots and the API on Wikimedia wikis, I suggest you read more
[10:15:50] Well, wait a minute. I've done a bit of reading, that's why I'm here.
[10:16:56] If you're saying that direct API requests and web access are intimately and unavoidably linked, well then my idea is fucked.
[10:18:30] My scheme requires that there be some way to differentiate users from bots at the account level, to allow and deny types of access to each, and to IP-address lock different types of account. So that a particular approved bot can only occur from a particular IP address.
[10:18:52] approved bot edit, I meant
[10:20:28] I'm quite experienced in microcontroller programming, but clearly not with web apps.
[10:23:39] lexein: "direct API requests and Web access are intimately and unavoidably linked" -- they are. You can block the API, but you don't need an API to make automatic edits.
[10:23:53] lexein: you *can* block API editing for certain user groups, if you would want to.
[10:26:52] I think I may have merely dived too deep. The goal is, by using a combination of best-practices authentication and IP-address locking of bot accounts, to stop cold the type of swarm edit performed by RotlinkBot through proxies. How exactly that's done, I don't care. But I don't want swarm edits to be possible anymore for unapproved bots. So I naively thought there should be some way to technically implement that.
[10:28:02] lexein: basically, there are two ways to edit Wikipedia automatically: either you use the API, or you use the normal web interface using screen scraping
[10:28:43] the latter is what every bot did until the API was introduced in 2006-ish
[10:31:08] and it's exactly the same thing normal users do
[10:31:11] Right, I understand. I hoped that "bot" accounts could be prevented from screen scraping, forced to use the API, be IP-address locked, AND require username/password to perform edits. Bot accounts would not be automatically created, but would be forced to endure a 3-day delay, etc., whatever.
[10:31:13] 'fill in text here, click this button'
[10:31:25] Define 'bot account'
[10:31:38] How would the server know whether it's talking to a user or a bot?
[10:32:01] A new permission bit is associated with each created account. Either it's for human editing, or it's for bot editing only.
[10:32:14] So now I lie, and tell you I'm human.
[10:32:29] Then you can't perform any edits via the API. Simply not allowed.
[10:32:35] So I use screen scraping.
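For reference, the "edit via the API" path described above boils down to two requests against api.php: fetch a token, then post the edit. The sketch below (Python) is a minimal illustration only; the endpoint, page title, and text are placeholder assumptions, and a real bot would log in first and follow the local bot policy and rate limits.

    import requests

    API = "https://test.wikipedia.org/w/api.php"  # assumption: a test wiki, not enwiki
    session = requests.Session()

    # 1. Fetch a CSRF token (for an anonymous session this is the generic anonymous token).
    r = session.get(API, params={
        "action": "query", "meta": "tokens", "type": "csrf", "format": "json",
    })
    token = r.json()["query"]["tokens"]["csrftoken"]

    # 2. Submit the edit. If ConfirmEdit decides a captcha is needed (e.g. an
    #    anonymous user adding an external link, as tested later in this log),
    #    the response carries a captcha block instead of a saved revision.
    r = session.post(API, data={
        "action": "edit",
        "title": "Sandbox",        # placeholder page
        "appendtext": "\n* test",  # placeholder content
        "token": token,
        "format": "json",
    })
    print(r.json())

The screen-scraping path mentioned above is the same idea pointed at the normal edit form (index.php) instead of api.php, which is why blocking only the API does not stop automatic editing.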
[10:32:56] If the edit rate is high, then you're a bot, and fuck off.
[10:33:18] Which is exactly the same thing that happens now.
[10:34:38] It didn't work for stopping RotlinkBot, because it was (somehow) editing without passing through a captcha (that means API, right?), and it was doing it through proxies. What I don't know is whether it was doing it anonymously, or with account credentials.
[10:35:30] Anonymous editing via screen scraping forces a captcha, is that right?
[10:35:58] No. Editing as an anonymous user and adding an external link might force a captcha.
[10:36:06] But this is not different between the API and normal web users
[10:36:23] Ok, thanks, right. RotlinkBot always tries to add a link
[10:36:35] *eyeroll*
[10:36:46] But given you're calling the bot by name, it was apparently editing logged in.
[10:37:10] Sorry for making your eyes roll. I apologize.
[10:37:42] Well, Kww only listed IP addresses in his case against Archive.is at [[WP:Archive.is RFC]].
[10:38:09] And he asserted that it was always RotlinkBot
[10:38:46] The RotlinkBot account itself was blocked, and it was somehow still performing edits elsewhere via proxies, lots and lots of them.
[10:43:23] So if it was screen scraping, then it was getting through the captcha with human help, or maybe a captcha solver.
[10:46:48] Listen, I'm sorry, I really don't mean to be wasting people's time, but it just somehow feels like there's a front-end fix for botnet editing. I don't want to get into just repeating myself, because I'm sure I said what I meant at least once up there.
[10:47:25] lexein: sprichst du deutsch? (do you speak German?)
[10:47:39] Nur ein bisschen (only a little).
[10:48:09] I see the bot (RotlinkBot) has an account on dewiki too ;-)
[10:48:42] Heh - maybe that's how some of it was happening.
[10:49:33] Ok, so enwiki now has filters to prevent IP editors from adding links to archive.is. Does dewiki?
[10:50:21] lexein: I just tested what happens if I try to add an external link via the API as an anonymous user. It gives me a captcha.
[10:52:06] Ok, good. So does that mean that anonymous users trying both API and screen-scraping methods will ALWAYS get a captcha?
[10:53:20] yes
[10:56:23] Well, that's something. I'd really like to slam the door on unapproved bots for a reason I will now state: I think that if RotlinkBot could have been blocked instantly by (some variant of my ideas above), then the [[WP:Archive.is RFC]] would never have happened, and its reputation would be intact.
[10:56:57] It's a weird argument, I know, wishing things were different.
[10:57:15] Thanks for putting up with my crap. See y'all later.
[10:57:47] lexein: and as I have told you above, your ideas would not have helped.
[10:58:02] lexein: only the 'block someone if an edit is made quicker than XXX' idea might have helped
[11:12:04] Thanks again. Whatever we can do to force bots into one corner of the problem space, we should do. All the best.
[11:31:35] https://en.wikipedia.org/wiki/Category:Days_of_the_year oh, that's interesting
[19:49:36] Reedy: if you could run this script, you'd save Erik Zachte some headaches :) https://bugzilla.wikimedia.org/show_bug.cgi?id=56383
[19:49:43] should complete in a few seconds for such a wiki
[19:50:26] will do when I'm on a computer
[19:51:27] kiitos (thanks)
[19:56:01] I need an SSH key on my phone
[19:57:43] I don't think that'll pass muster in Ops
[19:58:03] rootphone!
[19:58:12] if it's kept in encrypted storage...
[19:58:24] Meg
[19:58:51] it's not like I need to do it frequently.
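As a rough illustration of the "block someone if an edit is made quicker than XXX" idea mentioned above, a per-account sliding-window throttle could look like the sketch below. This is a generic illustration, not how MediaWiki's own rate limits or edit filters are implemented, and the window and threshold values are made-up assumptions.

    from collections import defaultdict, deque
    import time

    WINDOW_SECONDS = 60   # assumption: look at the last minute of activity
    MAX_EDITS = 10        # assumption: more than 10 edits/minute looks like a bot

    _recent = defaultdict(deque)  # account (or IP) -> timestamps of recent edits

    def allow_edit(account, now=None):
        """Return False if this account has been editing faster than the threshold."""
        now = time.time() if now is None else now
        q = _recent[account]
        while q and now - q[0] > WINDOW_SECONDS:
            q.popleft()        # drop edits that fell outside the window
        if len(q) >= MAX_EDITS:
            return False       # a real filter would block or throttle here
        q.append(now)
        return True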
probably no point
[23:05:49] weird
[23:05:55] just got "An error has occurred while searching: Pool queue is full" on enwiki search
[23:06:05] those are pretty common
[23:06:14] what causes it?
[23:06:33] no clue :)
[23:06:51] uh, is there a bug?
[23:07:25] I don't think anyone ever filed one; you just try your request again and it works.
[23:08:01] PoolCounter
[23:08:10] Too many people doing the same thing
[23:08:24] https://en.wikipedia.org/wiki/Wikipedia:Village_pump_%28technical%29#Pool_queue.3F
[23:08:30] https://en.wikipedia.org/wiki/Wikipedia:Village_pump_%28technical%29/Archive_113#Pool_queue_is_full
[23:08:37] https://en.wikipedia.org/wiki/Wikipedia:Village_pump_%28technical%29/Archive_114#Pool_queue_is_full
[23:08:38] Though, I note it only seems to be search-related
[23:08:52] ^d: fix that, plz ^^^ ;)
[23:09:02] RESOLVED KILL LUCENE
[23:09:19] it has to be fixed
[23:10:19] <^d> Remove all the limits and let the apaches kill lucene ;-)
[23:14:40] <^d> TimStarling: There have been a few more reports of the pool queue getting full for searches (mwsearch, not cirrus). Think it'd be ok to raise the limit a tad? We were pretty conservative with our initial numbers iirc.
[23:15:25] <^d> s/limit/maxqueue/
[23:16:01] What is the actual value?
[23:16:04] (currently)
[23:16:29] 400
[23:16:40] <^d> 78 workers, 400 maxqueue.
[23:16:42] https://noc.wikimedia.org/conf/highlight.php?file=PoolCounterSettings-eqiad.php
[23:17:25] https://gerrit.wikimedia.org/r/#/c/55777/
[23:18:49] is that per host or overall?
[23:18:59] must be per host, right?
[23:20:42] ah yes, "6 Lucene * 800 max queue = 4800 processes, which is the number of regular apache processes in eqiad."
[23:20:58] <^d> Yeah, that was how we arrived at 400.
[23:21:53] so if it was 800, it would be on the edge of an overload if all Lucene hosts locked up at once
[23:22:04] <^d> Yeah, I was thinking like 500 or 600 at most.
[23:22:09] <^d> Just give it a little more headroom.
[23:22:39] I would be fine with 600, I think
[23:22:55] usually they should lock up one at a time, since the index updates are staggered
[23:30:22] <^d> greg-g: Ok, so if you're wanting to respond on that VPT thread: I've upped the max queue for the pool counter, so the error should be less likely to happen again.
[23:30:43] ^d: sweet, might remember to do it tomorrow
[23:30:49] otherwise StevenW can :)
[23:30:53] * greg-g runs to bus
[23:31:07] <^d> And like I've said before on the subject, it's nothing to panic about really. The only time to worry is if it persists.
[23:31:19] <^d> The message is a little panic-inducing, which may be my fault.
[23:33:05] which message?
[23:33:14] "An error has occurred"?
[23:34:27] <^d> "An error has occurred while searching: Pool queue is full"
[23:34:38] <^d> People seem to panic :)
[23:34:40] ori-l: Something's broken
[23:34:41] ori-l: Something's broken
[23:34:44] ori-l: Something's fixed
[23:34:45] GG
[23:35:03] <^d> Nobody can say the error isn't informative ;-)
[23:35:50] When the message appears, is search broken?
[23:36:23] <^d> Not usually
[23:37:15] can we make it not appear when things aren't broken?
[23:37:21] view-pool-error now says "Sorry, the servers are overloaded at the moment. Too many users are trying to view this page. Please wait a while before you try to access this page again."
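To make the workers/maxqueue numbers above concrete, here is a conceptual sketch of the admission rule behind the "Pool queue is full" error. It is only an illustration of the idea, not the real PoolCounter daemon or its MediaWiki client, and it ignores how slots are released when searches finish.

    WORKERS = 78      # searches allowed to run at once (value from the log)
    MAX_QUEUE = 400   # waiting slots; the change discussed raises this toward 600

    running = 0
    queued = 0

    def admit_search():
        """Decide what happens to one incoming search request."""
        global running, queued
        if running < WORKERS:
            running += 1
            return "run"      # a worker is free, run immediately
        if queued < MAX_QUEUE:
            queued += 1
            return "wait"     # queue up and wait for a worker to free up
        return "An error has occurred while searching: Pool queue is full"

Raising MAX_QUEUE only makes the last branch rarer at the cost of more waiting requests, which is the headroom trade-off being weighed in the conversation above.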
[23:37:37] <^d> Yeah, we need a nicer error like that :)
[23:37:39] which probably causes less alarm than "pool queue full"
[23:39:45] Happy TimStarling Day (apparently)
[23:40:17] yay me
[23:40:33] Does TimStarling get TimStarling Day off?
[23:41:13] unfortunately not
[23:42:40] <^d> *sigh* why did I write this crappy status-handling code in Special:Search
[23:42:58] Reedy: you get to ask him to merge a change -- any change -- and he can't legally refuse
[23:43:59] it's in the charter somewhere
[23:44:14] Does he have to merge it like *now*?
[23:44:24] Or can he say go away and fix these 500 things?
[23:44:40] no, it's instant
[23:48:01] <^d> ori-l: Nothing about gerrit is instant.
[23:53:22] Instant Rebase - just add water!
[23:53:56] Bugzilla can do an update quicker than you can alt-tab back into your IRC client