[01:45:52] this is weird
[01:45:59] i put a dewiki page into google translate
[01:46:14] and when viewing it i saw a notification that said: "Central login You are centrally logged in as Legoktm. Reload the page to apply your user settings."
[01:46:41] the dewiki page i put in was http:// though, and im logged in over https
[07:42:32] hi. I have a problem with python on wmflabs. Case is as follows: mkdir témp (note: non-ASCII char), cd témp, run python, import os, os.getcwdu() --> throws UnicodeError
[07:42:51] can someone try?
[07:44:04] I don't understand why python does not recognise which encoding scheme to apply
[07:44:55] lego ^
[07:45:48] >>> os.getcwdu()
[07:45:49] u'/data/project/legobot/t\xe9mp'
[07:45:54] Mpaa: works for me? ^
[07:46:39] legoktm, I connect from a windows machine, are you working with linux?
[07:46:50] OSX, so basically yes
[07:47:23] Mpaa: you had better not be using Notepad
[07:47:31] (use Notepad++)
[07:48:00] I think it depends on how I connect to wmflabs from windows
[07:48:27] os.getcwd() -> '/home/mpaa/t\xe9mp'
[07:48:55] hm.
[07:48:55] when I try unicode, it attempts to use utf-8 and fails
[07:51:17] print os.getcwd().decode('windows-1252') -> /home/mpaa/témp
[07:52:29] I don't get why, as I am running everything on a linux system in bastion
[07:53:15] where am I wrong??
[07:55:54] legoktm, can you try os.stat(os.getcwdu()) and see if that works?
[07:56:21] posix.stat_result(st_mode=17917, st_ino=40896107, st_dev=31L, st_nlink=2, st_uid=40004, st_gid=40004, st_size=4096, st_atime=1379663126, st_mtime=1379663121, st_ctime=1379663121)
[08:10:08] Mpaa: at the filesystem level, filenames are just binary bytes, not a particular encoding. It's up to the user/app to decide on an encoding/interpretation. If you input windows-1252 byte sequences, you can't expect them to decode as utf-8 later. There is no magic conversion in any of this.
[08:23:21] bblack, problem is I am not doing anything
[08:23:33] I just created a dir
[08:23:45] and ran 2 python commands
[08:24:00] not a line of code from myself or external files
[08:24:45] no decisions on encoding
[08:25:00] unless they are hidden in some system file somewhere
[08:25:44] Mpaa: you created the directory on the commandline by typing the name, right? And you're actually typing in a terminal emulator on Windows, which is sending the character in 1252 encoding
[08:26:26] which is fine in the general case: your terminal sends that encoding, and when you type "ls" your terminal software interprets it back correctly for you. But the byte sequence on the filesystem is in 1252, not utf-8. If you then ask python to see it as utf-8, that won't work.
[08:28:17] bblack, os.getcwdu() does not ask for an encoding scheme
[08:28:43] I think I am stuck as I interface a linux world with a windows console ...
[08:29:05] Mpaa: I don't know the inner details of os.getcwdu(), but the filesystem doesn't provide or store character-set metadata for a filename, and whichever random character set you used can't be guessed in the general case.
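A minimal Python 2 sketch of the point bblack makes here, using a hypothetical byte string rather than Mpaa's real directory: the same bytes decode cleanly as windows-1252 but fail as UTF-8, which is roughly the situation os.getcwdu() runs into when the bytes on disk came from a cp1252 terminal.

```python
# -*- coding: utf-8 -*-
# Python 2 sketch (hypothetical bytes, not the actual directory): a name typed in a
# windows-1252 terminal is stored on disk as windows-1252 bytes, so decoding it as
# UTF-8 later cannot work.
name_bytes = '/home/mpaa/t\xe9mp'             # 0xe9 is "é" in windows-1252

print name_bytes.decode('windows-1252')       # -> u'/home/mpaa/témp'

try:
    name_bytes.decode('utf-8')                # roughly what os.getcwdu() attempts
except UnicodeDecodeError as err:
    print 'UnicodeDecodeError:', err          # a lone 0xe9 byte is not valid UTF-8
```

As the rest of the conversation shows, the eventual fix was not in Python at all: switching to a terminal that sends UTF-8 (PuTTY does so by default) makes the bytes on disk match what Python expects.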
[08:30:55] Mpaa: whatever terminal software you're using on windows, is there a preference/setting to tell it to use utf-8?
[08:32:38] bblack, I am using the git bash to ssh to wmflabs, I think it's based on cygwin but I have no clue
[08:33:32] TERM=cygwin
[08:35:14] well, you may be running a cygwin binary of bash, but the question is what software that's running underneath. In other words, what is providing the terminal window itself.
[08:36:48] bblack, frankly no clue :-(
[08:37:24] just use putty tbh
[08:38:04] yes, that!
[08:38:18] huh, you're using the ssh binary on windows or something?
[08:38:21] * MatmaRex has no context
[08:38:32] abandon all hopes of any non-ascii characters being transmitted correctly, then
[08:38:34] http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html
[08:39:07] I'll give it a try
[08:39:45] putty will do utf-8 by default
[08:40:46] thanks, hope you will not hear from me again :-)
[09:34:47] bblack, MatmaRex, legoktm, fine with Putty, thanks again
[09:35:02] :)
[11:02:59] apergos: hello
[11:03:31] hello
[11:03:37] let's see if there's a parent around
[11:03:56] (I feel like an underage kid trying to get into a movie when I say that)
[11:04:30] :-)
[11:05:52] hi.
[11:06:57] hmm pinged but no answer, let's go ahead
[11:07:18] the fiwiki job completed today, but
[11:07:20] 12910000 revs, 887820 pages, 110222 MB
[11:07:21] terminate called after throwing an instance of 'std::ios_base::failure'
[11:07:21] what(): basic_ios::clear
[11:07:35] I don't know a nice way to determine if the job completed normally or not
[11:07:44] (in fact how can we tell that?)
[11:08:10] anyways can you have a look and see if there's something when you close files etc that could cause an issue?
[11:08:15] well, since it threw an exception, i'm pretty sure it didn't complete normally
[11:08:39] mm well
[11:08:58] how many pages did we say this has.. uh
[11:09:27] 1119703 is the upper limit.
[11:09:44] some will have been deleted, no idea how many
[11:10:09] I could convert to xml and grep for page ids but that might take a long time
[11:10:16] given how long this took to run
[11:10:35] when i look at the uncompressed XML file, it has 110222 MB, so it has been read completely
[11:10:52] so you're right it's something at the very end
[11:10:55] ok
[11:12:34] can I ask, when you do reads from stdin, are those line by line or what are you doing?
[11:12:59] well stdin or an xml file, as the mechanism should be the same
[11:14:37] the reading is not line-based
[11:14:57] i use istream.read
[11:15:59] which reads some specified count of bytes
[11:17:36] is there an easy way to find out what that number is?
[11:17:45] i'll try reading from a pipe on a small wiki, maybe that has the same problem
[11:18:03] well I'm investigating it for another reason
[11:18:06] not for the exception
[11:19:45] it looks like it's 2048 bytes (at least on Windows), the XML library I use decides that
[11:20:25] yeah: #define XML_BUFFER_SIZE 2048 /* maximum number of characters to buffer at once */
[11:21:14] mmm
[11:21:44] ok, I'll keep that in mind, thanks
[11:23:29] but that's how i read from the istream, it could have its own buffering
[11:24:25] ok I will hunt around, thanks
[11:25:25] why does that matter to you? because of the high IO?
[11:25:47] well it's a possible angle, yes
[11:27:41] apergos: now that the article count uses the links table, it's very quick to update the Special:Statistics count btw
[11:28:09] currently says 887 978, probably off by not more than a few hundred
[11:28:44] ah Nemo_bis, are you uh
[11:29:12] of course that only gives a closer upper limit
[11:29:37] wikiteam's "validator" is a grep count of vs :P
[11:30:34] having a sensible way to check how valid a dump is would be nice (but I guess it wouldn't help your binaries anyway)
[11:30:38] https://gerrit.wikimedia.org/r/#/c/84632/3/manifests/misc/maintenance.pp
[11:30:43] ?
[11:31:23] no, we can check the bz2s to see that they at least complete
[11:31:29] but this new format isn't checkable
[11:31:29] apergos: how about that, is it broken?
[11:31:32] yep
[11:31:44] so it's quick to tell you why
[11:32:02] that = the cronjob you linked
[11:32:17] yes this format is harder to check of course :)
[11:32:19] it tries to set up these cron jobs for each of s1@11, s2@12 etc right?
[11:32:23] can fail in much smarter ways
[11:32:37] apergos: yes but in most cases it will be a no-op right
[11:32:41] well
[11:32:46] it has to see about that first
[11:32:52] look at the cron jobs right above the new ones
[11:33:03] see how they have titles that end in ${name} ?
[11:33:13] of course, I changed that on purpose
[11:33:29] hmm are you saying the last one may overwrite the first
[11:33:34] this is so every cron job has a different name, otherwise puppet says 'oh no, it's the same name and I am being asked to define it twice, now what, let's die'
[11:33:38] and it fails to run
[11:33:57] did puppet die?
[11:33:59] so puppet on hume, where these jobs are, is broken right now
[11:34:07] oh
[11:34:21] you can't define the same resource with the same name twice, puppet will tell you off and then barf
[11:34:34] er declare. anyways.
[11:34:48] isn't there a way to tell it to ignore the duplicates
[11:34:49] Could not retrieve catalog from remote server: Error 400 on SERVER: Duplicate definition: Cron[cron-updatequerypages-lonelypages-s1] is already defined in file /etc/puppet/manifests/misc/maintenance.pp at line 487; cannot redefine at /etc/puppet/manifests/misc/maintenance.pp:487 on node hume.wikimedia.org
[11:34:53] no. this is a feature.
[11:34:58] puppet is designed to do this.
[11:35:08] puppet is just too choosy >.>
[11:35:09] otherwise which declaration does it take?
[11:35:14] the first
[11:35:24] or the last, they are all the same on purpose
[11:35:25] what is 'the first'? puppet is a declarative language
[11:35:32] there's not a notion of inherent order
[11:36:05] stupid puppet who can't live with randomness
[11:36:38] the easiest way to fix it is to make those monthly on en.wiki :)
[11:37:09] ok so if there's not a patch by later today I'm going to revert this
[11:37:16] so that puppet at least runs over there
[11:37:36] ok but what approach do you want me to follow
[11:37:36] and you'll be able to resubmit in a way that makes puppet happy and gets the jobs run
[11:38:05] Sean also made the other queries monthly so maybe we can do the same here?
[11:38:08] what did people agree on for these jobs?
[11:38:18] cause I've not followed it for awhile
[11:38:31] so I shouldn't just dictate to you 'run them this often'
[11:38:41] that he's watching them (assisted by a query killer) so it's no big deal to remove them if they cause damage
[11:38:44] the one thing I can dictate is 'don't break puppet or I'll have to revert it'
[11:38:50] sure :)
[11:39:01] well I can just move it to another class
[11:39:41] how does that help?
[11:40:49] hey Nemo_bis, you might want to see https://en.wikipedia.org/wiki/Wikipedia:VPT#Top_5.2C000_templates
[11:41:13] as long as the cron jobs have the naming scheme they have (not varying by the arg passed, i.e. 's1@11'), puppet will reject them
[11:43:36] it seems like there are two things going on in that list of jobs
[11:43:44] one is running it on a bunch of different clusters
[11:44:00] one is running it in the same cluster a bunch of times (not sure what that's about)
[11:44:03] i.e.
[11:44:17] 's1@18', 's1@19', 's1@20' etc
[11:44:54] yep
[11:45:02] it's the 6 reports for en.wiki only
[11:47:17] ok well with this approach you're going to run every s1 job on all 6 days
[11:47:27] if puppet would let you, that is
[11:49:01] yes
[11:49:08] those are quick
[11:49:23] why not just run them on one day?
[11:50:07] well, they are quick compared to those taking a day
[11:50:38] they are still able to raise lag on a slave by a few tenths of a second
[11:50:58] having them run on the 11th, then on the 18, 19, 20, 21, 22, 23 seems a bit... odd to me
[11:51:23] if you mean you want to run each new cron job separately on a separate day I can understand that
[11:51:35] but this runs them all on the 11, then all on the 18, all on the 19 etc
[11:51:44] (or it would if puppet etc)
[11:52:41] ah no that's not what I want
[11:53:06] let's prepare a patch before pasta starts cooking
[11:55:06] ok well I am happy to stare at the patch (but since springle seems to be the one doing +2 I think you should add him and have him approve)
[11:55:34] svick: how is the commenting coming along?
[11:56:11] apergos: slower than i would want, but i think i'll have it done by monday
[11:56:16] apergos: can I add another define to the same misc::maintenance::updatequerypages (and declare at the end) or does it have to be in another one
[11:56:29] svick: ok cool
[11:56:41] you can commit in pieces too, you don't have to do all the comments in one big commit :-D
[11:57:12] Nemo_bis: I'm not sure what your other define would look like so I don't know the answer
[11:57:23] and 'another one' what? class?
[11:57:28] right
[11:58:02] i'm looking at the exception, i now understand what's causing it, and trying to figure out how to best solve it; so i should have a fix for you soon
[11:58:12] sweet (what is the issue anyways?)
[12:00:43] that c++ streams and exceptions behave in a way i didn't expect; so with my current settings, trying to read 2kB when near the end of a stream throws an exception
[12:00:52] ahh
[12:00:58] because there's not 2k left
[12:00:59] heh
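For illustration, a small Python 2 sketch of the chunked-read loop being discussed; the actual tool is C++ and uses istream.read() with the XML library's 2048-byte buffer, so this is only a stand-in, not its code. The point is that the last read of a stream is normally short and has to be treated as end-of-input, not as an error; in the C++ tool the short read near EOF instead surfaced as std::ios_base::failure, presumably because of the stream exception settings svick mentions.

```python
# Python 2 sketch of a chunk-based read loop; hypothetical stand-in for the C++
# istream.read() loop discussed above, not the dump tool's actual code.
import sys

XML_BUFFER_SIZE = 2048  # the chunk size the XML library uses, per the #define above

def read_chunks(stream, size=XML_BUFFER_SIZE):
    """Yield the stream in chunks; the final chunk is usually shorter than `size`."""
    while True:
        chunk = stream.read(size)   # near EOF this simply returns fewer bytes
        if not chunk:               # empty string means end of input, not an error
            break
        yield chunk

if __name__ == '__main__':
    total = sum(len(c) for c in read_chunks(sys.stdin))
    print >> sys.stderr, 'read %d bytes' % total
```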
[12:01:18] apergos: yes, other class; well, let me upload it
[12:02:02] I think you have to put your cron jobs in another class because otherwise they will get run on s1 and on s2 and 3 etc, but you said they are en only
[12:03:31] hmm so that means that the last little bit (who knows how much) of the xml input didn't get read properly, which means something probably didn't get written right, who knows what
[12:03:32] oh well
[12:03:51] I won't use that output to check contents then
[12:04:41] apergos: https://gerrit.wikimedia.org/r/#/c/85192/1/manifests/misc/maintenance.pp
[12:04:46] looking already
[12:09:41] why is it called updatequerypages-enwiki-disabled ? how do you know it's disabled or enabled?
[12:13:06] Nemo_bis: ?
[12:17:14] ah you're working on it still, sorry
[12:22:07] apergos: should be fixed now
[12:22:12] looking already :-)
[12:23:40] okey dokey
[12:37:00] apergos: the additional disabled pages
[12:37:19] but sure, let's change name too
[12:38:06] oh... uh
[12:38:11] yeah the name wasn't clear to me
[13:09:30] Nemo_bis: this looks ok to me, I'll +1 it, don't forget to add springle
[13:16:52] apergos: I didn't change the weight on DB, just fixed the syntax
[13:17:24] uh huh
[13:17:27] of course more eyeballs on the puppet syntax don't harm but I suspect he trusts you on that
[13:22:27] I'd rather he oversee it since if it explodes he'll likely be the one to pick up the pieces
[13:24:24] oh, sure, if puppet can wait
[13:24:54] * Nemo_bis has this "you broke puppet" cloud over his head
[13:26:53] heh
[13:27:15] I don't want it to wait days but I'll poke him later on tonight is all
[13:27:22] it can wait hours
[13:28:24] I just won't be around later to babysit said jobs
[13:28:45] speaking of which, need to run an errand
[13:29:16] svick: that was a very weird meeting, but I guess 'see you tomorrow' ? :-D
[13:30:12] apergos: i assume you mean monday? (that will be our last proper meeting)
[13:30:38] yes. Monday
[13:30:58] see you then
[13:31:35] then we need to figure out how the after-gsoc stuff can get done
[13:32:27] yeah
[13:32:51] ok, have a good weekend and happy commenting!
[13:33:09] you too, bye
[14:38:16] https://en.wikipedia.org/wiki/Wikipedia_talk:Wikipedia_Signpost/2013-09-18/Technology_report <- I'm very pleased with the second section's comment
[17:47:50] what kind of tech question can i ask here
[20:39:23] I just reviewed the changes included in wmf18, not a whole ton of important ones. I guess all the hacking and socializing didn't realize itself into MW Core overhauls ;)
[20:39:52] * greg-g waits for someone to say "but what about my change that...??!!!!"
[20:40:11] * YuviPanda breaks the site with a small change while greg-g isn't looking
[20:40:23] YuviPanda: I can see you
[20:41:34] greg-g: i moved right a bit, you can't anymore
[20:41:48] I think my first core patch may be in there somewhere. https://gerrit.wikimedia.org/r/#/c/83010/
[20:42:09] YuviPanda: I can still see your ear
[20:42:18] better
[20:42:18] greg-g: not anymore!
[20:42:31] bd808: ooo, you're touching file!
[20:42:38] * YuviPanda considers touching Parser.php
[20:43:04] bd808: it is!
[20:43:14] * bd808 will be famous!
[20:53:07] YuviPanda: wear gloves please
[21:22:24] who is the best person to ask some questions about AbuseFilter? legoktm?
[21:22:29] hi
[21:22:39] depends what your question is :P
[21:23:18] jgonera: csteipp or hoo or anomie are good choices ;)
[21:23:23] (they're also all not here)
[21:23:32] hm
[21:23:45] do you know their e-mails?
[21:24:14] legoktm, we want to implement some support for AbuseFilter on mobile and I want to know what the different error codes mean
[21:24:30] I noticed that, unfortunately, every wiki can have its own codes, but they have common prefixes
[21:24:30] jgonera: not off-hand, no, try getting them from bugzilla/gerrit?
[21:24:48] jgonera: yes, in the AbuseFilter management interface you can set up custom ones
[21:24:56] e.g. abusefilter-warning-13 on eswiki or abusefilter-warning-usuwanie-tekstu on plwiki
[21:24:58] they're just mediawiki messages that a user might create
[21:25:46] so if you look at https://en.wikipedia.org/wiki/Special:AbuseFilter/167 it has a warning set to "abusefilter-warning-afc"
[21:25:49] so, my question is, what are the different types of behaviour of AbuseFilter? will all the abusefilter-warning-* ones basically just show a warning message and then allow the person to resubmit?
[21:25:58] it depends on the filter
[21:26:09] some will just be a warning, and then the next time you hit edit, it will go through
[21:26:18] ok, so how can I determine programmatically what to do?
[21:26:22] others might be set to warn + disallow which means the edit will be blocked no matter what
[21:26:41] but will disallow also appear in the error code?
[21:27:01] yeah, if the edit is disallowed there should be a different error message
[21:27:07] in other words, can I base what the mobile editor should do on the error code?
[21:27:10] hm, ok
[21:27:16] * legoktm tests it
[21:28:22] how do you test it? just by making a disallowed edit?
[21:28:48] yeah
[21:28:52] MatmaRex, thanks, I'll try to find their e-mails there
[21:29:00] i have a test filter on enwiki which just blocks me from editing one specific page
[21:29:08] I see
[21:29:11] what page is that?
[21:29:24] so if the code is 'abusefilter-disallowed' it means the edit was disallowed
[21:29:33] https://en.wikipedia.org/wiki/User:Legoktm/EFTest
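A rough Python 2 sketch of how a client such as the mobile editor could act on these codes, based only on the explanation above: warning codes mean "show the filter's message and let the user resubmit", disallow codes mean "the edit is blocked". The helper name is hypothetical and the classification is an assumption drawn from this conversation, not from AbuseFilter documentation or code.

```python
# Hypothetical helper (Python 2): classify an AbuseFilter error code by its prefix,
# per legoktm's explanation above. Not taken from AbuseFilter or MobileFrontend code.
def classify_abusefilter_code(code):
    if code.startswith('abusefilter-warning'):
        return 'warn'      # show the message; resubmitting the same edit should go through
    if code.startswith('abusefilter-disallow'):
        return 'disallow'  # the edit is blocked no matter what
    return 'other'         # e.g. abusefilter-blocked-display or other wiki-specific codes

print classify_abusefilter_code('abusefilter-warning-usuwanie-tekstu')  # warn
print classify_abusefilter_code('abusefilter-disallowed')               # disallow
```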
[21:29:44] jgonera: do you want sysop on testwiki so you can play with it?
[21:29:52] legoktm, sure
[21:30:28] what's your username?
[21:31:03] legoktm, let's use JGonera (WMF)
[21:31:48] ok done
[21:31:49] awww :P
[21:31:55] thanks!
[21:32:33] jgonera: so i created https://test.wikipedia.org/wiki/Special:AbuseFilter/129
[21:32:43] any time you try and edit, you'll get a warning, and then be disallowed
[21:33:52] legoktm, thank you, I'll play with it
[21:34:07] np
[21:34:40] legoktm, any time I try and edit any page?
[21:34:56] yeah, or delete something or move a page
[21:35:01] oh, I see the rule, ok
[21:35:02] you can disable the filter though :P
[21:44:09] legoktm, bonus question: what are abusefilter-blanking, abusefilter-blank, abusefilter-imza, abusefilter-blocked-display and abusefilter-autobiography? those are the only error codes we logged in mobile that don't start with abusefilter-warning* or abusefilter-disallow*
[21:56:25] jgonera: https://en.wikipedia.org/wiki/MediaWiki:Abusefilter-blocked-display but that feature is only turned on on some wikis like mw.o and metawiki
[21:56:57] https://en.wikipedia.org/wiki/MediaWiki:Abusefilter-autobiography was moved in 2009 so i'm not sure
[22:05:57] legoktm, I see, they're not that important, they hardly ever happen
[22:26:46] [[Tech]]; MF-Warburg; /* Font in the edit window */ new section; https://meta.wikimedia.org/w/index.php?diff=5822605&oldid=5820030&rcid=4559212
[22:30:02] [[Tech]]; Nemo bis; 53734; https://meta.wikimedia.org/w/index.php?diff=5822610&oldid=5822605&rcid=4559219