[09:09:26] Hi, apart from readability, what are the advantages of using [0-9] over \d in a regular expression? The former is self-explanatory, as one knows it matches values from 0 to 9.
[10:20:18] xSavitar: portability is a concern for some
[10:22:12] Nemo_bis: Okay. So we need to check that first before making a change, right? The portability aspect?
[10:26:19] How does one actually tell if making a change breaks portability concerns? I'm not sure about that tbh
[10:28:43] apparently \d can indeed use the libc locale, which is quite alarming
[10:28:56] nothing in mediawiki uses the locale on purpose
[10:30:12] probably best to replace all usages of \d with [0-9]
[10:30:47] with the /u modifier, PHP sets the PCRE2_UCP flag, which means that \d will act like \p{Nd}
[10:30:52] which is probably never intended
[10:31:00] better to use \p{Nd} explicitly if you want that
[10:32:13] I guess that's what Nemo_bis means by portability, he means that the syntax of whatever thing you are parsing may randomly change depending on environment variables in the shell used to start apache
[10:35:10] TimStarling, Nemo_bis, thanks a lot for the context!
[10:37:18] it would be good to reproduce this, but I can confirm that PHP calls pcre2_maketables(), which enables locale-specific matching
[10:37:26] and the PCRE manual says "By default, characters whose code points are greater than 127 never match \d, \s, or \w, and always match \D, \S, and \W, although this may be different for characters in the range 128-255 when locale-specific matching is happening."
[10:37:49] http://pcre.org/current/doc/html/pcre2pattern.html
[10:38:52] Okay, thanks for the link. Will definitely go through it. Logged this bit of the conversation for future reference :)
[10:38:53] https://www.mediawiki.org/wiki/User:X-Savitar/Sub-pages/Why_d_to_0-9
[22:40:36] uh wow, I didn't realize \d was based on the locale
[22:41:05] https://codesearch.wmflabs.org/core/?q=%5C%5Cd&i=nope&files=&repos=
[23:01:12] ewww
[23:01:40] Even in non super magic unicode mode?
[23:03:34] by super magic unicode mode I mean PCRE_UCP, which I don't even think has an option in php
[23:04:20] e.g. I think that /(*UCP)\d/ would match all sorts of weird things
[23:04:27] and /\d/ would just match [0-9]
[23:05:36] huh, the docs are kind of ambiguous on this
[23:07:15] oh found https://pcre.org/original/doc/html/pcreapi.html#localesupport
[23:07:17] ick
[23:07:43] So the answer is probably not, but maybe sometimes depending on compile time options and how it's called
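
Note appended to the log: the difference between a Unicode-aware \d and an explicit [0-9] discussed above can be sketched in a few lines. This is only an analogy, written in Python rather than PHP: Python's `re` module is not PCRE, and its `re.ASCII` flag merely plays a role similar to leaving PCRE2_UCP off. The sample strings and the `matches` helper are made up for illustration; nothing here reproduces the locale-dependent behaviour of pcre2_maketables().

```python
import re

# Hypothetical sample inputs: the same number written with three digit sets.
samples = {
    "ascii": "2024",
    "arabic_indic": "\u0662\u0660\u0662\u0664",  # Arabic-Indic digits, category Nd
    "fullwidth": "\uff12\uff10\uff12\uff14",     # fullwidth digits, category Nd
}

# For str patterns, Python's \d is Unicode-aware by default (any decimal digit).
unicode_d = re.compile(r"\d+")
# re.ASCII restricts \d to [0-9] only.
ascii_d = re.compile(r"\d+", re.ASCII)
# An explicit character class never depends on flags.
explicit = re.compile(r"[0-9]+")


def matches(pattern, text):
    """Return True if the whole string matches the pattern."""
    return pattern.fullmatch(text) is not None


# For each sample: (Unicode-aware \d, ASCII-only \d, explicit [0-9])
results = {
    name: (matches(unicode_d, text), matches(ascii_d, text), matches(explicit, text))
    for name, text in samples.items()
}

for name, outcome in results.items():
    print(name, outcome)
```

The explicit class gives the same answer regardless of flags, which is the argument made in the log for preferring [0-9] when only ASCII digits are intended.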