02/12/19, 12:57 AM | #1 |
|
[open] Character handling flaw
I haven't figured out what is special about it yet, but the "Latin small letter A with grave" (this guy: à) works fine in all text fields:
Lua Code:
However, the second half of it gets treated as a space character, which results in the code below splitting the string:
Lua Code:
The Lua website's demo, however, does not have this problem:
Lua Code:
I'm not sure how customized your Lua implementation is, but do you fix issues like these?
Last edited by Dolby : 02/14/19 at 05:40 PM.
02/12/19, 06:04 AM | #2 |
This is not a bug, but simply an encoding issue. The Lua string functions assume your input sequence is ASCII, but you used UTF-8 for your .lua file.
This means the à character in your test corresponds to the two-byte sequence "c3 a0" instead of "e0". According to https://www.ascii-code.com/, "c3" is "Latin capital letter A with tilde" and "a0" is "Non-breaking space". The game font cannot properly render the first one since it uses UTF-8 instead of ASCII, so it shows a box instead, and the space is handled by gmatch.
Try to convert your .lua file to ASCII and it should work as expected, although it will break any "real" UTF-8 strings you use and the letter will be rendered as a box unless you use a custom font.
Last edited by sirinsidiator : 02/12/19 at 06:08 AM.
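To make the byte values mentioned above concrete, here is a minimal sketch that prints the bytes of "à" in both encodings. The strings are written with decimal escapes so the result does not depend on how the file itself is saved; `hex_bytes` is a hypothetical helper name, not part of Lua or the ESO API.

```lua
-- "à" (U+00E0) in two encodings, written as decimal byte escapes.
local utf8_a_grave = "\195\160"   -- UTF-8: 0xC3 0xA0
local latin1_a_grave = "\224"     -- Latin-1: 0xE0

-- Hypothetical helper: render every byte of a string as two hex digits.
local function hex_bytes(s)
    local out = {}
    for i = 1, #s do
        out[#out + 1] = string.format("%02x", s:byte(i))
    end
    return table.concat(out, " ")
end

print(hex_bytes(utf8_a_grave))    -- c3 a0
print(hex_bytes(latin1_a_grave))  -- e0
```

Lua strings are plain byte sequences, so `#utf8_a_grave` is 2 even though the string holds a single character.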
|
02/13/19, 03:08 AM | #3 |
Welcome to the hell of localization.
à is at least part of the extended ASCII code. But think about Russian or Japanese players: http://lua-users.org/wiki/LuaUnicode
|
02/14/19, 06:56 AM | #4 | |||
|
If this is not a bug, it's a terrible feature.
TL;DR: When you have all strings in the game in UTF-8, your string handling functions should not operate in LATIN-1.
Enter ESOLua, modified interpreter. Despite the fact that strings in the ESO API are, for obvious reasons, UTF-8 encoded, string matching functions treat strings as LATIN-1 encoded. Therefore, string.find("\195\160", "%s") returns 2, matching the trailing byte of this two-byte character (in LATIN-1, 160 is a space character). This is BOLLOCKS.
And that's the problem. It's a space only for gmatch, which assumes the wrong encoding; for everyone else it's the second byte of "à".
I find the advice to convert the .lua file to ASCII confusing. Converting Lua source to ASCII means replacing all non-ASCII characters with decimal "\ddd" escapes (UTF-8-encoded, of course), which would be tedious and wouldn't solve the OP's issue: because "\195\160" == "à", the matching function will see the same bytes as before.
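A quick way to check how a given interpreter classifies these bytes is to ask string.find directly. This is only a sketch: `first_space_byte` is a hypothetical helper, and the result for "\195\160" depends on the C library and locale behind the interpreter's isspace, so no particular output is claimed for it here.

```lua
-- Hypothetical helper: index of the first byte that %s matches, or nil.
local function first_space_byte(s)
    -- string.find returns start and end indices; keep only the start.
    local pos = string.find(s, "%s")
    return pos
end

local a_grave = "\195\160"  -- UTF-8 bytes of "à"
local pos = first_space_byte(a_grave)
if pos then
    print(("byte %d (value %d) matched %%s"):format(pos, a_grave:byte(pos)))
else
    print("no byte of the sequence matched %s")
end
```

According to the post above, ESO's interpreter reports byte 2 here. A stock Lua build running in the default "C" locale would typically report no match, which fits the observation that the Lua website's demo does not show the problem.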
|||
02/14/19, 09:32 AM | #5 | |
I admit I may not have been completely correct about everything I wrote, but the point still stands that it is not a bug, but just wrong assumptions being made.
Since the pattern classes do not support Unicode, one would need to use the appropriate replacements in order to get the expected output:
Lua Code:
Last edited by Dolby : 02/14/19 at 05:37 PM. |
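One possible shape for such a replacement, offered as a sketch and not as the code from the post: spell out the ASCII whitespace bytes in a character set instead of relying on %s and %S. Sets with literal characters are compared byte by byte, so byte 160 is never treated as a separator. `split_ascii_whitespace` is a hypothetical name, and the choice to split only on ASCII whitespace is an assumption.

```lua
-- The six ASCII whitespace bytes: space, tab, newline, vertical tab,
-- form feed, carriage return.
local ASCII_SPACE = " \t\n\v\f\r"

-- Hypothetical helper: split text on runs of ASCII whitespace only.
local function split_ascii_whitespace(text)
    local words = {}
    -- "[^...]+" matches runs of bytes that are NOT in the explicit set,
    -- so no locale-dependent character class is involved.
    for word in text:gmatch("[^" .. ASCII_SPACE .. "]+") do
        words[#words + 1] = word
    end
    return words
end

local words = split_ascii_whitespace("voil\195\160 tout")
print(#words)  -- 2
```

The trade-off is that a real no-break space (U+00A0, bytes "\194\160") is no longer a separator either, which may or may not be what an addon wants.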
||
02/15/19, 07:34 AM | #6 | |||
|
|
|||
02/15/19, 10:39 AM | #7 |
You changed my mind. They only added UTF-8 support after Chip became our new overlord, so it's likely a remnant from before that time. Would certainly be nice if they could make it so everything works consistently.
|
|
02/18/19, 10:01 AM | #8 |
Tell me if this is correct: you are requesting that we replace Lua's string.find with our own UTF-8-compatible pattern matching?
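To illustrate what "UTF-8 compatible" could mean for a whitespace search, here is a rough sketch that classifies whole characters instead of single bytes. Everything in it is an assumption for illustration: `find_space_utf8`, `UTF8_CHAR`, and `SPACE_CHARS` are hypothetical names, the whitespace list is deliberately incomplete, the input is assumed to be well-formed UTF-8 without NUL bytes, and none of this is claimed to be how the game does or would implement it.

```lua
-- One UTF-8 character: a lead byte followed by any continuation bytes.
local UTF8_CHAR = "[\1-\127\194-\244][\128-\191]*"

-- Whitespace listed as whole UTF-8 characters (incomplete on purpose).
local SPACE_CHARS = {
    [" "] = true, ["\t"] = true, ["\n"] = true, ["\r"] = true,
    ["\194\160"] = true,      -- U+00A0 no-break space
    ["\227\128\128"] = true,  -- U+3000 ideographic space
}

-- Hypothetical helper: byte range of the first whitespace character,
-- or nil when there is none.
local function find_space_utf8(s)
    -- "()" captures the byte position where each character starts.
    for pos, ch in s:gmatch("()(" .. UTF8_CHAR .. ")") do
        if SPACE_CHARS[ch] then
            return pos, pos + #ch - 1
        end
    end
    return nil
end

print(find_space_utf8("\195\160"))     -- nil: "à" is not whitespace
print(find_space_utf8("a\194\160b"))   -- 2  3: a real no-break space
```

The point of the design is that the byte 160 only counts as whitespace when it is the second byte of U+00A0, not when it happens to be the trailing byte of another character such as "à".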
|
|
02/19/19, 03:13 PM | #9 | |
Code:
static int match_class (int c, int cl) {
  int res;
  switch (tolower(cl)) {
    case 'a' : res = isalpha(c); break;
    case 'c' : res = iscntrl(c); break;
    case 'd' : res = isdigit(c); break;
    case 'l' : res = islower(c); break;
    case 'p' : res = ispunct(c); break;
    case 's' : res = isspace(c); break;
    case 'u' : res = isupper(c); break;
    case 'w' : res = isalnum(c); break;
    case 'x' : res = isxdigit(c); break;
    case 'z' : res = (c == 0); break;
    default: return (cl == c);
  }
  return (islower(cl) ? res : !res);
}
||