[Toybox] [PATCH] Fix wc -m on bionic.
Rob Landley
rob at landley.net
Sun Aug 6 18:38:50 PDT 2017
On 08/04/2017 07:54 PM, enh wrote:
> When mbrtowc returns -2, all n bytes have been processed. Bionic's
> interpretation of POSIX is that you must not re-supply those bytes
> on the next call, and should only supply the bytes needed to complete
> the character. If you re-supply the bytes on the next call, bionic
> considers that an illegal sequence and returns -1.
I've had a headache for 3 days and it's really hard for me to make good
technical decisions right now, because I want to set fire to everything
for being pointlessly overcomplicated.
What I really wanted was a function that didn't maintain magic internal
state between calls, but would just return if it couldn't do what I
asked and let me handle it myself or try again with more data. (When I'm
escaping invalid chars that's the semantics I need, and why have two?)
I don't know how to make modern libc do the simple thing. It insists on
maintaining state between calls even when you pass in a NULL for the
state, because reasons. It sounds like if I want a simple converter, I
need to write my own in lib.c, or maybe wrap this one with a flush after
every error. (Sending it a zero byte should flush? I think?)
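As a rough illustration of the "write my own" option, here is a minimal sketch of a converter that keeps no state between calls. It assumes the input is UTF-8 regardless of locale, and the name utf8_decode and its return convention are hypothetical, not toybox's actual code. When the input runs out mid-character it returns -2 and remembers nothing, so the caller re-supplies the same bytes along with more data.

```c
#include <stddef.h>

// Hypothetical stateless UTF-8 decoder (illustration only, not toybox code).
// Returns the number of bytes consumed (1-4) and stores the code point in *wc,
// -1 if the bytes at s are not a valid sequence,
// -2 if the input ran out before the sequence was complete.
// Nothing is remembered between calls.
int utf8_decode(unsigned *wc, const char *s, size_t len)
{
  const unsigned char *u = (const unsigned char *)s;
  // Smallest code point allowed for each count of continuation bytes,
  // used to reject overlong encodings.
  static const unsigned lowest[] = {0, 0x80, 0x800, 0x10000};
  unsigned c, result;
  int i, extra;

  if (!len) return -2;
  c = u[0];
  if (c < 0x80) {
    *wc = c;
    return 1;
  }
  if (c >= 0xc2 && c < 0xe0) { extra = 1; result = c & 0x1f; }
  else if ((c & 0xf0) == 0xe0) { extra = 2; result = c & 0x0f; }
  else if (c >= 0xf0 && c < 0xf5) { extra = 3; result = c & 0x07; }
  else return -1; // stray continuation byte, 0xc0/0xc1, or lead byte > 0xf4

  for (i = 1; i <= extra; i++) {
    if ((size_t)i >= len) return -2;       // need more data, no state kept
    if ((u[i] & 0xc0) != 0x80) return -1;  // not a continuation byte
    result = (result << 6) | (u[i] & 0x3f);
  }

  // Reject overlong forms, UTF-16 surrogates, and values past U+10FFFF.
  if (result < lowest[extra] || result > 0x10ffff
      || (result >= 0xd800 && result < 0xe000)) return -1;

  *wc = result;
  return extra + 1;
}
```

With this convention the four bytes f0 bf bf bf decode in one call, f0 bf alone yields -2, and bf bf alone yields -1 on the first byte, so a caller escaping invalid bytes can handle each case itself.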
Meanwhile, the code you just sent me is not clearing the magic internal
state, and it's user-visible:
$ echo -ne '\xf0\xbf\xbf\xbf' | wc -m
1
$ echo -ne '\xf0\xbf' > one
$ echo -ne '\xbf\xbf' > two
$ wc -m one two
0 one
0 two
0 total
$ ./toybox wc -m one two
0 one
1 two
1 total
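One way to keep a partial sequence at the end of one file from combining with the start of the next is to give each file its own conversion state. The sketch below is an assumption about how that could look, not the patch under discussion: count_chars is a hypothetical helper, it takes the whole file as one buffer, and it relies on the caller having selected a UTF-8 locale with setlocale().

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

// Hypothetical helper (illustration only): count the complete multibyte
// characters in one buffer, using a conversion state private to this call
// so nothing carries over to the next file.
size_t count_chars(const char *buf, size_t len)
{
  mbstate_t st;
  size_t count = 0, pos = 0;

  memset(&st, 0, sizeof(st)); // start from the initial conversion state

  while (pos < len) {
    wchar_t wc;
    size_t n = mbrtowc(&wc, buf + pos, len - pos, &st);

    if (n == (size_t)-2) break; // truncated at end of buffer: all remaining
                                // bytes were consumed, nothing to count
    if (n == (size_t)-1) {
      // Invalid sequence: the state is now unspecified, so reset it
      // and skip one byte without counting it.
      memset(&st, 0, sizeof(st));
      pos++;
      continue;
    }
    if (!n) n = 1;              // an embedded NUL still occupies one byte
    count++;
    pos += n;
  }
  return count;
}
```

Called once per file, this should report 0 for both "one" and "two" from the transcript above in a UTF-8 locale, because the pending f0 bf from the first file is discarded when the function returns.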
> With these changes, the tests still pass on glibc and also pass on bionic.
I should test musl. After I go lie down again...
Rob