[Toybox] [PATCH] Fix wc -m on bionic.

enh enh at google.com
Tue Aug 8 15:21:29 PDT 2017


(sorry, didn't get time to look at this until now.)

On Sun, Aug 6, 2017 at 6:38 PM, Rob Landley <rob at landley.net> wrote:
> On 08/04/2017 07:54 PM, enh wrote:
>> When mbrtowc returns -2, all n bytes have been processed. Bionic's
>> interpretation of POSIX is that you must not re-supply those bytes
>> on the next call, and should only supply the bytes needed to complete
>> the character. If you re-supply the bytes on the next call, bionic
>> considers that an illegal sequence and returns -1.
>
> I've had a headache for 3 days and it's really hard for me to make good
> technical decisions right now, because I want to set fire to everything
> for being pointlessly overcomplicated.
>
> What I really wanted was a function that didn't maintain magic internal
> state between calls, but would just return if it couldn't do what I
> asked and let me handle it myself or try again with more data. (When I'm
> escaping invalid chars that's the semantics I need, and why have two?)
>
> I don't know how to make modern libc do the simple thing, it insists on
> maintaining state between calls even when you pass in a NULL for the
> state, because reasons. It sounds like if I want a simple converter, I
> need to write my in lib.c, or maybe wrap this one with a flush after
> every error. (Sending it a zero byte should flush? I think?)

yeah, this is yet another ugly corner of POSIX. you can get what you
want, though it's pretty unguessable. the function mbsinit is a
_predicate_ to tell you whether the mbstate_t is in the initial state
or not. you're supposed to memset or moral equivalent to reset to the
initial state. afaik there's no way to reset the global state. i can't
think i've ever seen one.

> Meanwhile, the code you just sent me is not clearing the magic internal
> state, and it's user-visible:
>
>   $ echo -ne '\xf0\xbf\xbf\xbf' | wc -m
>   1
>   $ echo -ne '\xf0\xbf' > one
>   $ echo -ne '\xbf\xbf' > two
>   $ wc -m one two
>   0 one
>   0 two
>   0 total
>   $ ./toybox wc -m one two
>   0 one
>   1 two
>   1 total

the attached updated patch fixes that and adds the test.

>> With these changes, the tests still pass on glibc and also pass on bionic.
> I should test musl. After I go lie down again...
>
> Rob



-- 
Elliott Hughes - http://who/enh - http://jessies.org/~enh/
Android native code/tools questions? Mail me/drop by/add me as a reviewer.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-Fix-wc-m-on-bionic.patch
Type: text/x-patch
Size: 4205 bytes
Desc: not available
URL: <http://lists.landley.net/pipermail/toybox-landley.net/attachments/20170808/adffa43b/attachment-0003.bin>


More information about the Toybox mailing list