[Toybox] [PATCH] Fix wc -m on bionic.
enh
enh at google.com
Tue Aug 22 14:05:37 PDT 2017
happy with the replacement patch?
On Tue, Aug 8, 2017 at 3:21 PM, enh <enh at google.com> wrote:
> (sorry, didn't get time to look at this until now.)
>
> On Sun, Aug 6, 2017 at 6:38 PM, Rob Landley <rob at landley.net> wrote:
>> On 08/04/2017 07:54 PM, enh wrote:
>>> When mbrtowc returns -2, all n bytes have been processed. Bionic's
>>> interpretation of POSIX is that you must not re-supply those bytes
>>> on the next call, and should only supply the bytes needed to complete
>>> the character. If you re-supply the bytes on the next call, bionic
>>> considers that an illegal sequence and returns -1.
>>
>> I've had a headache for 3 days and it's really hard for me to make good
>> technical decisions right now, because I want to set fire to everything
>> for being pointlessly overcomplicated.
>>
>> What I really wanted was a function that didn't maintain magic internal
>> state between calls, but would just return if it couldn't do what I
>> asked and let me handle it myself or try again with more data. (When I'm
>> escaping invalid chars that's the semantics I need, and why have two?)
>>
>> I don't know how to make modern libc do the simple thing, it insists on
>> maintaining state between calls even when you pass in a NULL for the
>> state, because reasons. It sounds like if I want a simple converter, I
>> need to write my in lib.c, or maybe wrap this one with a flush after
>> every error. (Sending it a zero byte should flush? I think?)
>
> yeah, this is yet another ugly corner of POSIX. you can get what you
> want, though it's pretty unguessable. the function mbsinit is a
> _predicate_ to tell you whether the mbstate_t is in the initial state
> or not. you're supposed to memset or moral equivalent to reset to the
> initial state. afaik there's no way to reset the global state. i can't
> think i've ever seen one.
>
>> Meanwhile, the code you just sent me is not clearing the magic internal
>> state, and it's user-visible:
>>
>> $ echo -ne '\xf0\xbf\xbf\xbf' | wc -m
>> 1
>> $ echo -ne '\xf0\xbf' > one
>> $ echo -ne '\xbf\xbf' > two
>> $ wc -m one two
>> 0 one
>> 0 two
>> 0 total
>> $ ./toybox wc -m one two
>> 0 one
>> 1 two
>> 1 total
>
> the attached updated patch fixes that and adds the test.
>
>>> With these changes, the tests still pass on glibc and also pass on bionic.
>> I should test musl. After I go lie down again...
>>
>> Rob
>
>
>
> --
> Elliott Hughes - http://who/enh - http://jessies.org/~enh/
> Android native code/tools questions? Mail me/drop by/add me as a reviewer.
--
Elliott Hughes - http://who/enh - http://jessies.org/~enh/
Android native code/tools questions? Mail me/drop by/add me as a reviewer.
More information about the Toybox
mailing list