[Toybox] [PATCH] Fix wc -m on bionic.

Rob Landley rob at landley.net
Wed Aug 23 00:20:25 PDT 2017


On 08/22/2017 04:05 PM, enh wrote:
> happy with the replacement patch?

Sorry, $DAYJOB is eating all my energy these days
(http://lists.j-core.org/pipermail/j-core/2017-August/000646.html), and
thunderbird loses open reply windows on a reboot (my netbook battery
will die _while_plugged_in_ if you leave a USB thing drawing power from
a suspended system; it stops charging once the battery fills up and then
the USB drains it again, then it recharges the battery once it's fully
"off")...

Where was I on this one...

> On Tue, Aug 8, 2017 at 3:21 PM, enh <enh at google.com> wrote:
>>> I don't know how to make modern libc do the simple thing, it insists on
>>> maintaining state between calls even when you pass in a NULL for the
>>> state, because reasons. It sounds like if I want a simple converter, I
>>> need to write my in lib.c, or maybe wrap this one with a flush after
>>> every error. (Sending it a zero byte should flush? I think?)
>>
>> yeah, this is yet another ugly corner of POSIX. you can get what you
>> want, though it's pretty unguessable. the function mbsinit is a
>> _predicate_ to tell you whether the mbstate_t is in the initial state
>> or not. you're supposed to memset or moral equivalent to reset to the
>> initial state. afaik there's no way to reset the global state. i can't
>> think i've ever seen one.
Before you sent me the new patch I wrote my own "grab the next utf8 char
without a state structure or caring about environment variables"
function (it wasn't hard and it's something I've been thinking about to
try to speed up "top" anyway; top level can be an inline function with
the "if (*s<128) {*len = 1; return *s}" fast path and then it can call
the real function for the rest), but I'm really uncomfortable replacing
a libc function like this so I wanted to compare mine with musl's
implementation _and_ do utf8 torture tests for all the weird corner
cases, so I needed to do a toys/example/test_utf8.c and appropriate
tests/test_utf8.test, and then you sent an updated patch and I went
"must compare/collate" but didn't get back to it...

>>> Meanwhile, the code you just sent me is not clearing the magic internal
>>> state, and it's user-visible:
>>>
>>>   $ echo -ne '\xf0\xbf\xbf\xbf' | wc -m
>>>   1
>>>   $ echo -ne '\xf0\xbf' > one
>>>   $ echo -ne '\xbf\xbf' > two
>>>   $ wc -m one two
>>>   0 one
>>>   0 two
>>>   0 total
>>>   $ ./toybox wc -m one two
>>>   0 one
>>>   1 two
>>>   1 total
>>
>> the attached updated patch fixes that and adds the test.

On the bright side, I'm mostly caught up on $DAYJOB stuff from when I
was out sick for a few days. :)

Not remotely caught up with toybox yet, and the past few times I poked
at it was trying to get the next round of lsof.c cleanup ready (which is
relaxing because nobody's waiting for it)...

Gimme one more day to try to finish my utf8 replacement code, and if I
haven't I'll merge your patch as-is. Sorry for the delay,

Rob



More information about the Toybox mailing list