[Toybox] [PATCH] Fix wc -m on bionic.

Rob Landley rob at landley.net
Sun Aug 6 18:38:50 PDT 2017


On 08/04/2017 07:54 PM, enh wrote:
> When mbrtowc returns -2, all n bytes have been processed. Bionic's
> interpretation of POSIX is that you must not re-supply those bytes
> on the next call, and should only supply the bytes needed to complete
> the character. If you re-supply the bytes on the next call, bionic
> considers that an illegal sequence and returns -1.
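
A minimal sketch of the calling pattern described above, for a character that arrives split across two buffers. The helper names (use_utf8_locale, decode_split) and the locale names tried are assumptions for illustration, not part of the patch under discussion.

```c
#include <assert.h>
#include <locale.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: try a couple of common UTF-8 locale names.
 * Returns nonzero if one of them could be selected. */
int use_utf8_locale(void)
{
  return setlocale(LC_ALL, "C.UTF-8") || setlocale(LC_ALL, "en_US.UTF-8");
}

/* Hypothetical helper: decode one character whose bytes are split between
 * head and tail. Assumes a UTF-8 locale is already active. Returns the
 * result of the last mbrtowc() call that was made. */
size_t decode_split(wchar_t *wc, const char *head, size_t head_len,
                    const char *tail, size_t tail_len)
{
  mbstate_t st;
  size_t rc;

  assert(head && tail);
  memset(&st, 0, sizeof(st));       /* start in the initial conversion state */

  rc = mbrtowc(wc, head, head_len, &st);
  if (rc != (size_t)-2) return rc;  /* complete character or error already */

  /* -2 means all head_len bytes were absorbed into st, so supply only the
   * remaining bytes rather than the whole sequence again. */
  return mbrtowc(wc, tail, tail_len, &st);
}
```

The partial character lives in the explicit mbstate_t here, so the second call passes only the continuation bytes.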

I've had a headache for 3 days and it's really hard for me to make good
technical decisions right now, because I want to set fire to everything
for being pointlessly overcomplicated.

What I really wanted was a function that didn't maintain magic internal
state between calls, but would just return if it couldn't do what I
asked and let me handle it myself or try again with more data. (When I'm
escaping invalid chars that's the semantics I need, and why have two?)

I don't know how to make modern libc do the simple thing, it insists on
maintaining state between calls even when you pass in a NULL for the
state, because reasons. It sounds like if I want a simple converter, I
need to write my own in lib.c, or maybe wrap this one with a flush after
every error. (Sending it a zero byte should flush? I think?)
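
A rough sketch of what such a stateless converter could look like if written by hand. The name utf8_decode and its return convention are hypothetical (loosely mirroring the -1/-2 results of mbrtowc), not an existing toybox or libc function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stateless decoder. Decodes one UTF-8 sequence from the first
 * len bytes of s. Returns the number of bytes consumed (1-4) and stores the
 * code point in *out, or -1 for an invalid sequence, or -2 if the input ends
 * before the sequence is complete (range checks only run once all bytes are
 * present). Nothing is remembered between calls. */
int utf8_decode(unsigned *out, const char *s, size_t len)
{
  const unsigned char *p = (const unsigned char *)s;
  static const unsigned min[] = {0, 0, 0x80, 0x800, 0x10000};
  unsigned c, cp;
  int need, i;

  assert(out && s);
  if (!len) return -2;
  c = p[0];
  if (c < 0x80) { *out = c; return 1; }
  if (c >= 0xc2 && c <= 0xdf) { need = 2; cp = c & 0x1f; }
  else if ((c & 0xf0) == 0xe0) { need = 3; cp = c & 0x0f; }
  else if (c >= 0xf0 && c <= 0xf4) { need = 4; cp = c & 0x07; }
  else return -1;   /* stray continuation byte or invalid lead byte */

  for (i = 1; i < need; i++) {
    if ((size_t)i >= len) return -2;        /* ran out of input mid-character */
    if ((p[i] & 0xc0) != 0x80) return -1;   /* not a continuation byte */
    cp = (cp << 6) | (p[i] & 0x3f);
  }
  /* Reject overlong forms, UTF-16 surrogates, and values past U+10FFFF. */
  if (cp < min[need] || (cp >= 0xd800 && cp <= 0xdfff) || cp > 0x10ffff)
    return -1;
  *out = cp;
  return need;
}
```

On -2 the caller keeps its bytes and calls again with more data. On -1 it can escape or skip one byte and retry, with no flush step.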

Meanwhile, the code you just sent me is not clearing the magic internal
state, and it's user-visible:

  $ echo -ne '\xf0\xbf\xbf\xbf' | wc -m
  1
  $ echo -ne '\xf0\xbf' > one
  $ echo -ne '\xbf\xbf' > two
  $ wc -m one two
  0 one
  0 two
  0 total
  $ ./toybox wc -m one two
  0 one
  1 two
  1 total
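
One way to keep a partial character from leaking between files, sketched under assumptions: count_chars is a hypothetical helper (not the actual wc code), a UTF-8 locale is assumed to be active, and skipping invalid bytes without counting them is a choice made to match the reference output above.

```c
#include <assert.h>
#include <locale.h>   /* setlocale(), to be called by the user of count_chars() */
#include <string.h>
#include <wchar.h>

/* Hypothetical per-file counter. Each call starts from a zeroed mbstate_t,
 * so a truncated character at the end of one file cannot be completed by
 * the first bytes of the next. */
size_t count_chars(const char *buf, size_t len)
{
  mbstate_t st;
  size_t count = 0, pos = 0, rc;

  assert(buf || !len);
  memset(&st, 0, sizeof(st));        /* fresh conversion state per file */

  while (pos < len) {
    rc = mbrtowc(NULL, buf + pos, len - pos, &st);
    if (rc == (size_t)-2) break;     /* truncated character at end of input */
    if (rc == (size_t)-1) {          /* invalid byte: reset and resync */
      memset(&st, 0, sizeof(st));
      pos++;
      continue;
    }
    pos += rc ? rc : 1;              /* rc == 0 means an embedded NUL byte */
    count++;
  }
  return count;
}
```

Called once per file, this gives a truncated lead sequence and a run of stray continuation bytes a count of zero each.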

> With these changes, the tests still pass on glibc and also pass on bionic.

I should test musl. After I go lie down again...

Rob


