[Toybox] The cut -C test is failing because bionic's wcwidth() doesn't match glibc or musl.

enh enh at google.com
Tue Oct 19 08:57:01 PDT 2021


are you sure? remember that if a test is currently checked in [and Android
uses the toy; my test runner reports but ignores failures for tests where
readlink says it's not actually toybox], that means the test passes on
Android.

there's even a CTS test:
```
TEST(wchar, wcwidth_non_spacing_and_enclosing_marks_and_format) {
  if (!have_dl()) return;

  EXPECT_EQ(0, wcwidth(0x0300)); // Combining grave.
  EXPECT_EQ(0, wcwidth(0x20dd)); // Combining enclosing circle.
  EXPECT_EQ(0, wcwidth(0x00ad)); // Soft hyphen (SHY).
  EXPECT_EQ(0, wcwidth(0x200b)); // Zero width space.
}
```

my guess is that you're using a statically linked binary? bionic doesn't
have a "static libdl", so when it tries to dlopen() icu4c to handle an i18n
question, that'll fail, and in most cases bionic will fall back to "what do
i know about ASCII?" and report failure for anything else. (that's what the
first line of the test is checking too: "if we're the static version of the
tests, skip this test because this isn't available".)

On Mon, Oct 18, 2021 at 11:39 PM Rob Landley <rob at landley.net> wrote:

> Bionic's wcwidth() returns -1 (error) for combining characters, where glibc
> and musl return 0 (does not increase the collective width of the displayed
> characters). This means crunch_str() can't measure the length of the
> output, so cut -C behaves like cut -c.
>
> I admit the man page is written a bit confusingly, but combining characters
> are technically printable, and therefore should have a length of at least
> zero.
>
> Rob
>
> P.S. This whole area is funky because the single dumbest thing about
> Unicode is that combining characters go _after_ the character they combine
> with, meaning you can never tell when you've finished parsing a character
> until you've gone PAST it and parsed a character that does NOT attach to
> this one. Plus whenever you get short input (typing, serial input, etc.)
> your terminal keeps rewriting the same character over and over every time
> it gets a new combining character that changes how the last character
> should render. If the combining characters came BEFORE the non-combining
> character, the non-zero-length character would flush all the pending
> combining characters and you'd draw the resulting glyph ONCE. But alas,
> Microsoft was on the Unicode committee.
>