[Toybox] The cut -C test is failing because bionic's wcwidth() doesn't match glibc or musl.

Rich Felker dalias at libc.org
Wed Oct 20 11:08:50 PDT 2021


On Wed, Oct 20, 2021 at 01:56:21AM -0500, Rob Landley wrote:
> Those tables are just under 2k each, and could SERIOUSLY be compressed. For
> example, the first 512 bytes of the first table are all "16" except for a sparse
> set of sequential values running from 18 to 59, so it could be initialized from
> a bitmask. Let's see, quick and dirty bash to generate said bitmask from the table:

If you know a compression scheme that admits accessing them
efficiently in-place, that would be great. Expanding them into memory
is a big net loss. You'd be trading shareable rodata for per-process
dirty memory plus decompression code (text) that's comparable in size
to, if not larger than, the uncompressed data.
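
For a concrete picture of what in-place access means here: a two-level table stores a small per-page index plus a pool of shared 256-bit bitmap blocks, and a lookup is two array reads and a shift straight out of read-only data, with nothing decompressed at startup. The sketch below is a hypothetical miniature covering only U+0000..U+03FF; the names and contents are made up for illustration and are not copied from musl, although musl's wcwidth() lookup has a broadly similar two-level shape.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical miniature: one bit per codepoint for U+0000..U+03FF,
 * set if the codepoint is a combining (nonspacing) mark. */

/* Pool of distinct 32-byte (256-bit) blocks; identical pages share one. */
static const uint8_t blocks[][32] = {
    /* block 0: no marks at all */
    { 0 },
    /* block 1: U+0300..U+036F marked (112 bits = 14 bytes of 0xff) */
    { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
      0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff },
};

/* Level 1: which block describes each 256-codepoint page. */
static const uint8_t page_index[4] = { 0, 0, 0, 1 };

/* Two reads and a shift, directly from const data; nothing is expanded
 * at runtime. Returns 1 if wc is marked, 0 otherwise. */
static int is_nonspacing(uint32_t wc)
{
    if (wc >= 4 * 256)
        return 0; /* outside the toy range */
    /* sanity check: the index must point at a real block */
    assert(page_index[wc >> 8] < sizeof blocks / sizeof blocks[0]);
    const uint8_t *blk = blocks[page_index[wc >> 8]];
    return (blk[(wc & 255) >> 3] >> (wc & 7)) & 1;
}
```

Identical pages (here pages 0 to 2) share one block, which is where the compression comes from; each additional distinct page costs 32 bytes, plus one index byte per page.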

ALWAYS prefer static const [] tables over runtime generation, except
possibly in a microcontroller context where the usual ROM vs RAM cost
analysis may not apply.
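
To make the trade-off concrete, here are two ways to get the same toy index table into memory, the second following the bitmask idea from the quoted message (a sparse set of entries taking sequential values from 18, everything else 16). The 32-entry table, the mask and the names are hypothetical. With typical toolchains the const array ends up in .rodata, shared read-only across processes, while the generated copy lives in a writable array that each process dirties when it fills it in.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Option A: static const table. Typically placed in .rodata, so its
 * pages are shared read-only between every process using the binary. */
static const uint8_t idx_const[32] = {
    16, 16, 18, 19, 16, 16, 16, 20, 16, 16, 16, 16, 21, 16, 16, 16,
    16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 22, 16, 16, 16, 16, 16,
};

/* Option B: runtime generation from a bitmask, per the quoted proposal.
 * Bit i set means "entry i takes the next sequential value, starting
 * at 18"; every other entry is 16. */
static const uint8_t idx_mask[4] = { 0x8c, 0x10, 0x00, 0x04 };
static uint8_t idx_runtime[32]; /* .bss: dirtied per process by init */

/* Must run before the first lookup; this code ships alongside the mask. */
static void init_idx(void)
{
    uint8_t next = 18;
    for (int i = 0; i < 32; i++)
        idx_runtime[i] = ((idx_mask[i >> 3] >> (i & 7)) & 1) ? next++ : 16;
}

/* Returns 1 if the generated table matches the const one. */
static int tables_match(void)
{
    init_idx();
    return memcmp(idx_runtime, idx_const, sizeof idx_const) == 0;
}
```

Both arrays end up byte-for-byte identical. The 4-byte mask is smaller than the 32-byte table, but option B also has to ship init_idx(), run it before the first lookup, and pay for idx_runtime in every process, which is the accounting described above.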

> Not sure if I _should_, but I _can_. (It was nice to leave this to libc. Then it
> wasn't my problem to update it every time Microsoft wrote another check to the
> unicode committee. Both glibc and musl can do this when statically linked. Sigh.)

I think it's better not to duplicate this information in more places,
where the copies can drift out of sync, if you don't have to. In theory glibc users
might even have locales where the ambiguous-width characters are
treated as wide -- I'm not sure if anyone does this anymore but it was
legacy CJK practice in some locales and honoring that is the polite
thing to do.
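
Leaving the widths to libc can be as small as the sketch below, assuming a POSIX system with the XSI wcwidth()/wcswidth() functions; the helper names are hypothetical. The result depends on the libc and the locale in effect, which is both what lets a CJK locale treat ambiguous-width characters as wide and why the numbers can differ between libcs, as with bionic in this thread.

```c
#define _XOPEN_SOURCE 700 /* wcwidth()/wcswidth() are XSI extensions */
#include <assert.h>
#include <locale.h>
#include <wchar.h>

/* Call once at startup so LC_CTYPE follows the user's environment
 * (LC_ALL / LC_CTYPE / LANG) instead of the default "C" locale.
 * If the requested locale is unavailable, the current one is kept. */
static void use_user_locale(void)
{
    setlocale(LC_CTYPE, "");
}

/* Column width of a wide string as the current locale sees it, or -1
 * if it contains a nonprintable character. Ambiguous-width characters
 * get whatever width the locale assigns, not a hardcoded value. */
static int display_width(const wchar_t *s)
{
    return wcswidth(s, wcslen(s));
}
```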

> P.S. Rich's other table has some 17s mixed in the 16s which... I think it moves
> in runs of 8? Very small bitmap if so? It would be so much easier to work out
> the alignment if he'd wordwrapped his tables to a consistent number of entries
> per line, but no. Eh, runs of 4, 54 bits total. Plus two isolated weirdos.) And

The tables are generated and the generator aims to format the output
such that diffs are small and readable when the output is checked in.
It wraps both at column overflow and at fixed power-of-two indices.
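
The two wrapping rules can be sketched as a small emitter. This is a hypothetical illustration of the idea, not code from musl-chartable-tools; the function names, the block size and the column limit are assumptions:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Append s to the NUL-terminated string in out (capacity cap),
 * truncating instead of overflowing. */
static void append(char *out, size_t cap, const char *s)
{
    size_t len = strlen(out);
    snprintf(out + len, cap - len, "%s", s);
}

/* Hypothetical table emitter. Entries are written as "N," and a line
 * break is inserted before entry i when either:
 *   - i is a multiple of `block` (a fixed power of two), or
 *   - the entry would push the line past max_cols characters.
 * The column count resets at every block boundary. Requires cap >= 1. */
static void emit_table(char *out, size_t cap, const unsigned char *v,
                       size_t n, size_t block, int max_cols)
{
    int col = 0;
    out[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        char item[8];
        int w = snprintf(item, sizeof item, "%u,", (unsigned)v[i]);
        if (i > 0 && (i % block == 0 || col + w > max_cols)) {
            append(out, cap, "\n");
            col = 0;
        }
        append(out, cap, item);
        col += w;
    }
}
```

Because the forced breaks at multiples of `block` do not depend on the widths of earlier entries, changing one value can only rewrap lines inside its own block, which keeps diffs of the checked-in output local.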

If you'd like to see it, the code (horribly ugly; this does not matter
because it's not something to be deployed anywhere) is at:
https://github.com/richfelker/musl-chartable-tools

> most of the nonzero values in the latter part are 255, so traverse a bitmap of
> THOSE and there's not much left to initialize afterwards. Yeah, a dozen lines
> per table of initialization is looking doable, within range of sticking it in
> portability.c...

I'm pretty sure you'll find this larger than the tables.

Rich


