[Toybox] vi 'b' command broken

Rob Landley rob at landley.net
Wed Nov 22 11:09:40 PST 2023


Wow, how long has THIS one been buried behind other windows? (Trying to finally
reboot my laptop so I can upgrade stuff...)

On 10/11/23 11:13, enh wrote:
> On Wed, Oct 11, 2023 at 3:22 AM Rob Landley <rob at landley.net> wrote:
>>
>> On 10/6/23 05:05, Rob Landley wrote:
>> > Apparently the widest unicode characters are:
>> >
>> > 1. ﷽
>> >
>> > 2. 𒐫
>> >
>> > 3. 𒈙
>> >
>> > 4. ⸻
>> >
>> > 5. ꧅
>> >
>> > The first 4 of which xfce's terminal does NOT like. And thunderbird fits the
>> > first one in 3 columns while vim's giving it... 9 I think.
>>
>> And trying to add a file with those to the test suite, neither glibc nor musl is
>> returning wcwidth() for them (it's all 1). And washing the attempt through
>> ltrace it looks like their unicode code points aren't defined in
>> https://www.w3.org/TR/xml-entity-names/1D7.html and friends.
>>
>> Which is odd because the web browser and terminal and so on render them
>> properly. But if neither glibc nor musl can handle them, I can't add "fold"
>> tests for them, can I? (Haven't tried bionic, but possibly this is what Elliott
>> meant when he said he used a bigger gui library for this sort of thing...)
> 
> yeah, bionic _does_ implement wcwidth() but admits that it's fairly
> bogus.

I miss java 1.1's fontmetrics with the awt and lightweight canvas where we just
wrote our own widget set and it worked. It was the first graphical toolkit I'd
actually been _comfortable_ with since logo. (And I say that having learned
IBM's System Object Model in order to maintain a project implemented as a subclass
of the OS/2 Workplace Shell's "folder" class.)

Pity they added swing (hell no), and then Sun screwed over blackdown so hard I
fled screaming from the entire language...

> if you really care, not even icu4c (my usual answer to such
> questions, and something bionic regularly forwards such questions to),
> you want to talk to something like
> https://en.wikipedia.org/wiki/HarfBuzz instead --- this shit gets
> weird, fast.

Yes, but that's not really the question I'm asking. How often do new unicode
tables come out and do they ever really make big changes? There are only 1.1
million possible values, this is not a big table of numbers in a modern
computing context, and there presumably ARE answers?

[scribble scribble scribble...]

The attached fontmetrics.c prints each character and asks the terminal (in this
case xfce's) how many columns the cursor moved, using the query cursor position
ANSI escape sequence. You run it ala "./fontmetrics > out.txt" and then leave
that terminal alone for a while. (Alas with "| tee out.txt" instead of a
redirect the conflicting writes to stdin and stdout occasionally glitch
slightly.) The results are 0 columns, 1 column, 2 columns, and everything else.
It's still running (slow) but so far I've got:
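
For illustration, a minimal sketch of that kind of measurement. This is not the
attached fontmetrics.c (which isn't reproduced here); the helper names
set_raw(), parse_cpr(), query_cursor() and measure_columns() are made up for
this example. It assumes the terminal answers the standard "ESC [ 6 n" cursor
position query with "ESC [ row ; col R", and that the fd is a tty.

```c
#include <stdio.h>
#include <termios.h>
#include <unistd.h>

// Put the tty on fd into a raw-ish mode (no line buffering, no echo) so the
// terminal's reply can be read without waiting for a newline. The old
// settings are stored in *saved so the caller can restore them afterwards.
int set_raw(int fd, struct termios *saved)
{
  struct termios raw;

  if (tcgetattr(fd, saved)) return 0;
  raw = *saved;
  raw.c_lflag &= ~(ICANON|ECHO);
  raw.c_cc[VMIN] = 1;
  raw.c_cc[VTIME] = 0;

  return !tcsetattr(fd, TCSANOW, &raw);
}

// Parse a cursor position report of the form ESC [ row ; col R.
// Returns 1 and fills in *row and *col on success, 0 otherwise.
int parse_cpr(const char *buf, int *row, int *col)
{
  char end = 0;

  return sscanf(buf, "\033[%d;%d%c", row, col, &end) == 3 && end == 'R';
}

// Send the cursor position query to the terminal on fd and read the reply
// one byte at a time until the terminating 'R'.
int query_cursor(int fd, int *row, int *col)
{
  char buf[32];
  int len = 0;

  if (write(fd, "\033[6n", 4) != 4) return 0;
  while (len < (int)sizeof(buf)-1) {
    if (read(fd, buf+len, 1) != 1) return 0;
    if (buf[len++] == 'R') break;
  }
  buf[len] = 0;

  return parse_cpr(buf, row, col);
}

// Return how many columns the cursor moved after writing len bytes of UTF-8
// to the terminal on fd, or -1 on error. Returns to column 1 first so the
// character is not measured across a line wrap.
int measure_columns(int fd, const char *utf8, int len)
{
  int row, before, after;

  if (write(fd, "\r", 1) != 1 || !query_cursor(fd, &row, &before)) return -1;
  if (write(fd, utf8, len) != len || !query_cursor(fd, &row, &after)) return -1;

  return after-before;
}
```

A caller would open /dev/tty read/write, call set_raw() on it, loop over code
points calling measure_columns() with each one's UTF-8 encoding, then restore
the saved termios with tcsetattr(). The answers are only as good as the
terminal giving them.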

$ for i in 0 1 2 '-v =[012]'; do grep $i'$' out.txt | wc -l; done
1160
10734
15891
2

And those two weirdos are:

$ grep -v '=[012]' out.txt
9=8
89=8

And two of those "else" are tab (which is weird) and enter (which I think
confused it, partly because it was in raw mode so it could read the returned
sequences without waiting for a newline).

I'm not quite sure what's up with 0x89, but:

$ toybox unicode 0x88
U+0088 :  : 0xc2 0x88
$ toybox unicode 0x89
U+0089 : 	 : 0xc2 0x89

I mean yeah, I'm seeing it. (High tab?) Haven't poked much yet.

Anyway, why is this NOT a couple bitmaps for 0 and 1 and an if/else staircase
for oddballs, else size 2? I'm aware the xfce terminal isn't exactly canonical,
and maybe it's printing something when it shouldn't, but this is the question
I'm trying to ask with wcwidth(). When I print this, how many columns does that
consume on the terminal? It's giving a width to these characters.
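
A rough sketch of the "couple bitmaps plus a staircase" shape, under stated
assumptions: the names width0, width1, set_width_range() and table_width() are
hypothetical, the bitmaps start out empty and would have to be filled from real
measurements or Unicode data, and the two oddball entries are just the 9=8 and
89=8 results from the xfce run above, not authoritative widths. One bit per
code point over the whole 0x110000 range is 139,264 bytes per bitmap.

```c
#include <stdint.h>

#define MAX_CP 0x110000

// One bit per code point: set when that code point has width 0 (or 1).
uint8_t width0[MAX_CP/8], width1[MAX_CP/8];

static int get_bit(const uint8_t *map, uint32_t cp)
{
  return (map[cp>>3]>>(cp&7))&1;
}

// Mark every code point in first..last (inclusive) in the given bitmap.
void set_width_range(uint8_t *map, uint32_t first, uint32_t last)
{
  uint32_t cp;

  for (cp = first; cp <= last && cp < MAX_CP; cp++) map[cp>>3] |= 1<<(cp&7);
}

// Look up a column width: bitmaps for 0 and 1, an if/else staircase for the
// oddballs, and everything else is assumed to be 2. Returns -1 if cp is not
// a valid code point.
int table_width(uint32_t cp)
{
  if (cp >= MAX_CP) return -1;
  if (get_bit(width0, cp)) return 0;
  if (get_bit(width1, cp)) return 1;

  // Oddballs: placeholder values taken from the terminal measurement above.
  if (cp == 0x09) return 8;
  else if (cp == 0x89) return 8;

  return 2;
}
```

For example, after set_width_range(width1, 0x20, 0x7e), table_width('A') gives
1, and anything in neither bitmap and not in the staircase falls through to 2.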

> (bionic's wcwidth() just passes on the Unicode
> EastAsianWidth property, which isn't _useless_ but it's way too
> simplistic a model to handle stuff like this.)

There are currently 149,813 unicode characters and the largest possible width is
what, 7? So 3 bits each, 56k for a naive implementation.
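
The arithmetic: 149,813 entries at 3 bits each is 449,439 bits, which rounds up
to 56,180 bytes. A minimal sketch of such a packed table follows, with
hypothetical helper names (packed3_bytes, packed3_set, packed3_get). It assumes
entries are indexed densely, 0 to count-1, so something else would still have
to map a code point to its index. Indexing by raw code point over all 0x110000
values would instead take 417,792 bytes.

```c
#include <stddef.h>
#include <stdint.h>

// Bytes needed to hold count entries of 3 bits each, rounded up.
size_t packed3_bytes(size_t count)
{
  return (count*3+7)/8;
}

// Store a width (0-7) as entry idx. Entries are packed low bit first, so an
// entry starting at bit 6 or 7 of a byte spills into the next byte.
void packed3_set(uint8_t *tab, size_t idx, unsigned width)
{
  size_t bit = idx*3, byte = bit>>3;
  unsigned shift = bit&7, mask = 7u<<shift, val = (width&7)<<shift;

  tab[byte] = (tab[byte]&~mask)|val;
  if (shift > 5) tab[byte+1] = (tab[byte+1]&~(mask>>8))|(val>>8);
}

// Fetch the width stored as entry idx.
unsigned packed3_get(const uint8_t *tab, size_t idx)
{
  size_t bit = idx*3, byte = bit>>3;
  unsigned shift = bit&7, val = tab[byte]>>shift;

  if (shift > 5) val |= (unsigned)tab[byte+1]<<(8-shift);

  return val&7;
}
```

The table itself would be a zeroed buffer of packed3_bytes(count) bytes.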

The thing that confuses me is this seems like it would HAVE an objective answer...

Rob
-------------- next part --------------
A non-text attachment was scrubbed...
Name: fontmetrics.c
Type: text/x-csrc
Size: 1034 bytes
Desc: not available
URL: <http://lists.landley.net/pipermail/toybox-landley.net/attachments/20231122/d90aae8d/attachment.c>

