[Toybox] Does anyone here understand how unicode combining characters work?

Rob Landley rob at landley.net
Wed Sep 26 13:42:25 PDT 2018


On 09/26/2018 03:00 PM, enh wrote:
> if anyone's interested, here's how bionic translates from the actual
> unicode properties to implement wcwidth:
> https://android.googlesource.com/platform/bionic/+/master/libc/bionic/wcwidth.cpp
> 
> (we do this in general so that we can outsource all the actual
> unicode data to icu4c, and thereby guarantee consistency for
> C/C++/Java regardless of which API is actually called.)

I think I've got the answer to my question now. What I needed to know was how
much I can print before the cursor winds up on the next line (and scrolls the
screen if it was at the bottom), and the answer is "print combining characters
_after_ the last character, but stop before the next wcwidth>0 character that
would overflow the line".
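
A minimal sketch of that rule in C, assuming the input is already decoded to
wchar_t and output starts at column 0. The names fit_columns() and
demo_width() are hypothetical, not toybox functions; demo_width() is a
stand-in for wcwidth() that only knows a couple of ranges, so the example
doesn't depend on the locale.

```c
#include <stddef.h>
#include <wchar.h>

// Hypothetical stand-in for wcwidth(): 0 for U+0300..U+036F (combining
// diacritics), 2 for U+4E00..U+9FFF (CJK ideographs), 1 for everything else.
int demo_width(wchar_t wc)
{
  if (wc >= 0x300 && wc <= 0x36f) return 0;
  if (wc >= 0x4e00 && wc <= 0x9fff) return 2;
  return 1;
}

// Return how many characters of s[0..len) can be printed, starting at
// column 0, without overflowing a line that is "columns" wide.
size_t fit_columns(const wchar_t *s, size_t len, int columns,
                   int (*width)(wchar_t))
{
  int used = 0;
  size_t i;

  for (i = 0; i < len; i++) {
    int w = width(s[i]);

    // Assumption: unprintable characters (negative width) take no columns;
    // a real caller would probably filter or escape them first.
    if (w < 0) w = 0;

    // Zero-width (combining) characters always fit after the last character.
    // Stop before the first width>0 character that would overflow.
    if (w > 0 && used + w > columns) break;
    used += w;
  }

  return i;
}
```

With a real locale set up, wcwidth() itself should be usable as the width
argument, since it has the same int (*)(wchar_t) signature.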

(This is the logic I've needed to work out for screen, less, and vi as well. At
least when they're not doing the force escapes thing.)

The ANSI escape parsing is still a todo item, but I note I wrote my own
ANSI-escape-parsing direct screen memory writer for DOS as one of my first C
programs back in 1990. :P

(And tabs. And the other low-ascii stuff that's also handled inconsistently and
which I might have watch and less and such filter out and just not print to the
tty. It'd be nice if TERM=linux specified consistent behavior here, but it's
determined by the terminal display program consuming the output...)

Thanks,

Rob


