[Toybox] Does anyone here understand how unicode combining characters work?

enh enh at google.com
Thu Sep 27 13:34:46 PDT 2018


On Thu, Sep 27, 2018 at 7:10 AM Rob Landley <rob at landley.net> wrote:
>
> On 09/27/2018 08:53 AM, Rob Landley wrote:
> > The low-ascii stuff is not related to unicode, yes. But it got swept up in the
> > unicode changes and behavior changed when unicode support went in. And
> > unfortunately, terminal programs differ and the Linux ctrl-alt-f1 text mode
> > terminals differ from the xterms. (Haven't tried a frame buffer yet...)
> >
> > For example, when I do echo -e '\x02\x02\x03\x04x' on xfce xterm, I get 4 square
> > boxes with digits in (I.E. uni-codepoint has no glyph, doo dah, doo dah)
> > followed by x. But ctrl-alt-f1 text mode prints nothing and does not advance the
> > cursor either, I just get the x on the first column. (I even tried "export
> > TERM=linux" in both and it didn't change the behavior, that's orthogonal.)
> >
> > Hence filtering some of them out and not printing them if I dunno whether
> > they'll advance the cursor or not.
>
> P.S. I've got this commented-out note to self in my local tests/ls.test:
>
> echo -e "$(X=0;while [ $X -lt 255 ];do X=$(($X+1));[ $X -eq 47 ]&&
> continue;printf '\\x%02x' $X; done)"
>
> Which I think was meant to create a torture test for ls -b display mode? Ala
> touch "$(that)" in an empty directory and ls -b it.
>
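
as a sketch only (not what's in tests/ls.test), the same loop can be
wrapped in a function so the byte count can be checked before anything
gets passed to touch. torture_name is a name made up for this example,
and it uses octal escapes via %b on the assumption that POSIX printf
doesn't promise \x support.

```shell
# Emit every byte from 1 to 255 except 47 ('/'), which cannot appear in
# a file name. torture_name is a hypothetical helper, not part of toybox.
torture_name() {
  X=0
  while [ "$X" -lt 255 ]; do
    X=$((X+1))
    [ "$X" -eq 47 ] && continue
    # %b interprets \0ddd as an octal byte value.
    printf '%b' "\\0$(printf '%03o' "$X")"
  done
}

# 255 candidate bytes minus the skipped '/' should leave 254.
torture_name | wc -c
```

piping the output through od -c instead of wc -c shows exactly which
bytes were emitted, independent of how the terminal renders them.
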
> That says on this xterm, outputting ascii 0 doesn't display,

having written several terminal emulators (including the one i still
use every day), if you do show something for NUL you find that a
surprising number of C programs have an off-by-one that causes them to
accidentally output the NUL terminator too.
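
a rough way to check whether a given program leaks its terminator is to
count the NUL bytes in its output. this is only a sketch: count_nuls is
a made-up helper, and the two printf calls stand in for a buggy and a
correct program.

```shell
# Count NUL bytes on stdin: tr -cd deletes everything that is not NUL.
count_nuls() {
  tr -cd '\000' | wc -c
}

# Stand-in for a program that writes strlen(s)+1 bytes: one stray NUL.
printf 'hello\000' | count_nuls

# Stand-in for a correct program: no NUL bytes at all.
printf 'hello' | count_nuls
```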

> 1-4 are boxes, 5 is
> ignored, 6 is a box, 7-f aren't boxes but there are a couple of line breaks in
> there (\b, \t, \r, and \n live in that range, then 0x10 through 1f are boxes again).

http://spinroot.com/pico/pjw.html (search for "Plan 9").

> Meanwhile, in Linux text mode the first non-space character printed is ! and if
> I add an 'x' after the character printed each time it's:
>
> xxxxxxx x
> x
>  x
> x|xxxxxxxxxxxxxxx x!x[and so on]
>
> (Which is confused by \b and \r taking effect, but why is there a pipe after
> ascii 16???)
> > Going down ratholes most people never noticed the existence of, as usual.
>
> Continuing down said rathole...
>
> (I'm pretty sure "faking the linux VGA text mode behavior for low ascii
> characters" is as close to 'a standard' as we're likely to get here.)
>
> Rob


