[Toybox] Does anyone here understand how unicode combining characters work?

Rob Landley rob at landley.net
Wed Sep 26 11:48:03 PDT 2018


On 09/26/2018 10:28 AM, Rob Landley wrote:
> The crunch_str() logic is designed to escape nonprintable stuff and for watch.c
> I need to write something that measures output but lets utf8 combining stuff
> happen. (And measures tabs. And also parses at least the color change part of
> ansi escapes, but we'll burn that bridge when we come to it...)
> 
> Using hexdump and echo -e's hex escapes to try to print minimal bits of the
> combining character examples (which cut and paste appears to have horked
> somewhat, but you get the idea):
>
>   $ cat tests/files/utf8/test1.txt
>   l̴̗̞̠ȩ̸̩̥ṱ̴͍̻ ̴̲͜ͅt̷͇̗̮h̵̥͉̝e̴̡̺̼ ̸̤̜͜ŗ̴͓͉i̶͉͓͎t̷̞̝̻u̶̻̫̗a̴̺͎̯l̴͍͜ͅ ̵̩̲̱c̷̩̟̖o̴̠͍̻m̸͚̬̘ṃ̷̢͜e̵̗͎̫n̸̨̦̖c̷̰̩͎e̴̱̞̗
>   $ echo -e '\xcc\xb4\xcc\x97\xcc\xa0e'
>   e
>   $ echo -e 'l\xcc\xb4\xcc\x97\xcc\xa0e'
>   l̴̗̠e
>   $ echo -e '\xcc\xb4\xcc\x97\xcc\xa0ee'
>   ee
>   $ echo -e 'l\xcc\xb4\xcc\x97\xcc\xa0'
>   l̴̗̠
>   $ echo -e '\xcc\xb4\xcc\x97\xcc\xa0'
> 
> So there needs to be a character _before_ the combining characters for them to
> take effect, but they apply to the character _after_? Even when it's a newline?
> (Which still works as a newline, but leaves trailing weirdness?)

But if I have just enough characters to fill a line, the trailing weirdness does
_not_ wrap to the next line (it appears to get discarded), at least on my
80-column xfce Terminal:

echo -e 'xxxxxxxxxxxxxxxxxx0123456789091234567890123456789012345678901234567890123456789a\xcc\xb4\xcc\x97\xcc\xa0'
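
For reference, a minimal sketch of the kind of measuring described in the quoted
text: count terminal columns while letting combining marks through at zero
width, and advance tabs to the next stop. This is Python rather than toybox's C,
and it is not crunch_str(). The name display_columns() and the 8-column tab stop
are assumptions made for illustration, and ANSI escapes and control characters
are not handled at all.

```python
import unicodedata

def display_columns(text, tab_stop=8):
    """Estimate how many terminal columns text occupies (hypothetical helper)."""
    cols = 0
    for ch in text:
        if ch == "\t":
            # Assumed fixed tab stops: advance to the next multiple of tab_stop.
            cols += tab_stop - cols % tab_stop
        elif unicodedata.category(ch) in ("Mn", "Me"):
            # Nonspacing and enclosing marks attach to the previous character
            # and take no column of their own.
            continue
        elif unicodedata.east_asian_width(ch) in ("W", "F"):
            # Wide and fullwidth characters take two columns.
            cols += 2
        else:
            cols += 1
    return cols

if __name__ == "__main__":
    # 'l' plus the three combining marks from the examples above, then 'e'.
    print(display_columns("l\u0334\u0317\u0320e"))
```

By this count, a line of 80 ordinary characters followed by the three combining
marks still measures 80 columns. What a given terminal does with marks that
land at the right edge is a separate question this sketch does not answer.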

I should look up what these escape sequences _do_. Hmmm... I could slowly and
painfully do that by hand, but really I want a sort of Unicode version of
"hexdump -C" telling me what the codepoints are. (Ideally combined with a
variant of the "ascii" program to then tell me what each one does.) Somebody has
to have written this already, but I dunno what to Google for. Hmm...
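
A rough sketch of what such a "hexdump -C for codepoints" could look like, using
Python's standard unicodedata module. The function name dump_codepoints() and
the row layout are made up for illustration, and this is not an existing tool.
For each character it reports the byte offset, the UTF-8 bytes, the codepoint,
the canonical combining class (nonzero means a combining mark), and the Unicode
name.

```python
import unicodedata

def dump_codepoints(data):
    """Return (offset, hex bytes, codepoint, combining class, name) per character."""
    rows = []
    offset = 0
    # Strict decoding: invalid UTF-8 raises UnicodeDecodeError.
    for ch in data.decode("utf-8"):
        raw = ch.encode("utf-8")
        rows.append((
            offset,
            " ".join("%02x" % b for b in raw),
            "U+%04X" % ord(ch),
            unicodedata.combining(ch),
            # Control characters such as newline have no name, hence the default.
            unicodedata.name(ch, "<unnamed>"),
        ))
        offset += len(raw)
    return rows

if __name__ == "__main__":
    # The bytes from the echo -e examples above.
    for row in dump_codepoints(b"l\xcc\xb4\xcc\x97\xcc\xa0e"):
        print("%08x  %-12s %-8s ccc=%-3d %s" % row)
```

Run on the bytes from the examples, this shows \xcc\xb4, \xcc\x97 and \xcc\xa0
decoding to U+0334, U+0317 and U+0320, all of which have a nonzero combining
class.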

Hey Rich, I'm fiddling with Unicode and am lost/confused. Know any good tools for this?

Rob


