[Toybox] utf8 display question.

Rob Landley rob at landley.net
Wed Oct 25 16:42:14 PDT 2017


I'm adding cut -C to do column-based selection. What should it do about
the middle of double width characters? Right now I'm having it round
down, so since Japanese text is double width in monospaced fonts:

$ cat tests/files/utf8/japan.txt && echo
私はガラスを食べられます。それは私を傷つけません。
$ ./cut -C 5-11 tests/files/utf8/japan.txt
ガラス

I.e. a start of 5 skips the first 2 characters (so output starts at
column 4 counting from zero, the next display point _below_ 5), and then
it continues until it has to stop before the ending column. (So 5-11 is
the same as 5-10, and 5-12 shows 4 characters because the 4th character
includes column 12.)
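
A minimal Python sketch of one reading of that rule, not the toybox code: a character is kept when its last display column falls inside the 1-based range. The names display_width and cut_columns are made up for illustration, and the width lookup leans on Python's unicodedata module as a rough stand-in for wcwidth().

```python
import unicodedata

def display_width(ch):
    """Approximate terminal width of one code point: 0, 1, or 2 columns."""
    if unicodedata.combining(ch):
        return 0  # combining marks take no column of their own
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2  # wide and fullwidth characters take two columns
    return 1

def cut_columns(text, start, end):
    """Keep characters whose last column lies in [start, end] (1-based).

    Assumed semantics: a character straddling the start column is kept,
    and one straddling the end column is dropped.
    """
    out = []
    col = 0  # number of columns consumed so far
    for ch in text:
        col += display_width(ch)
        if start <= col <= end:
            out.append(ch)
    return "".join(out)

line = "私はガラスを食べられます。それは私を傷つけません。"
print(cut_columns(line, 5, 11))  # ガラス
print(cut_columns(line, 5, 10))  # ガラス
print(cut_columns(line, 5, 12))  # ガラスを
```

Under this reading, 5-11 and 5-10 give the same three characters, and 5-12 picks up the fourth because its second half is column 12.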

This is consistent, but I'm not sure if it's right...? Should the first
one round up instead? (Since it's an exclusion range, should the start
fail forward and the end fail backwards?)
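
To make the two options concrete, here is a small Python sketch (hypothetical helper names, not the toybox implementation) with a flag that switches the start from rounding down to rounding up. The two only differ when the start column lands in the second half of a wide character, so the example uses a start of 6 rather than 5.

```python
import unicodedata

def width(ch):
    """Rough column width of one code point (stand-in for wcwidth())."""
    if unicodedata.combining(ch):
        return 0
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1

def cut_columns(text, start, end, start_rounds_up=False):
    """Select 1-based display columns start..end from text.

    The end always rounds down: a character must finish by column end.
    start_rounds_up=False keeps a character that straddles the start
    column; True drops it, so only fully contained characters survive.
    """
    out = []
    col = 0       # columns consumed so far
    keep = False  # decision for the most recent character with width
    for ch in text:
        w = width(ch)
        if w:
            first, last = col + 1, col + w
            col = last
            low = first if start_rounds_up else last
            keep = low >= start and last <= end
        # zero-width characters follow the character they attach to
        if keep:
            out.append(ch)
    return "".join(out)

line = "私はガラスを食べられます。それは私を傷つけません。"
print(cut_columns(line, 6, 11))                        # ガラス
print(cut_columns(line, 6, 11, start_rounds_up=True))  # ラス
```

With a start of 5 both variants agree, since column 5 is already the first column of a character.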

Dunno...

Rob


