[Toybox] More expand cleanups
Felix Janda
felix.janda at posteo.de
Fri Nov 30 10:56:06 PST 2012
On 11/30/12 at 02:51am, Rob Landley wrote:
> I updated wc to theoretically deal with buffer wraps better. In reality
> I haven't got UTF8 test data to run through this, and should probably
> find some at some point.
Didn't the original version already deal well with it? (That's what the
strange tests are about.) Notice that the C library keeps an internal
conversion state (used when the last argument is NULL) which is updated
each time mbrtowc is called. If there's only part of a character at the
end of the buffer, mbrtowc will return (size_t)-2 but remember the part.
To continue you just have to read a new toybuf and point mbrtowc to the
_new_ data. (The "r" in mbrtowc stands for "restartable".)
Do the tests still work?
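
To illustrate the restart behavior, here is a minimal sketch (not the
actual wc code). The name count_chars is made up for this example, and it
uses an explicit mbstate_t instead of the library's internal state so the
carry-over is visible. Skipping an invalid byte without counting it is an
assumption of this sketch, not something wc is known to do.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>
#include <locale.h>
#include <assert.h>

/* Hypothetical helper: count the complete characters in buf[0..len).
 * A character cut off at the end of buf is remembered in *st and is
 * completed (and counted) by the next call on the following buffer. */
size_t count_chars(mbstate_t *st, const char *buf, size_t len)
{
    size_t count = 0, pos = 0;

    while (pos < len) {
        wchar_t wc;
        size_t n = mbrtowc(&wc, buf + pos, len - pos, st);

        if (n == (size_t)-2)
            break;                      /* partial character: bytes are saved in *st */
        if (n == (size_t)-1) {          /* invalid sequence (assumption: skip one byte) */
            memset(st, 0, sizeof *st);  /* state is undefined after an error, so reset it */
            pos++;
            continue;
        }
        if (n == 0)
            n = 1;                      /* a decoded L'\0' still occupies one byte */
        pos += n;
        count++;
    }
    return count;
}
```

The caller zeroes an mbstate_t once, calls setlocale() with a UTF-8
locale, and then passes each freshly read buffer in turn. No bytes need
to be copied from the old buffer to the new one.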
> I redid the actual expand function to be simpler: read data into toybuf
> and then write it to stdout using either fputc(char, stdout) or
> xprintf("%*c", len, ' ') depending on whether it's a tab or something
> else. It checks for tab (trigger the space behavior) and newline (reset
> counters).
>
> What it does _not_ currently do is track "spaces advanced" separately
> from "bytes advanced", that needs the utf8 stuff to grab groups of
> bytes that represent a single character, and to make _that_ work I need
> to copy the logic I just added to wc, which means maybe I should
> genericize it into lib/lib.c somehow? Needs more thought.
>
> This also assumes that all characters are the same width, which is
> probably wrong and I need help with if so. (I dunno how to do
> fontmetrics here?)
I think that this depends on the terminal emulator. Look for example at the
"-cjk_width" option of xterm.
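
As a rough sketch of tracking columns separately from bytes, one could
ask the C library for its idea of the width with wcwidth(). This is only
an approximation, since the terminal may disagree with the library. The
function name advance_column and its parameters are invented for this
example, and tabstop is assumed to be greater than zero.

```c
#define _XOPEN_SOURCE 700   /* for wcwidth() */
#include <stddef.h>
#include <string.h>
#include <wchar.h>
#include <locale.h>
#include <assert.h>

/* Hypothetical helper: return the display column reached after
 * buf[0..len), starting at column col, with tab stops every
 * tabstop columns (tabstop must be > 0). */
size_t advance_column(const char *buf, size_t len, size_t col, size_t tabstop)
{
    mbstate_t st;
    size_t pos = 0;

    memset(&st, 0, sizeof st);
    while (pos < len) {
        wchar_t wc;
        size_t n = mbrtowc(&wc, buf + pos, len - pos, &st);
        int w;

        if (n == (size_t)-2)
            break;                          /* truncated character at the end */
        if (n == (size_t)-1) {              /* invalid byte (assumption: one column) */
            memset(&st, 0, sizeof st);
            pos++;
            col++;
            continue;
        }
        if (n == 0)
            n = 1;                          /* L'\0' occupies one byte */
        pos += n;                           /* bytes advanced */

        if (wc == L'\t')
            col += tabstop - col % tabstop; /* jump to the next tab stop */
        else if (wc == L'\n')
            col = 0;                        /* newline resets the counter */
        else if ((w = wcwidth(wc)) > 0)
            col += (size_t)w;               /* 0 (combining) and -1 (nonprintable) add nothing */
    }
    return col;
}
```

Here pos counts bytes and col counts columns, so a two-byte character
that is one column wide advances them by different amounts.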
Felix