[Toybox] More expand cleanups

Fri Nov 30 10:56:06 PST 2012

On 11/30/12 at 02:51am, Rob Landley wrote:
> I updated wc to theoretically deal with buffer wraps better. In reality  
> I haven't got UTF8 test data to run through this, and should probably  
> find some at some point.

Didn't the original version already deal well with it? (That's what the
strange tests are about.) Note that the C library keeps a conversion
state (in the mbstate_t you pass, or an internal one if you pass NULL)
which is updated each time mbrtowc is called. If there's only part of a
character at the end of the buffer, mbrtowc returns (size_t)-2 but
remembers the part it has seen. To continue you just have to read a new
toybuf and point mbrtowc to the _new_ data. (The "r" in mbrtowc stands
for "restartable".)

Do the tests still work?

> I redid the actual expand function to be simpler: read data into toybuf  
> and then write it to stdout using either fputc(char, stdout) or  
> xprintf("%*c", len, ' ') depending on whether it's a tab or something  
> else. It checks for tab (trigger the space behavior) and newline (reset  
> counters).
> 
> What it does _not_ currently do is track "spaces advanced" separately  
> from "bytes advanced", that needs the utf8 stuff to grab groups of  
> bytes that represent a single character, and to make _that_ work I need  
> to copy the logic I just added to wc, which means maybe I should  
> genericize it into lib/lib.c somehow? Needs more thought.
>
> This also assumes that all characters are the same width, which is  
> probably wrong and I need help with if so. (I dunno how to do  
> fontmetrics here?)

I think that this depends on the terminal emulator. Look for example at the
"-cjk_width" option of xterm.

Felix
