[Toybox] More expand cleanups

Fri Nov 30 00:51:24 PST 2012

On 11/28/2012 03:34:59 AM, Jonathan Clairembault wrote:
> > Back to expand_file(). The downside of using readall() is that
> > interactive granularity goes way down. I had this problem with "tee"
> > once upon a time, it meant that piping the output of anything through
> > tee made it appear in 4k chunks, which meant if you logged the result
> > of a build you couldn't really see what the build was doing. I'm not
> > sure expand has the same use cases, but that's why I did xread().
> 
> Well it seems like gnu/damnit version does buffering as well at least
> it does not process input as a line by line basis. I don't see why
> using xread changes anything, you probably need fgets here. Though I
> think we can safely buffer until someone comes in and raises
> interactivity need. wdyt?

I was thinking more along the lines of letting fputc() write data into  
the stdio.h buffer and letting that worry about when to flush it, and  
then we don't have to keep track of two positions.

> > Ah, hang on. Internationalization. This thing is going to need
> > multibyte support for utf8, isn't it? (The same general logic as
> > wc -m. Hmmm, I wonder if they can share code?)
> 
> Ah! I thought toybox was not dealing with internationalization. Though
> that's a good thing to have internationalization.

I'm not doing full internationalization with date formats and having  
sort come up with different orders depending on locale, but UTF8  
support is worth doing (with a top level config symbol, a bit like  
floating point support).

> > Ok, I'll have to come back to this in the morning.

And it is... no longer morning!  (We'll ignore the two missed days in  
there.)

I updated wc to theoretically deal with buffer wraps better. In reality  
I haven't got UTF8 test data to run through this, and should probably  
find some at some point.

I redid the actual expand function to be simpler: read data into toybuf  
and then write it to stdout using either fputc(char, stdout) or  
xprintf("%*c", len, ' ') depending on whether it's a tab or something  
else. It checks for tab (trigger the space behavior) and newline (reset  
counters).
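
For illustration, here is a minimal sketch of that loop. It is not the code
that went into toybox: expand_chunk() and its parameters are made-up names,
it only handles a single fixed tab stop (no -t list), and it uses plain
fprintf() where toybox would use xprintf().

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper, not toybox's actual expand code: copy len bytes
 * from buf to out, replacing each tab with spaces up to the next multiple
 * of tabstop. *col carries the column across calls, so the caller can feed
 * it successive read() chunks. */
static void expand_chunk(const char *buf, size_t len, int tabstop,
                         int *col, FILE *out)
{
  size_t i;

  assert(tabstop > 0);
  for (i = 0; i < len; i++) {
    char c = buf[i];

    if (c == '\t') {
      int pad = tabstop - (*col % tabstop);

      fprintf(out, "%*c", pad, ' ');  /* field width pad: pad spaces */
      *col += pad;
    } else {
      fputc(c, out);                  /* stdio decides when to flush */
      if (c == '\n') *col = 0;        /* newline resets the counter */
      else (*col)++;                  /* counts bytes, not characters */
    }
  }
}
```

Called as expand_chunk(toybuf, len, 8, &col, stdout) after each read, with
col starting at 0, this keeps only one position to track and leaves the
buffering to stdio.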

What it does _not_ currently do is track "spaces advanced" separately
from "bytes advanced". That needs the utf8 stuff to grab groups of
bytes that represent a single character, and to make _that_ work I need
to copy the logic I just added to wc. Which means maybe I should
genericize it into lib/lib.c somehow? Needs more thought.
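
One way to picture the difference between bytes and characters is the
sketch below. It is not the logic from wc: utf8_columns() is a made-up name,
and it relies on the assumption that the input is valid UTF-8, where every
byte of the form 10xxxxxx continues the previous character. It also assumes
each character takes exactly one column.

```c
#include <stddef.h>

/* Hypothetical helper: count characters (not bytes) in a run of valid
 * UTF-8 by skipping continuation bytes, which all look like 10xxxxxx.
 * Stateless, so a character split across two buffers is still counted
 * exactly once (by its lead byte). */
static size_t utf8_columns(const char *buf, size_t len)
{
  size_t i, cols = 0;

  for (i = 0; i < len; i++)
    if (((unsigned char)buf[i] & 0xC0) != 0x80) cols++;

  return cols;
}
```

For example, the six bytes of "h\xc3\xa9llo" would advance the column by
five. Invalid input is not detected here, which a shared lib/lib.c version
would probably have to care about.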

This also assumes that all characters are the same width, which is
probably wrong, and if so I need help with it. (I dunno how to do
font metrics here?)
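
One possible direction, offered as an assumption rather than something
settled in this thread: POSIX has wcwidth(), which reports how many
terminal columns a wide character occupies in the current locale. A rough
sketch built on mbrtowc() follows; display_width() is a made-up name, and
how invalid bytes and nonprintable characters are counted is an arbitrary
choice here.

```c
#define _XOPEN_SOURCE 700   /* for wcwidth() */
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: columns occupied by len bytes of multibyte text in
 * the current locale. The caller is expected to have called
 * setlocale(LC_ALL, "") so that mbrtowc() decodes UTF-8. */
static int display_width(const char *s, size_t len)
{
  mbstate_t st;
  int cols = 0;

  memset(&st, 0, sizeof(st));
  while (len) {
    wchar_t wc;
    size_t n = mbrtowc(&wc, s, len, &st);
    int w;

    if (n == (size_t)-1 || n == (size_t)-2) {
      /* Invalid or truncated sequence: count one byte as one column. */
      memset(&st, 0, sizeof(st));
      n = 1;
      w = 1;
    } else {
      if (!n) n = 1;        /* embedded NUL byte consumes one byte */
      w = wcwidth(wc);
      if (w < 0) w = 0;     /* nonprintable: assume it does not advance */
    }
    cols += w;
    s += n;
    len -= n;
  }

  return cols;
}
```

The results depend on the C library's locale tables, so this would want
checking against real test data before trusting it.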

I need to catch up on doing the test suite, because I've been testing  
by hand. My scrollback buffer says:

echo -e 'blah\tblah' | ./toybox expand | hexdump -C
echo -e 'blah\tblah' | ./toybox expand -t 11 | hexdump -C
echo -e 'blah\tblah and then some more because\tblah' | \
   ./toybox expand -t 11 | hexdump -C
echo -e 'blah\tblah and then some more because\tblah\n\tand' | \
   ./toybox expand -t 11 | hexdump -C
echo -e 'blah\tblah and then some more because\tblah\n\tand' | \
   ./toybox expand -t 3,11,11 | hexdump -C
echo -e 'blah\tblah and then some more because\tblah\n\tand' | \
   ./toybox expand -t 3,11,22,33 | hexdump -C
echo -e 'blah\tblah and then some more because\tblah\n\tand' | \
   ./toybox expand -t 3,11,22,33,44 | hexdump -C

Possibly I should turn that into an actual automated testy thing.

Sleep time now.

Rob