[Toybox] More expand cleanups
Rob Landley
rob at landley.net
Fri Nov 30 00:51:24 PST 2012
On 11/28/2012 03:34:59 AM, Jonathan Clairembault wrote:
> > Back to expand_file(). The downside of using readall() is that interactive
> > granularity goes way down. I had this problem with "tee" once upon a time,
> > it meant that piping the output of anything through tee made it appear in 4k
> > chunks, which meant if you logged the result of a build you couldn't really
> > see what the build was doing. I'm not sure expand has the same use cases,
> > but that's why I did xread().
>
> Well it seems like gnu/damnit version does buffering as well at least
> it does not process input as a line by line basis. I don't see why
> using xread changes anything, you probably need fgets here. Though I
> think we can safely buffer until someone comes in and raises
> interactivity need. wdyt?
I was thinking more along the lines of letting fputc() write data into
the stdio.h buffer and letting that worry about when to flush it, and
then we don't have to keep track of two positions.
> > Ah, hang on. Internationalization. This thing is going to need multibyte
> > support for utf8, isn't it? (The same general logic as wc -m. Hmmm, I wonder
> > if they can share code?)
>
> Ah! I thought toybox was not dealing with internationalization. Though
> that's a good thing to have internationalization.
I'm not doing full internationalization with date formats and having
sort come up with different orders depending on locale, but UTF8
support is worth doing (with a top level config symbol, a bit like
floating point support).
> > Ok, I'll have to come back to this in the morning.
And it is... no longer morning! (We'll ignore the two missed days in
there.)
I updated wc to theoretically deal with buffer wraps better. In reality
I haven't got UTF8 test data to run through this, and should probably
find some at some point.
I redid the actual expand function to be simpler: read data into toybuf
and then write it to stdout using either fputc(char, stdout) or
xprintf("%*c", len, ' ') depending on whether it's a tab or something
else. It checks for tab (trigger the space behavior) and newline (reset
counters).
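For anyone following along, here is a rough sketch of that loop, not the code I checked in. The function name expand_chunk() and the "col" pointer are made up for illustration, it only handles a single fixed tabstop (no -t lists), and it assumes every non-tab byte is one column wide. It uses fprintf() where toybox would use xprintf(), and leaves flushing to stdio:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper, not the toybox code: expand tabs in buf[0..len) to
 * spaces, writing to out. *col carries the current column across calls so
 * the caller can feed it successive read() chunks. */
static void expand_chunk(const char *buf, size_t len, int tabstop,
                         int *col, FILE *out)
{
  size_t i;

  assert(tabstop > 0);
  for (i = 0; i < len; i++) {
    char c = buf[i];

    if (c == '\t') {
      /* Pad to the next multiple of tabstop: always at least one space. */
      int pad = tabstop - (*col % tabstop);

      fprintf(out, "%*c", pad, ' ');
      *col += pad;
    } else {
      fputc(c, out);
      /* Newline resets the column; anything else is assumed width 1. */
      *col = (c == '\n') ? 0 : *col + 1;
    }
  }
}
```

A caller would keep "int col = 0;" per input file and call expand_chunk(toybuf, len, 8, &col, stdout) after each read. Because the column lives outside the function, a tab that lands right after a chunk boundary still pads correctly.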
What it does _not_ currently do is track "spaces advanced" separately
from "bytes advanced". That needs the utf8 stuff to grab groups of
bytes that represent a single character, and to make _that_ work I need
to copy the logic I just added to wc, which means maybe I should
genericize it into lib/lib.c somehow? Needs more thought.
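To illustrate the bytes vs characters distinction, one simple approach (an assumption on my part, not necessarily what the wc code does) is to count every byte that isn't a UTF-8 continuation byte. The name utf8_chars() is hypothetical, and this sketch assumes the input is valid UTF-8 and does no validation:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: count UTF-8 characters in buf[0..len) by counting
 * each byte that is not a continuation byte (bit pattern 10xxxxxx).
 * Assumes valid UTF-8; malformed sequences are not detected. */
static size_t utf8_chars(const char *buf, size_t len)
{
  size_t i, count = 0;

  assert(buf || !len);
  for (i = 0; i < len; i++)
    if (((unsigned char)buf[i] & 0xC0) != 0x80) count++;

  return count;
}
```

Since each byte is classified on its own, a character split across two reads still gets counted exactly once (when its lead byte goes by), so this version needs no state at buffer boundaries. It says nothing about how wide each character is on screen.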
This also assumes that all characters are the same width, which is
probably wrong, and if so I need help with it. (I dunno how to do
fontmetrics here?)
I need to catch up on doing the test suite, because I've been testing
by hand. My scrollback buffer says:
echo -e 'blah\tblah' | ./toybox expand | hexdump -C
echo -e 'blah\tblah' | ./toybox expand -t 11 | hexdump -C
echo -e 'blah\tblah and then some more because\tblah' | \
./toybox expand -t 11 | hexdump -C
echo -e 'blah\tblah and then some more because\tblah\n\tand' | \
./toybox expand -t 11 | hexdump -C
echo -e 'blah\tblah and then some more because\tblah\n\tand' | \
./toybox expand -t 3,11,11 | hexdump -C
echo -e 'blah\tblah and then some more because\tblah\n\tand' | \
./toybox expand -t 3,11,22,33 | hexdump -C
echo -e 'blah\tblah and then some more because\tblah\n\tand' | \
./toybox expand -t 3,11,22,33,44 | hexdump -C
Possibly I should turn that into an actual automated testy thing.
Sleep time now.
Rob