[Toybox] More expand cleanups
Rob Landley
rob at landley.net
Tue Nov 27 23:51:57 PST 2012
I don't like adding new types to global headers which are only used in
one command, so change global tablist to be a void *... then stop and
think. A linked list of integers is a bit inefficient, shouldn't this
just be an array?
We have toybuf, which we can trivially fill with an array of up to 1024
ints. Is that enough for "expand"? Would 2048 "short" (or 512 long
long) be better? Does having a finite amount of these (which is way
more than anybody's screen size) violate any specification anywhere?
I hardwired in a similar limitation to the number of columns in ls, but
if somebody did have an insane display size combined with a huge
directory it would still display all the contents, just in no more than
a few hundred columns. In that case it was acceptable. Is expand used
for some kind of industrial purpose having nothing to do with human
readable (ballpark 80 column) screens?
If having a maximum tablist size _is_ a problem we can malloc a buffer
of the right size, but right now you physically _can't_ have more than
65k input entries on linux due to process environment limitations (32
times 4k pages of command line text, if each one is a single character
that's 65k), so there's already an implicit limit, we're just making it
smaller but still "sufficiently huge".
So I'm leaning towards filling out toybuf and giving the bad tablist
the error message if it's more than 1024 entries in the list.
So change the global from "tablist" to "tablen".
Also, in the help text I'm tempted to just say "a comma separated list"
instead of having comma or space awkwardly explained in the
paranethetical. Yes we should support space, but since "expand -n 1 2
3" treats 2 and 3 as filenames, you need quotes around it to use spaces
as a separator which is just awkward. If they're going to read the docs
to see how to use it, comma is fine. If an existing script already
provides spaces, we work anyway... Wordsmithing a bit.
Also, the gnu/dammit version accepts more than one -t:
echo -e "one\ttwo\tthree\tfour" | expand -t 10,18 -t 25
So it concatenates those, which means the ascii argument list should be
t*, not t: and the TT entry is an arg_list.
So, convert argument to t*, convert global type to struct arg_list
*tabs, add int tabcount to keep track of how many unsigned toybuf[]
entries parsing filled out, redo build_tablist based on iterating
through TT.tabs, sub-iterating through each string within that, and
incrementing TT.tabcount, and while I'm at it move the -t parsing to
the start of expand_main() because there's not much point having a
separate function that's called from exactly one place, might as well
just do it inline.
Now let's look at expand_file(). Collate all same declaration types, so
just one line of ssize_t declarations instead of several... and why are
we using ssize_t for something that it's a file position offset?
Ah, you're using toybuf already, for file input. Reasonable choice. Ok,
I'll break the parsing logic out into a function that returns the
number of entries and takes the array to write into as its argument,
and have it not write anything to the array if the argument is NULL.
Then I can call it twice, once to count the number of entries and the
second time to fill out the array.
Back to expand_file(). The downside of using readall() is that
interactive granularity goes way down. I had this problem with "tee"
once upon a time, it meant that piping the output of anything through
tee made it appear in 4k chunks, which meant if you logged the result
of a build you couldn't really see what the build was doing. I'm not
sure expand has the same use cases, but that's why I did xread().
Also, I had an argument with dalias (of musl) a while back, who
insisted that read() no longer returns 0 length reads when interrupted
by sigstop. So you can check for return 0 without also checking for
-EINTR now, I think? Anyway, I'll leave readall() for now.
STDOUT_FILENO is 1. Has been since 1969. Just use 1. Although stdout is
a FILE * which means it's got built-in output buffering, so if we use
printf() and friends we don't have to worry about output buffering, and
in fact we can use the printf("%*c", len, char) thing to do the padding
for us with a single codepath... Except that won't work for a trailing
tab at the end of file with no newline after it. Ok, cheat:
printf("%*c", len-1, ' ');
Ah, hang on. Internationalization. This thing is going to need
multibyte support for utf8, isn't it? (The same general logic as wc -m.
Hmmm, I wonder if they can share code?)
Ok, I'll have to come back to this in the morning.
Rob
More information about the Toybox
mailing list