[Toybox] [PATCH] POSIX's unexpand command

Rob Landley rob at landley.net
Sat Feb 24 09:01:24 PST 2024


On 2/23/24 20:27, Oliver Webb via Toybox wrote:
> Browsing through list archives from 2020, I found a mention of the unexpand command (in POSIX)
> 
> From http://lists.landley.net/pipermail/toybox-landley.net/2020-May/019792.html:
> 
> unexpand "converts spaces to tabs". Haven't gotten around to it yet. :)
> 
> This command's behavior is so simple (s/  /\t/g) that it can be knocked out in a couple of hours.
> The below patch is the command in 60 lines of code, and some tests for it.
> 
> Since the command only looks for 2 characters (' ' and '\t'), no UTF safety checking is required.
> unexpand doesn't parse backspaces either.

I'm not _that_ far from being able to build Linux From Scratch under mkroot and
then (maybe) populate a debian repository from source, giving me a large pile of
real world sheep to run across this minefield.

And I was basically waiting to see if anything actually tried to use it. (I
personally do sed. In theory unexpand can do math to "advance to next tabstop",
but that never comes up with leading spaces, and tabs in the _middle_ of lines
tend to have reasons they're there. I already have utf8 fontmetrics in "fold" in
case the leading debris is unicode, and adding unexpand as a second command to
fold.c to share that plumbing might make sense... except for the whole "what's
the use case" part. What is this command _for_ exactly?)
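
For reference, the "advance to next tabstop" math is small. Below is a rough
sketch in Python, assuming fixed tab stops every `tabsize` columns and
characters that are each one column wide. The function names are made up for
illustration; this is not the code from the patch, and it ignores edge cases
such as whether a single space should ever become a tab.

```python
def next_tabstop(col, tabsize=8):
    # First tab stop strictly after 0-based column `col`.
    return col + tabsize - (col % tabsize)


def blanks_to_tabs(start, end, tabsize=8):
    # Fill columns [start, end) with as many tabs as fit, then pad with spaces.
    out = []
    col = start
    while next_tabstop(col, tabsize) <= end:
        out.append("\t")
        col = next_tabstop(col, tabsize)
    out.append(" " * (end - col))
    return "".join(out)
```

With leading blanks `start` is always 0, so the whole thing collapses to a
division and a remainder. The loop only matters for runs that begin mid-line.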

> The only problem is...
> 
> The GNU man page doesn't say if spaces are supposed to be processed beyond the beginning of lines.
> Since it specifies -a (Spaces -> tabs after the start of lines), and puts the option to disable that
> behavior under "--first-only" (while noting that it OVERRIDES -a), you would THINK it'd not process
> spaces like that, and that the "--first-only" option serves the same purpose as grep -G (None at all, but might
> make option parsing a bit simpler for scripts).
> 
> POSIX also doesn't make this apparent while also specifying -a, but in a much more verbose way because POSIX
> 
> --first-only behavior is nice (the only reason I can find that would make this 
> command more useful than sed 's/  /\t/g'), but long options are cumbersome.

I too dislike --longopts with corresponding short options.
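
To make the difference between the quoted sed one-liner and leading-only
conversion concrete, here is a rough Python sketch. Both helper names are
hypothetical, and `leading_only` assumes the line starts with spaces only (no
existing tabs); it illustrates the idea rather than what any real unexpand does.

```python
import re


def sed_style(line):
    # Same idea as sed 's/  /\t/g': every pair of spaces becomes one tab,
    # anywhere in the line, with no notion of columns.
    return re.sub("  ", "\t", line)


def leading_only(line, tabsize=8):
    # Convert only the leading run of spaces: one tab per full tab stop,
    # remaining spaces kept, the rest of the line left untouched.
    stripped = line.lstrip(" ")
    width = len(line) - len(stripped)
    return "\t" * (width // tabsize) + " " * (width % tabsize) + stripped
```

For the input `"    a  b"` with `tabsize=4`, `sed_style` also rewrites the two
spaces between `a` and `b`, while `leading_only` leaves them alone.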

> FAIL: unexpand -a behavior default
> echo -ne '  123  123\n' | "/sbin/unexpand" -t 2
> [...]
> -       123      123
> +       123       123
> 
> Now it's converting spaces to tabs, while leaving trailing spaces?

Lovely.

> Also... why does -t enable -a,

Not a clue.

Rob

