[Toybox] Has anybody ever actually used cut -f?

enh enh at google.com
Thu Sep 1 16:27:38 PDT 2016


On Thu, Sep 1, 2016 at 2:23 PM, Rob Landley <rob at landley.net> wrote:
>
> On 09/01/2016 03:29 PM, Samuel Holland wrote:
> > Hello,
> >
> > On 09/01/2016 02:58 PM, Rob Landley wrote:
> >> In theory:
> >>
> >> echo "one two three four five" | cut -f 2-4
> >>
> >> Should be really useful, and mean you don't need awk. In practice,
> >> posix specifies that the default separator of cut -f is TAB, and that
> >> the -d delimiter specifier has no way to specify 'arbitrary run of
> >> whitespace'.
> >
> > Yes, I use cut all the time. In fact, I have never intentionally used
> > awk on my own--only when copied from somebody else's one-liners.
> > Usually if there's a variable run of space I cut on the punctuation next
> > to it, or failing that, pipe through `sed 's/\s\+/\t/g'`. Of course,
> > this probably defeats the whole advantage of using cut over awk
> > (simplicity), but it's habit at this point.
>
> Uh-huh.
>
> >> So I propose 2 changes to toybox cut:
> >>
> >> 1) -d "" means arbitrary run of whitespace.
> >>
> >> 2) It's the default.
> >
> > I'm sure people besides me use `cut -f`, but I also assume they use -d.
>
> I checked cut.test right after I sent that message, and every -f test
> also supplies -d.
>
> > So changing the default delimiter to arbitrary whitespace shouldn't be a
> > problem...
>
> Modulo the existing cut matching single characters, and this matching
> _runs_ of characters.
>
> But you've gotta throw that out a bit to support UTF8, so...
>
> > I tried to search GitHub, but they broke global code search;
>
> As Google Code did before them.

searching internally, i see a surprising (to me) amount of `cut -d " " -fN`.

but, yeah, i'd use cut -f a lot more if there was a way to say
"arbitrary sequence of whitespace".
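
for reference, a rough sketch of how to get that behavior today with
plain posix tools: squeeze each run of spaces and tabs into a single
tab, then let cut -f fall back on its default TAB delimiter. the
`fields` helper name is made up for illustration (it is not a toybox or
coreutils feature), and a line with leading whitespace will still come
out with an empty first field.

```shell
# fields LIST: hypothetical helper, not part of any existing cut.
# Turns every space or tab into a tab, squeezes repeated tabs into one,
# then selects fields with cut's default TAB delimiter.
fields() {
  tr -s ' \t' '\t\t' | cut -f "$1"
}

# Mixed runs of spaces and tabs between the words.
printf 'one  two \t three four five\n' | fields 2-4
```

the selected fields come back joined by single tabs, since that is the
delimiter cut sees after the squeeze.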

> > Google got me to https://github.com/stephenturner/oneliners and
> > https://gist.github.com/j3tm0t0/4122817 which apparently don't use -d.
> > On the other hand, I see a lot of instances of -d " " which would be
> > simplified by the proposed change.
>
> Yes and no.
>
> echo "one  two   three" | cut -d " " -f 2,3
>
> The answer is " two" with a space before it, and 3 not showing up at all
> because it would be between spaces.
>
> I'm not saying that behavior's more _useful_ (it isn't), I'm just saying
> it's different from -d defaulting to a run of whitespace. Still, -d " "
> would still do the same thing if explicitly supplied, so that's not a
> behavior change.
>
> >> As has been noted before, this makes about 90% of the uses of awk go
> >> away. The downside is, if you're _not_ using toybox cut, it won't
> >> work.
> >>
> >> Any opinions?
> >
> > If you want to avoid breaking existing code, but make cut more useful,
> > accept multiple characters for -d and match any of them.
>
> Needing to supply -d run-of-whitespace every time using double quotes
> (not single quotes) puts it up about with awk in terms of awkwardness to
> use (which requires single quotes, not double quotes). And awk was there
> first.
>
> > Then at least
> > you could do cut -d "$IFS" or similar if you don't know if the output is
> > spaces or tabs.
>
> Or I could have multichar delimiters be -d "abc" meaning
> "armadilloabcbroccoliconfetti" could be split into broccoli, armadillo,
> and confetti.
>
> > This got me thinking, since \n is in $IFS...
> >
> > $ printf "1234\n5678\n\n90\n" | cut -s -f2 -d$'\n'
> > 5678
>
> Cut is defined (by posix) as reading lines, which are delimited with
> \n, and presumably that happens before it looks for other delimiters
> _within_ the line. So what you're trying to do is facially nuts and I'd
> want to see real code depending on it before looking further.
>
> Presumably it's doing the same "read a large block of text, match, and
> then back up and find line boundaries" trick their grep does?
>
> > $ printf "1234\n5678\n\n90\n" | cut -f2 -d$'\n'
> > 5678
> > $ printf "1234\n5678\n\n90\n" | ./toybox cut -s -f2 -d$'\n'
> > $ printf "1234\n5678\n\n90\n" | ./toybox cut -f2 -d$'\n'
> > 1234
> > 5678
>
> I never promised bug-for-bug compatibility. That one is not "reading
> lines". Mine is.
>
> > 90
> > $
> >
> > I'm not sure what to make of that.
>
> You asked for something crazy, and they did something crazy. Whether the
> two match up is a matter of opinion.
>
> >> Rob
> >
> > Samuel
>
> Rob
> _______________________________________________
> Toybox mailing list
> Toybox at lists.landley.net
> http://lists.landley.net/listinfo.cgi/toybox-landley.net




-- 
Elliott Hughes - http://who/enh - http://jessies.org/~enh/
Android native code/tools questions? Mail me/drop by/add me as a reviewer.


