[Toybox] Has anybody ever actually used cut -f?
scsijon
scsijon at lamiaworks.com.au
Thu Sep 1 16:57:26 PDT 2016
>
> Date: Thu, 1 Sep 2016 15:29:28 -0500
> From: Samuel Holland <samuel at sholland.org>
> To: toybox at lists.landley.net, Rob Landley <rob at landley.net>
> Subject: Re: [Toybox] Has anybody ever actually used cut -f?
> Message-ID: <9926c62a-75da-0f1c-885e-323cb33c3b8f at sholland.org>
>
> Hello,
>
> On 09/01/2016 02:58 PM, Rob Landley wrote:
>> In theory:
>>
>> echo "one two three four five" | cut -f 2-4
>>
>> Should be really useful, and mean you don't need awk. In practice,
>> POSIX specifies that the default separator of cut -f is TAB, and that
>> the -d delimiter specifier has no way to specify 'arbitrary run of
>> whitespace'.
>
> Yes, I use cut all the time. In fact, I have never intentionally used
> awk on my own--only when copied from somebody else's one-liners.
> Usually if there's a variable run of spaces I cut on the punctuation next
> to it, or failing that, pipe through `sed 's/\s\+/\t/g'`. Of course,
> this probably defeats the whole advantage of using cut over awk
> (simplicity), but it's habit at this point.
Totally agree. It took me quite some time to work out how to use cut
effectively (we use it proficiently in Puppy Linux), but what you can do
with it, both by itself and combined with other tools, makes it a great
scripting command. It also seems to 'sort out' nesting problems by
itself, even when you really stuff up (like after a 4am session).
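A minimal sketch of the normalize-then-cut habit described above, using POSIX `tr` instead of `sed 's/\s\+/\t/g'` (the `\s`, `\+` and `\t` escapes there are GNU sed extensions). The `fields` function name is hypothetical, just for illustration:

```shell
#!/bin/sh
# Hypothetical helper: translate spaces and tabs to TAB and squeeze
# each run into a single TAB, then let cut split on its default
# TAB delimiter.
fields() {
    tr -s ' \t' '\t\t' | cut -f "$1"
}

# Prints "two", "three", "four" separated by TABs (cut keeps its delimiter).
echo "one two   three four five" | fields 2-4
```

One caveat: leading blanks become a leading TAB, so field 1 is empty in that case and the real first word moves to field 2.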
>
>> So I propose 2 changes to toybox cut:
>>
>> 1) -d "" means arbitrary run of whitespace.
>>
>> 2) It's the default.
>
> I'm sure people besides me use `cut -f`, but I also assume they use -d.
> So changing the default delimiter to arbitrary whitespace shouldn't be a
> problem...
>
> I tried to search GitHub, but they broke global code search; Google
> got me to https://github.com/stephenturner/oneliners and
> https://gist.github.com/j3tm0t0/4122817 which apparently don't use -d.
> On the other hand, I see a lot of instances of -d " " which would be
> simplified by the proposed change.
>
>> As has been noted before, this makes about 90% of the uses of awk go
>> away. The downside is, if you're _not_ using toybox cut, it won't
>> work.
>>
>> Any opinions?
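For comparison, a sketch of the awk idiom the proposal aims to replace. awk's default field separator already treats any run of spaces and tabs as one separator and ignores leading blanks, which is roughly the behavior proposed for `cut -d ""`:

```shell
#!/bin/sh
# awk's default FS splits on runs of blanks and skips leading blanks,
# so $2..$4 are "two", "three", "four"; print re-joins them with OFS (a space).
echo "  one two   three four five" | awk '{ print $2, $3, $4 }'
```

Note the output differs slightly from cut: awk re-joins the fields with a single space, whereas cut would emit the selected fields separated by its delimiter.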
>
> If you want to avoid breaking existing code, but make cut more useful,
> accept multiple characters for -d and match any of them. Then at least
> you could do cut -d "$IFS" or similar if you don't know if the output is
> spaces or tabs.
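Existing cut accepts only one delimiter character, but the "match any of them" semantics suggested above can be approximated today by mapping every candidate character to TAB first. Unlike the whitespace-run proposal, each character still ends a field, so two adjacent separators produce an empty field. A sketch, assuming space and TAB are the delimiters of interest; `anycut` is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper emulating "any of several single-character
# delimiters": translate each delimiter to TAB WITHOUT squeezing
# runs, so every delimiter character still separates a field.
anycut() {
    tr ' \t' '\t\t' | cut -f "$1"
}

printf 'a b\tc\n' | anycut 3     # prints: c
printf 'a  b\n' | anycut 2       # prints an empty line (field 2 is empty)
```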
>
> This got me thinking, since \n is in $IFS...
>
> $ printf "1234\n5678\n\n90\n" | cut -s -f2 -d$'\n'
> 5678
> $ printf "1234\n5678\n\n90\n" | cut -f2 -d$'\n'
> 5678
> $ printf "1234\n5678\n\n90\n" | ./toybox cut -s -f2 -d$'\n'
> $ printf "1234\n5678\n\n90\n" | ./toybox cut -f2 -d$'\n'
> 1234
> 5678
>
> 90
> $
>
> I'm not sure what to make of that.
>
>> Rob
>
> Samuel
>
scsijon