[Toybox] [PATCH v2 1/1] teach head -c

Rob Landley rob at landley.net
Tue Jun 27 13:30:17 PDT 2017



On 06/26/2017 11:58 PM, Ilya Kuzmich wrote:
> On 26/06, Rob Landley wrote:
>> On 06/01/2017 01:36 AM, Ilya Kuzmich wrote:
>>> ping?
>>
>> Let's see...
>>
>>>> Not POSIX, but implemented in coreutils, busybox and freebsd.
>>
>> 1) Do you have a use case for this? Or did you implement this because
>> it's there? Denys added it to Busybox on February 25, 2013 but no
>> message titles in the busybox mailing list around then mention "head" (I
>> checked from February 2013 back to the previous November). I couldn't find a
>> bug report in https://bugs.busybox.net/buglist.cgi?quicksearch=head
>> either. So it looks like when busybox was ~15 years old Denys added this
>> because he could, not because anybody requested it or particularly
>> noticed it was missing...
> Convenience and compatibility.
> It's widely used syntax, on github alone `"head -c" language:shell`
> query returns 22,602 code results.
> Personal perspective: my embedded linux $DAYJOB uses head -c a lot.

You personally using it is good enough for me. :)

Applied. (And then I checked in a second nitpicking commit about
whitespace and replacing the man page link with a "deviations from
posix" comment instead, mostly because I was staring at it for so long.)

>> 2) On ubuntu "echo hello | head -c 0" produces no output. This one looks
>> like it falls back to line based behavior?
> No, it does not.
> I've just tested my implementation and it produces no output either.

Ok.
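For illustration, here's a minimal C sketch of the positive -c case. It is not the applied patch; the function name head_bytes and its FILE*-based signature are hypothetical. It copies at most n bytes and then stops, so with n == 0 it never reads or writes anything, which matches the "echo hello | head -c 0" result above.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical sketch of "head -c n" for n >= 0: copy at most n bytes
 * from in to out. With n == 0 the loop body never runs, so nothing is
 * read or printed. Returns the number of bytes copied. */
long long head_bytes(FILE *in, FILE *out, long long n)
{
  char buf[4096];
  long long left = n, done = 0;

  assert(n >= 0);
  while (left > 0) {
    /* Never ask for more than the remaining byte budget. */
    size_t want = left < (long long)sizeof(buf) ? (size_t)left : sizeof(buf);
    size_t got = fread(buf, 1, want, in);

    if (!got) break;  /* EOF (or error) before the limit: just stop */
    fwrite(buf, 1, got, out);
    left -= (long long)got;
    done += (long long)got;
  }
  return done;
}
```

Input shorter than n simply ends early, so "head -c 100" on a 6-byte file prints those 6 bytes.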

>> 3) The ubuntu version has a more complicated -c behavior than you
>> implemented, "head -c -6600 README" currently prints the first 24 bytes
>> of that file. Why did you stop there? Why do we need this part but not
>> all of it? (Our tail already implements the -c +k behavior, but somebody
>> had an existing use case that needed it...)
> It's just that I don't need negative values.

Hmmm. I've added a todo item locally, but looking at it... it's a mess,
isn't it?

> But hey - we could merge head.c and tail.c together.

I'd be all for that if I could figure out how to have a result simpler
than we started with.

Tail is kinda terrible: it has to remember data it's already seen and
then count backwards, and it needs two codepaths to do it remotely
efficiently (because reading through a multi-gigabyte file to display
the last 3 lines is unreasonably slow (sometimes minutes vs fraction of
a second), but you can't seek a pipe so "zcat | tail" has no choice but
to read). Plus you read data in blocks but parse it in lines, so the
number of blocks you have to retain isn't fixed, and you can't start
outputting until you have all the data.
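To make the seekable codepath concrete, here's a hypothetical C sketch of it (not toybox's actual tail.c; tail_lines_seekable is an illustrative name). It reads fixed-size blocks backwards from the end counting newlines, so the work is proportional to how much of the end it scans rather than to the file size. A pipe can't do the fseek() calls, which is why "zcat | tail" needs the other, read-everything path.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical sketch of tail's seekable fast path for "tail -n lines".
 * Scan fixed-size blocks backwards from the end of the file counting
 * newlines, then copy forward from where the last `lines` lines begin.
 * Needs fseek(), so it can't work on a pipe. Uses stdio's long offsets
 * for brevity. Returns the starting offset, or -1 on error. */
long tail_lines_seekable(FILE *in, FILE *out, long lines)
{
  char buf[4096];
  long end, pos, start = 0, seen = 0;
  size_t got;

  assert(lines >= 0);
  if (fseek(in, 0, SEEK_END) || (end = ftell(in)) < 0) return -1;

  if (!lines) start = end;
  else {
    /* A newline at the very end terminates the last line; don't count it. */
    pos = end;
    if (end > 0) {
      if (fseek(in, end - 1, SEEK_SET)) return -1;
      if (fgetc(in) == '\n') pos = end - 1;
    }
    /* Walk backwards one block at a time until enough newlines are seen;
     * if the file runs out first, start stays 0 (print everything). */
    while (pos > 0 && seen < lines) {
      long chunk = pos < (long)sizeof(buf) ? pos : (long)sizeof(buf);
      long i;

      pos -= chunk;
      if (fseek(in, pos, SEEK_SET)
          || fread(buf, 1, (size_t)chunk, in) != (size_t)chunk) return -1;
      for (i = chunk - 1; i >= 0; i--)
        if (buf[i] == '\n' && ++seen == lines) {
          start = pos + i + 1;
          break;
        }
    }
  }

  /* Copy from the start of the wanted lines through end of file. */
  if (fseek(in, start, SEEK_SET)) return -1;
  while ((got = fread(buf, 1, sizeof(buf), in)) > 0) fwrite(buf, 1, got, out);
  return start;
}
```

Code meant for multi-gigabyte files would use lseek() and off_t rather than stdio's long offsets, which can be 32 bits on some systems.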

Head can make line-at-a-time partial progress and forget what it's seen,
until you get to -c with a negative value, where you have to retain all
the data you've seen because "zcat big.gz | head -c -999999999" doesn't
know when the file's over until it hits the end...

Eh, but it only has to buffer up to the negative value. Anything that
overflows from that it can print immediately. So it's not THAT bad, the
value you type is the limit on the memory allocation. (Still an out of
memory error in a can though, "cat /dev/zero | head -c -999999999999"
isn't going to be friendly to _any_ system...)

Rob


