[Toybox] [PATCH] tail fixes

Thu Sep 12 11:02:29 PDT 2013

Catching up on email while waiting at the airport. (Ohio Linuxfest!)

On 09/06/2013 04:48:10 PM, Felix Janda wrote:
> Rob Landley wrote:
> > What The Hurd implements is not. Those religious fanatics are simply
> > not relevant, except sometimes as a warning of what _not_ to do.
> 
> I just wanted to mention this. It means that we have some more freedom
> in the implementation.

I've since done a checkin that obsoletes most of the analysis here, but  
there were a few other topics...

> > pos was a ssize_t because lseek() can return failure, now you've  
> made
> > it a size_t which can't be negative. (Do you have something against
> > signed numbers?)
> 
> It can't be negative. But it can be -1.

Uh-huh.

> But actually we should use
> an off_t for pos. One might be curious about the end of a 8GB file
> in a 32bit system.

Is ssize_t not sufficient? (That's what the lseek version is using for  
pos. The non-lseek version isn't particularly keeping track of pos  
after grabbing enough characters and lines.)

Also, the argument parsing logic is using "long" for the numbers we  
input. This means a 32 bit system can't accept a -c larger than 2 gigs.  
(Known limitation, didn't want to complicate the code to cope with 32  
bit systems when everything and its dog has 64 bit variants these days,  
and the year 2038 problem is going to flush out the remaining 32-bit  
systems in 25 years. Support them, yes. Make them handle large numbers  
in all cases, not necessarily.)

(The real hiccup with argument parsing is that  
sizeof(long)==sizeof(pointer) so I allocate pointer sized slots at the  
start of GLOBALS() to save arguments into. I can jump through hoops to  
get around that, but the simple approach is to wait for the switch to  
64 bit arm.)

> The previous code had the assumption that every chunk is terminated by
> a newline (which is utterly wrong). This is the simplest  
> implementation
> I could come up. I'll try to explain further things below.
> 
> After spending quite some time with this reply I see the simple
> (less efficient (in terms of number of instructions)) algorithm.

I think I fixed that, but hopefully if there are still problems they'll  
be mentioned in one of the other pending mails I'm reading towards...

> > Actually we don't have an exclusion in the option string for [!nc]  
> and
> > I vaguely recall I was trying to make it work if you specified both  
> at
> > the same time. (So it showed you that many characters from the end
> > _plus_ that many lines before that point.)
> 
> I didn't know that. Could you specify it in more detail (the order  
> matters)?

It was a bad idea, and the attempt at implementing it was broken in  
several places. I made this implementation produce the same results  
other ones do instead.

> > Let's see, assuming the actual bugs were the uninitialized next  
> pointer
> > in the last list entry and the list entries being added in the wrong
> > order (so only the last list entry every showed), so if I separate  
> the
> > count and line codepaths into an if/else but keep the head/tail
> > traversing design without this sideband line array...
> 
> There were more. (Just try the previous lseek logic with something big
> containing no newlines.)

Hopefully the current version works. (I had the test suite do a larger  
test file. Probably didn't hit everything.)

> > You see why I just did a doubly linked list? That way I had one list
> > pointer fed to the function and the function could use it to find  
> the
> > end of the list, but I _didn't_ have different list and list head
> > types. The downside is each node has an extra pointer, for which I  
> get
> > the ability to traverse the list backwards, even if I never use it.)
> 
> It's still a bit confusing to me. list always points to the head and
> list->prev points to the tail?

Yes, it's a circular doubly linked list. Fairly common data structure,  
the kernel uses 'em all over the place. (You have a head pointer, from  
there you can traverse in either direction, and when you hit it again  
you're done.)

This sort of thing is actually why the kernel has magic PID 1 and  
rootfs entries that never go away: the head pointer points to them and  
if that entry is never removed, you never have to move the head pointer  
which makes some nasty races go away (and RCU can handle the rest).

> Could you maybe remark that somewhere?

Fluff out http://landley.net/toybox/code.html#lib_llist some more, you  
mean?

I need to do another documentation pass, I expect...

> So it's nearly the same as the three argument version. (But we get a
> doubly linked list in the end.)
> 
> > On the bright side, if I do keep dlist_pop() I can use it in patch  
> to
> > trim a couple lines there...
> >
> > Eh, I should worry about it in the morning. Bedtime, and still 5000
> > emails behind. (Mostly linux-kernel, but I need to read the
> > Documentation stuff in there. Plus the 3.11 kernel dropped on  
> tuesday
> > and I need to do an Aboriginal release, which means a toybox release
> > _and_ I wanted to get LFS 7.4 built instead of the old 6.8 control
> > image that's years stale at this point...)
> 
> Sorry for keeping you up with this patch.

The point is getting the best code at the end we know how to do. I'm  
often wrong about what's best, either finding local peaks or being  
outright mistaken. Thanks for the kick out of this particular rut...

Rob
 1379008949.0