[Toybox] [PATCH] tail fixes

Tue Sep 17 12:44:05 PDT 2013

Rob Landley wrote:
> Catching up on email while waiting at the airport. (Ohio Linuxfest!)
> 
> On 09/06/2013 04:48:10 PM, Felix Janda wrote:
> > Rob Landley wrote:
> > > What The Hurd implements is not. Those religious fanatics are simply
> > > not relevant, except sometimes as a warning of what _not_ to do.
> > 
> > I just wanted to mention this. It means that we have some more freedom
> > in the implementation.
> 
> I've since done a checkin that obsoletes most of the analysis here, but  
> there were a few other topics...

Yeah, it seems so.

> > > pos was a ssize_t because lseek() can return failure, now you've  
> > made
> > > it a size_t which can't be negative. (Do you have something against
> > > signed numbers?)
> > 
> > It can't be negative. But it can be -1.
> 
> Uh-huh.

Just some magic value. (Ok, if sizeof(size_t) < sizeof(int), there is
a problem with this.)

> > But actually we should use
> > an off_t for pos. One might be curious about the end of a 8GB file
> > in a 32bit system.
> 
> Is ssize_t not sufficient? (That's what the lseek version is using for  
> pos. The non-lseek version isn't particularly keeping track of pos  
> after grabbing enough characters and lines.)

In my particular example obviously not. The return value of lseek will
be truncated and we will output stuff from somewhere in the file but
not at its end.

> Also, the argument parsing logic is using "long" for the numbers we  
> input. This means a 32 bit system can't accept a -c larger than 2 gigs.  
> (Known limitation, didn't want to complicate the code to cope with 32  
> bit systems when everything and its dog has 64 bit variants these days,  
> and the year 2038 problem is going to flush out the remaining 32-bit  
> systems in 25 years. Support them, yes. Make them handle large numbers  
> in all cases, not necessarily.)
> 
> (The real hiccup with argument parsing is that  
> sizeof(long)==sizeof(pointer) so I allocate pointer sized slots at the  
> start of GLOBALS() to save arguments into. I can jump through hoops to  
> get around that, but the simple approach is to wait for the switch to  
> 64 bit arm.)

Thanks for mentioning this known limitation. Someone commented on
fallocate on github. It should also use off_t but the argparsing logic can't
handle it anyway. There might be some reasons to create large files with
this toy on 32bit systems.

atolx() seems to have pretty undefined behavior with large numbers?

> > The previous code had the assumption that every chunk is terminated by
> > a newline (which is utterly wrong). This is the simplest  
> > implementation
> > I could come up. I'll try to explain further things below.
> > 
> > After spending quite some time with this reply I see the simple
> > (less efficient (in terms of number of instructions)) algorithm.
> 
> I think I fixed that, but hopefully if there are still problems they'll  
> be mentioned in one of the other pending mails I'm reading towards...

It looks good to me.

> > > Actually we don't have an exclusion in the option string for [!nc]  
> > and
> > > I vaguely recall I was trying to make it work if you specified both  
> > at
> > > the same time. (So it showed you that many characters from the end
> > > _plus_ that many lines before that point.)
> > 
> > I didn't know that. Could you specify it in more detail (the order  
> > matters)?
> 
> It was a bad idea, and the attempt at implementing it was broken in  
> several places. I made this implementation produce the same results  
> other ones do instead.

Ok.

> > > Let's see, assuming the actual bugs were the uninitialized next  
> > pointer
> > > in the last list entry and the list entries being added in the wrong
> > > order (so only the last list entry every showed), so if I separate  
> > the
> > > count and line codepaths into an if/else but keep the head/tail
> > > traversing design without this sideband line array...
> > 
> > There were more. (Just try the previous lseek logic with something big
> > containing no newlines.)
> 
> Hopefully the current version works. (I had the test suite do a larger  
> test file. Probably didn't hit everything.)

Did a few tests. Seems good.

> > > You see why I just did a doubly linked list? That way I had one list
> > > pointer fed to the function and the function could use it to find  
> > the
> > > end of the list, but I _didn't_ have different list and list head
> > > types. The downside is each node has an extra pointer, for which I  
> > get
> > > the ability to traverse the list backwards, even if I never use it.)
> > 
> > It's still a bit confusing to me. list always points to the head and
> > list->prev points to the tail?
> 
> Yes, it's a circular doubly linked list. Fairly common data structure,  
> the kernel uses 'em all over the place. (You have a head pointer, from  
> there you can traverse in either direction, and when you hit it again  
> you're done.)
> 
> This sort of thing is actually why the kernel has magic PID 1 and  
> rootfs entries that never go away: the head pointer points to them and  
> if that entry is never removed, you never have to move the head pointer  
> which makes some nasty races go away (and RCU can handle the rest).
> 
> > Could you maybe remark that somewhere?
> 
> Fluff out http://landley.net/toybox/code.html#lib_llist some more, you  
> mean?
> 
> I need to do another documentation pass, I expect...

How about calling them circular lists instead of double lists?

> > So it's nearly the same as the three argument version. (But we get a
> > doubly linked list in the end.)
> > 
> > > On the bright side, if I do keep dlist_pop() I can use it in patch  
> > to
> > > trim a couple lines there...
> > >
> > > Eh, I should worry about it in the morning. Bedtime, and still 5000
> > > emails behind. (Mostly linux-kernel, but I need to read the
> > > Documentation stuff in there. Plus the 3.11 kernel dropped on  
> > tuesday
> > > and I need to do an Aboriginal release, which means a toybox release
> > > _and_ I wanted to get LFS 7.4 built instead of the old 6.8 control
> > > image that's years stale at this point...)
> > 
> > Sorry for keeping you up with this patch.
> 
> The point is getting the best code at the end we know how to do. I'm  
> often wrong about what's best, either finding local peaks or being  
> outright mistaken. Thanks for the kick out of this particular rut...

It's nice that we now don't have a horribly broken tail anymore.

Felix

 1379447045.0