[Toybox] [PATCH] tail fixes

Wed Sep 4 16:46:44 PDT 2013

Replying with gmail, sorry if this winds up a bit garbled...

On 8/31/13, Felix Janda <felix.janda at posteo.de> wrote:
> This should now hopefully fix the earlier segfaults.

Yay!

(Applied on the theory that what's there segfaults.)

> In the case that lseek doesn't work and we count from the end
> the algorithms for tail -n and tail -c are now more separate.
> (tail -c is now more efficient since it adds lenghts instead of
> counting bytes.)
>
>
> POSIX tail BTW does only accept one input file. tail is not specified
> by LSB. I guess it is convenient when combined with "-f" to watch
> multiple files.
>
> For tail -f, how about putting the fds into an array and then using
> select() to watch the fds?

I've been thinking about that. It's a toss up between not wanting to
complicate the code and that being the obvious way to do that. (It
boils down to "is this worth doing".)

Right now netcat has either poll or select logic in it (I forget
which), and there are a couple other things like tee and less that
probably should have it. I have a vague note that if I can work out a
good library function abstracting this away into something simple,
using it in more places costs less. But haven't gone back to try to
work out a design. (Netcat's next big todo item is udp support, then
ipv6.)

> GNU and busybox print the header again when the file last read from
> changes. They behave differently when a file occurs on the command
> line more than once.

Sheesh, I'd missed that last one. (Behave differently how?)

> # HG changeset patch
> # User Felix Janda <felix.janda at posteo.de>
> # Date 1377945041 -7200
> # Node ID 3cd61d16aabec82505da96a142663cedbba8d12a
> # Parent  9a059e9b3a17a372fc8b225f512af57c72f4eeaa
> tail: Some fixes
>
> - Rewrite most of the not lseek() logic
> - Change meaning of len in line_list
> - Use single instead of double linked list
>
> diff -r 9a059e9b3a17 -r 3cd61d16aabe toys/posix/tail.c
> --- a/toys/posix/tail.c	Fri Aug 30 20:09:10 2013 +0200
> +++ b/toys/posix/tail.c	Sat Aug 31 12:30:41 2013 +0200
> @@ -38,19 +38,20 @@
>  )
>
>  struct line_list {
> -  struct line_list *next, *prev;
> +  struct line_list *next;
>    char *data;
> -  int len;
> +  size_t len;
>  };
>
> -static struct line_list *get_chunk(int fd, int len)
> +static struct line_list *get_chunk(int fd, size_t len)
>  {
>    struct line_list *line = xmalloc(sizeof(struct line_list)+len);
>
> -  line->data = ((char *)line) + sizeof(struct line_list);
> +  line->data = (char*)line + sizeof(struct line_list);

Adding a space between char and *. Pet peeve of mine: people doing
"char* a, b;" and then avoiding declaring more than one pointer at a
time after getting bit by that because they blame the wrong thing for
their mistake.

>    line->len = readall(fd, line->data, len);

Which is returning signed, and -1 for error...

> +  line->next = 0;
>
> -  if (line->len < 1) {
> +  if (line->len + 1 < 2) {

And this trick is assuming -1 will wrap to 0 in unsigned, and is
uncomfortably subtle. Which is why len was an int in the first place.

Possibly len should be ssize_t?

>      free(line);
>      return 0;
>    }
> @@ -61,7 +62,9 @@
>  static void dump_chunk(void *ptr)
>  {
>    struct line_list *list = ptr;
> -  xwrite(1, list->data, list->len);
> +  size_t len = list->len - (list->data - (char*)list - sizeof(struct
> line_list));

*blink* *blink*

Ok, data minus the start of the structure minus the size of the
structure... you're moving data forward _without_ adjusting len? And
assuming things about the layout of the structure...

Would it make more sense to add an "int used;" to the structure?

> +
> +  xwrite(1, list->data, len);
>    free(list);
>  }
>
> @@ -71,11 +74,11 @@
>  static int try_lseek(int fd, long bytes, long lines)
>  {
>    struct line_list *list = 0, *temp;
> -  int flag = 0, chunk = sizeof(toybuf);
> -  ssize_t pos = lseek(fd, 0, SEEK_END);
> +  int flag = 0;
> +  size_t chunk = sizeof(toybuf), pos = lseek(fd, 0, SEEK_END);

Are single lines bigger than 2 gigabytes really a huge problem?

>    // If lseek() doesn't work on this stream, return now.
> -  if (pos<0) return 0;
> +  if (pos == -1) return 0;

Negative one is less than zero...

>    // Seek to the right spot, output data from there.
>    if (bytes) {
> @@ -88,40 +91,36 @@
>
>    bytes = pos;
>    while (lines && pos) {
> -    int offset;
> +    size_t offset;
>
>      // Read in next chunk from end of file
> -    if (chunk>pos) chunk = pos;
> +    if (chunk > pos) chunk = pos;
>      pos -= chunk;
>      if (pos != lseek(fd, pos, SEEK_SET)) {
>        perror_msg("seek failed");
>        break;
>      }
>      if (!(temp = get_chunk(fd, chunk))) break;
> -    if (list) list->next = temp;
> +    if (list) temp->next = list;

Ok, _that_ is a largeish behavior change. Was that the bug?

I tend to use double lists as an easy way to assemble them in-order,
and also because I have existing dlist functions to manipulate them.
(I didn't bother to write the single linked functions in the library
because they're about as easy to do inline... and then the lazy way
was just to use the double ones. Six of one...)

I need to read through the rest of this more closely, but this message
has already sat half finished on my desktop for a couple days now...

Rob

 1378338404.0