[Toybox] Looking at nl

Rob Landley rob at landley.net
Sun May 26 18:34:36 PDT 2013


On 05/25/2013 06:36:49 PM, idunham at lavabit.com wrote:
> I thought I'd look at what nl takes, since it's not much more than
> looping over some lines, incrementing, and formatting the output.
> The POSIX reference page is:
> http://pubs.opengroup.org/onlinepubs/9699919799/utilities/nl.html

I suspect that page's author did not speak English as their first
language.

I also suspect nobody's looked at this command in 15 years. Here's the
Open Group Base Specifications Issue 5 version (SUSv4 is Issue 7, SUSv3
was Issue 6; this is SUSv2, from 1997):

   http://pubs.opengroup.org/onlinepubs/7908799/xcu/nl.html

> There are a few little "fun" details, though.

There usually are. :)

> -Fixed length strings.
>  -n should be one of 3 2-char strings, and -d should be at most 2 chars.
>  Is there a way to indicate this in NEWTOY or GLOBALS, or should we just
>  check for matches in the main loop?

Just do it in main. There's nothing in lib/args.c to do that now, and I  
don't think it's generic enough to add. (Also, -n is checking for 3  
specific strings, the error is just the 'else' case even if it's length  
2.)

Also, -d "" would be an error case, so "<2" isn't useful there either.
That one is: length 1 has a behavior, length 2 has a behavior, and
anything else is an error.
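
Roughly what I mean by checking in main, as a minimal sketch. The names
nl_parse_fmt() and nl_parse_delim() are made up for illustration and are
not existing toybox functions. The one-character -d case keeps ':' as the
second delimiter character, which is what POSIX describes.

```c
#include <string.h>

// Hypothetical encoding of the three valid -n arguments.
enum nl_fmt { NL_BAD = -1, NL_LN, NL_RN, NL_RZ };

// -n: exactly "ln", "rn", or "rz"; everything else is the 'else' case,
// whether or not it happens to be two characters long.
enum nl_fmt nl_parse_fmt(const char *arg)
{
  static const char *names[] = {"ln", "rn", "rz"};
  int i;

  for (i = 0; i < 3; i++)
    if (!strcmp(arg, names[i])) return (enum nl_fmt)i;

  return NL_BAD;
}

// -d: one char replaces the first delimiter char (second stays ':'),
// two chars replace both, any other length (including "") is an error.
// Returns 0 on success, -1 on error.
int nl_parse_delim(const char *arg, char delim[3])
{
  size_t len = strlen(arg);

  if (len < 1 || len > 2) return -1;
  delim[0] = arg[0];
  delim[1] = (len == 2) ? arg[1] : ':';
  delim[2] = 0;

  return 0;
}
```

With this, -n lna and -n asd both come back as NL_BAD, so whether to
error out or fall back to a default stays a one-line decision in main.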

>  I'm half tempted to just ignore length and assume the default if an invalid
>  arg is specified. So -n lna would be treated as -n ln, and -n asd would be
>  treated as -n rn.  But that's probably a little too liberal in
>  accepting bad flags...

Up to you. I suspect that most of the nl complexity is vestigial.  
Haven't encountered anything that uses more than just "number lines",  
but then I wasn't looking...

> -Variable format specifiers:
>  -w 5 means printing roughly this:
>  printf("%5d%s%s", linecount, sep, toybuf)
>  But -n ln -w 7 makes it %-7d...

So all these options are to control the alignment and indentation of  
the line numbers.
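
A sketch of how -n and -w could pick the printf conversion, using '*' so
the width comes from an argument instead of being pasted into the format
string. nl_format_line() and its single-character fmt argument are
assumptions for illustration, not toybox code.

```c
#include <stdio.h>

// fmt is 'l' for ln (left justified), 'z' for rz (zero padded), and
// anything else is treated as rn (right justified, the default).
// Returns what snprintf() returns.
int nl_format_line(char *out, size_t size, char fmt, int width, long num,
  const char *sep, const char *text)
{
  switch (fmt) {
    case 'l': return snprintf(out, size, "%-*ld%s%s", width, num, sep, text);
    case 'z': return snprintf(out, size, "%0*ld%s%s", width, num, sep, text);
    default:  return snprintf(out, size, "%*ld%s%s", width, num, sep, text);
  }
}
```

So -w 5 with the default format gives the "%5d" behavior from the quoted
printf, and -n ln -w 7 gives the "%-7d" one.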

> -When to start a new page.
> A page contains a header (delimiter occurs 3x),
> body (delimiter occurs 2x), and footer (delimiter occurs once).
> 
> POSIX specifies that line numbering shall be reset at the start of each
> logical page, and that "Unless otherwise specified, nl shall assume the
> text being read is in a single logical page body."
> The obvious approach to me is to say that if you go to a new page section
> no lower on the page, you've started a new page.
> So
> \:\:
> Text of Page 1
> \:\:
> More text.

Makes sense to me.

> is 2 page body sections, numbered thus:
>      1  Text of Page 1
> 
>      1  More text.
> 
> However, GNU nl assumes that a new page starts with a new header, and
> treats two consecutive body sections as one...except it prints a blank
> line between them as a section separation should. So the sample above
> becomes:
>      1  Text of Page 1
> 
>      2  More text.

I've hit a number of things gnu gets wrong. :)
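
The "no lower on the page" rule from the quoted text could look something
like this. It's only a sketch: nl_section() and nl_starts_new_page() are
hypothetical names, and it assumes the section codes are ordered
header > body > footer by how many times the delimiter repeats.

```c
#include <string.h>

// Section codes double as the number of delimiter repeats on the line.
enum { SEC_NONE = 0, SEC_FOOTER = 1, SEC_BODY = 2, SEC_HEADER = 3 };

// Returns 3, 2, or 1 if the line consists solely of that many copies of
// delim (optionally followed by a newline), otherwise SEC_NONE.
int nl_section(const char *line, const char *delim)
{
  size_t dlen = strlen(delim);
  int count = 0;

  if (!dlen) return SEC_NONE;
  while (!strncmp(line, delim, dlen)) {
    line += dlen;
    count++;
  }
  if (*line == '\n') line++;
  if (*line || count > 3) return SEC_NONE;

  return count;
}

// Proposed rule: moving to a section that is not lower on the page than
// the current one starts a new logical page (and resets numbering).
// The GNU behavior described above would instead be next == SEC_HEADER.
int nl_starts_new_page(int current, int next)
{
  return next >= current;
}
```

Starting with current set to SEC_BODY (the "single logical page body"
default), each \:\: line in the example counts as a new page, so both
lines get numbered 1.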

> On other topics...
> Would initializing GLOBALS to the defaults be something sane to allow,
> or would it complicate the build system too much?
> Example:
> USE_NL(NEWTOY(nl, "1b:d:f:h:i#l#n:ps:v#w#", TOYBOX_USR | TOYBOX_BIN))
> ..
> GLOBALS(
> char *btype = "t";
> char *delim = "\\:";
> char *ftype = "n";
> char *htype = "n";
> long incr = 1;
> long maxblank = 1;
> char *fmt = "rn";
> char *sep = "\t";
> long startnum = 1;
> long width = 6;
> )

GLOBALS is actually a union, which starts zeroed. Initializing it to  
command-specific values would mean it wasn't zeroed for other commands.

I.E. do it in main.
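
Something along these lines, as a standalone sketch. struct nl_opts, the
SEEN_* bits, and nl_apply_defaults() are stand-ins I made up here, not
the real GLOBALS or flag macros. The default values are the ones from
your list. Strings can use "still NULL" as the unset test, but the
numbers need to know whether the flag was given, since 0 is a legitimate
value for something like -v.

```c
#include <string.h>

// Hypothetical stand-in for the zeroed per-command globals.
struct nl_opts {
  char *btype, *delim, *ftype, *htype, *fmt, *sep;
  long incr, maxblank, startnum, width;
};

// Hypothetical "this numeric flag was on the command line" bits.
enum { SEEN_I = 1, SEEN_L = 2, SEEN_V = 4, SEEN_W = 8 };

// Fill in defaults for anything the user didn't specify.
void nl_apply_defaults(struct nl_opts *o, unsigned seen)
{
  if (!o->btype) o->btype = "t";
  if (!o->delim) o->delim = "\\:";
  if (!o->ftype) o->ftype = "n";
  if (!o->htype) o->htype = "n";
  if (!o->fmt) o->fmt = "rn";
  if (!o->sep) o->sep = "\t";

  if (!(seen & SEEN_I)) o->incr = 1;
  if (!(seen & SEEN_L)) o->maxblank = 1;
  if (!(seen & SEEN_V)) o->startnum = 1;
  if (!(seen & SEEN_W)) o->width = 6;
}
```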

Rob

