[Toybox] [PATCH] Minor formatting and cleanup in getopt.c

Rob Landley rob at landley.net
Fri Mar 1 18:57:35 PST 2024



On 3/1/24 17:37, Oliver Webb wrote:
> On Thursday, February 29th, 2024 at 19:03, Rob Landley <rob at landley.net> wrote:
>> On 2/28/24 18:23, Oliver Webb via Toybox wrote:
>> 
>> > I've been looking at some of the other pending commands, And found a getopt implementation from 2017 by AOSP.
>> > Looking at the source code, it doesn't seem unclean, nor overly large (about 100 lines).
>> > It does use getopt_long_only, a GNU extension of glibc, but musl has that so it will work
>> > on both glibc and musl. It is GNU compatible too, all test cases pass (even for TEST_HOST).
>> 
>> 
>> It's mostly that:
>> 
>> A) I'm not a shell getopt user and haven't put in the focus to come up to speed
>> on it yet,
>> 
>> B) the shell has a "getopts" (plural) builtin that I need to write, but which
>> this doesn't share code with. And LOTS of stuff confuses the two. (There's a
>> comment in this source about "BSD getopts", when this is "getopt".) Of COURSE
>> the semantics of the two are subtly different.
>> 
>> C) I'd really wanted to get it to use lib/args.c instead of having two getopt()
>> implementations in toybox (one from lib/ and one pulled in from libc).
>> 
>> Sigh. That said, I've been sitting on this for a long time without doing
>> anything about it, and the perfect is the enemy of the good.
>> 
>> The libc version isn't bad, it just bloats the static binary pulling in two
>> implementations of approximately the same code. I don't know if switching it
>> over to lib/args.c is even possible (the semantics are close, but I'd need one
>> to be a strict subset of the other),
> 
> I don't know how closely the lib/args.c code is tied into the infrastructure of toybox,

Eh... "somewhat"?

The best documentation for lib/args.c is the big comment block at the top of the
file, although I tried to do a writeup in code.html as well.

All the code in that file has a single entry point: void get_optflags(void);

Conceptually, it takes five arguments, two input:

1) an option string ala "a:b#cde at f(potato)" describing the options and their
arguments.

2) input argv[]

And three output:

3) a pointer to an argument struct to zero and populate (ala TT).

5) a flag mask indicating which options got enabled.

5) output optargs[] to hold leftover non-option arguments after parsing.

Right now it's taking those from global variables, but I could change it so the
function call takes them as arguments and sticks them in struct getoptflagstate
so they get passed through to the other functions in lib/args.c. Something like:

long long get_optflags(char *optstr, char *argv[], void *tt, char ***optargs);

So the current call in main.c would change to something like
toys.optflags = get_optflags(this.which->options, toys.argv, &TT, &toys.optargs);

toys is a global struct toy_context {} defined in toys.h and instantiated in
main.c, which is where all the global variables common to all commands live
(bundled together into a struct so I don't have random stragglers to track). Its
first member "which" is a pointer to an entry from struct toy_list toy_list[]
which is also defined in toys.h and instantiated in main.c (via a macro
definition and #include "generated/newtoys.h") which has the information from
each command's NEWTOY() or OLDTOY() macro. So the optstr from NEWTOY() winds up
in there, although it's washed through scripts/mkflags.c by scripts/make.sh in a
way I plan to redo to avoid needing a C program and just do it all with sed.
(Ignore that for now, it has to do with config switching portions of the option
string off, such as disabling the security blanket so ls says -z isn't a known
option.)

"this" is the union defined in generated/globals.h with all the appropriate
"struct command_data command;" members from each file's GLOBALS() block, and the
#define FOR_command stuff triggers an #ifdef block out of generated/flags.h that
among other things will #define TT this.command .

Internally, get_optflags() calls parse_optflaglist() to initialize an instance
of struct getoptflagstate, especially the struct opts *opts; member that's a
linked list of known options it parsed out fo the option definition string. Then
it loops through the argv[] arguments recognizing stuff, moving everything that
ISN'T an argument to optargs[], and calls gotflag() to handle each one it
recognizes and set the appropriate state.)

The main semantic difference between how this works and how getopt works is my
code doesn't record what order it saw the flags in (other than things like them
switching each other on and off). When it's done, it presents a bitfield that
says "These flags wound up enabled" and a struct with all the collected data
from the arguments.

Whereas "getopt abc -c -a" and "getopt abc -a -c" produce different output
because they preserve order.

Now, I can hack it, because optargs[] also preserves order, so in theory I can
loop through argv and compare each entry to see if it's the next entry in
optargs[] and if not print it on the left, and then print the "--" and optargs
on the right. But the reason that doesn't work is options have arguments, and
those arguments can be kinda magic:

  $ getopt ab::c one -cbanana two -a three -b four -c
  -c -b anana -a -b  -c -- one two three four

My lib/args.c is handling that fine, but the naieve loop to produce re-sorted
output would just separate "option argument" from "non-option argument" without
handling the arguments to the options.

All that's off the top of my head, I might be forgetting important details...

(And then properly SUPPORTING the extra lib/args.c functionality within a
command line utility, where # says numeric argument with enforced range, and @
is an occurrence counter and * collects a list when the argument is repeated...
There's design work there.)

> or if you can use it like how getopt should work by calling it directly with a struct pointer
> (From looking at lib/args.c, it seems like it? Although there is the GLOBALS stuff which can't
> easily be separated from the infrastructure)

That's not the hard part. :)

> The difference between lib/args.c option strings and getopt() option strings is like the difference
> between ERE's and PCRE's, yes they share a common base, but the differences start to become very
> apparent once you move from that base.

A bit, yes.

> Evaluating this it seems like we could either:
> 
> 1) Change lib/args.c to be a superset of getopt(3), and pry it from the GLOBALS so we can use it on it's own.
> Which is a large amount of effort and will only be used by 2 commands.

Breaking it out isn't that hard, and might make nofork commands in sh.c a little
cleaner. Plus there are various of wrapper commands like "git" that want to
provide a bunch of sub-commands with different arguments. (Right now there are
two multiplexer commands: "toybox" and "sh". Adding git would make a third, and
currently things like TOYFLAG_NOFORK are binary selecting one or the other of
exactly two current options...)

> 2) Have a -t option or something similar to it to parse toybox option strings, which still requires
> prying lib/args.c off the infrastructure, and likely won't be used by anything. And we still will have
> to pull in getopt() from libc.
> 
> 3) Continue using getopt()
> 
>> The existence of the -T option is a damning indictment of the entire gnu
>> project. As if we needed more of those. The -s option isn't doing them any
>> favors either: PICK ONE. But then again, this is gnu we're talking about:
> 
> getopt -s should probably be like cpio -H, ignored and set to the one thing everyone uses,
> and kept in the option string so that we don't error and autoconf doesn't whine.
> ideally without using eval which is a horrible nightmare for security.

They could have output one string per line and turned actual embedded newlines
into \n or something. But no, they did not. And now we're stuck being compatible
with gnu/stupid.

> the 4 options GNU lets you pick from are sh (Bourne Shell?) bash csh and tcsh.
> Bash is a superset of sh, and {t,}csh isn't used by anything these days from my knowledge.

I know some people who use csh. I don't know WHY, and none of them are under 50,
but it's (apparently) not dead.

Doesn't mean I have to support it. :)

Rob


More information about the Toybox mailing list