[Toybox] Did I mention the release?

Tue Apr 7 07:10:13 PDT 2015

Wow I have a lot of open email composition windows with 'reply to this'
todo items...

(Culling a bit before shutting this computer down while in Japan...)

On 03/22/2015 04:09 PM, James McMechan wrote:
> ----------------------------------------
>> Date: Sat, 21 Mar 2015 14:16:00 -0500
>> From: rob at landley.net
>> To: enh at google.com
>> CC: toybox at lists.landley.net
>> Subject: Re: [Toybox] Did I mention the release?
> 
> <SNIP>
> 
>> I'm ok with implementing "pgrep" and "ps" in the same toys/*/filename.c
>> and having the pgrep functionality available to ps. (Would having the
>> filter be interpreted as a regex cause problems, do you think?)
>>
>>> only processes whose COMM or NAME exactly matches are shown. i suspect
>>> that this is an exact match means it's not used all that much, but
>>> it's hard to know without taking it away and seeing if anyone
>>> complains. i might be reduced to that :-(
>>
>> I like figuring out how to fit special cases into a plausibly deniable
>> "I meant to do that" general case. (Sometimes the elegant solution is to
>> back up and solve a bigger problem.)
> 
> And sometimes the better solution is to punt and not solve a messy 
> problem generally but stuff the mess out of the way in the code needing 
> it, and then be able to ship it.

I'm all for shipping stuff, but once it's out there it needs to be
supported. Changing a command's API after the fact is... impolite.

>>> it might be worth my while introducing toybox's pgrep as soon as
>>> possible and encourage people to switch over.
>>
>> Which is also in pending.
>>
>> Lemme merge pgrep into the ps cleanup, and have unrecognized ps
>> arguments... oh, that's nasty. Is "ps ax" unrecognized? Hmmm. It's a
>> thing people are going to try to do...
>>
>> (I think I can make this work, I just have to figure out what behavior
>> we _want_. I guess start the option string with "?&" and presumably
>> lib/args.c will handle it? The fiddly bit is that nondash arguments and
>> dash arguments have _different_meanings_ and we don't currently record
>> whether or not we got a dash because things like tar don't care. Hmmm...
>> I don't think one flagset maps cleanly onto the other either, so
>> rejecting _invalid_ flags becomes tricky...)
> 
> It is starting to look like you are building general infrastructure for a 
> single command.

Trying to avoid that.

> The ps command looks like it was a bad compromise
> between the AT&T and BSD camps and now with GNU extensions so three 
> command syntaxes no dash BSD syntax/single dash AT&T 
> syntax/double-dash GNU long options, and default actions changing 
> depending on whether you have an option of each class.

It's fairly horrible, yes.

My current plan is to error out if somebody uses both BSD and POSIX
style options (dash and nondash) in the same command. (It's ok for tar,
where "tar cvfj banana.tar.bz2 -C walrus thingy" is a reasonable thing
to do. But "ps a -a" is just wrong.)

Currently lib/args.c can parse the two styles, but doesn't record which
it saw. If I can write two argument type bits into global "struct
toy_context toys" I can have ps test for both being set and error out.
Or I could add another random piece of punctuation to the start of the
string and have lib/args.c do it. (Haven't decided yet.)
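
(To illustrate the "two argument type bits" idea, here is a minimal
standalone sketch. It is not toybox code: ARG_STYLE_DASH, ARG_STYLE_NODASH,
arg_styles() and check_arg_styles() are hypothetical names, and it scans a
plain argv by hand instead of hooking into lib/args.c. It also assumes every
argument is an option cluster, so something like "-U mcmechan" would be
misread as mixed. Getting that right needs the option parser's knowledge of
which arguments belong to which option.)

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical style bits; neither name exists in toybox. */
#define ARG_STYLE_DASH   1  /* saw a "-a" style (POSIX/AT&T) argument */
#define ARG_STYLE_NODASH 2  /* saw an "a" style (BSD) argument */

/* Return a bitmask of the option styles seen in argv[1..argc-1].
 * Simplification: every argument counts as an option cluster, a lone "-"
 * counts as non-dash, and scanning stops at "--". */
int arg_styles(int argc, char **argv)
{
  int i, seen = 0;

  for (i = 1; i < argc; i++) {
    if (!strcmp(argv[i], "--")) break;
    if (argv[i][0] == '-' && argv[i][1]) seen |= ARG_STYLE_DASH;
    else seen |= ARG_STYLE_NODASH;
  }

  return seen;
}

/* Return 0 if only one style was used, -1 (with a message) if both were. */
int check_arg_styles(int argc, char **argv)
{
  int both = ARG_STYLE_DASH|ARG_STYLE_NODASH;

  if (arg_styles(argc, argv) == both) {
    fprintf(stderr, "%s: can't mix dash and non-dash options\n", argv[0]);
    return -1;
  }

  return 0;
}
```

With this, "ps ax" and "ps -a" each set a single bit and pass, while
"ps a -a" sets both and is rejected.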

> When the man page 
> explicitly says not to use some options because it uses a heuristic to 
> guess your intended meaning of "O", I would not want to follow.

Yeah, I'm entirely with you on that.

The hard part of implementing ps is figuring out what the correct
behavior should _be_. It's obviously a subset of what's there, but a
subset that doesn't include "ps ax" is too small, and a subset that
actually treats "ps -aux" as "ps aux" because special case (and has a
man page entry that says it prints a warning, but it doesn't print a
warning)... that's too big.

> I would also offer busybox as an example, it ignores the arguments and 
> does a basic ps.

Yeah, that's too small.

> It has not been upgraded to a more capable option yet,
>  the usage on my box claims -T and -o but happily ignores both AT&T 
> and BSD options (but not -U mcmechan).

Which is just weird.

> This is looking to add a 
> potentially large amount of complexity for gain in only one command 
> there may be a second command somewhere with this kind of special case, 
> but unless it is quickly clear how to do it, I would suggest a punt.
> 
> Just have a "we don't want the common infrastructure processing the flags"
> flag, we would like to do it in ps.c instead.

I already have that: feed NULL or 0 in for the option string in the
NEWTOY() macro, then option parsing is skipped and toys.optargs is NULL.
(For single command builds this allows lib/args.c to drop out entirely.)
Since toys.argv is always the original unmodified arguments, you can
parse them yourself from that.

But the existing argument parsing already is getting it 90% right, so
I'm trying to figure out if there's a way to close or paper over the
gap. So far the plan is "error out if they pass both kinds, then interpret
the FLAG_x macros as one of the two types depending on which mode they
used". Right now it's not indicating which mode they used, and I really
don't want to add a field to struct toy_context for this, but that looks
like all I'm missing.

(If I wanted to interpret both categories of flags simultaneously,
things would get horrible fast. That's why I'm not doing that.)

>>>> I really want to genericize lib/lib.c:human_readable() so it can
>>>> reproduce the variants we need, and factor out the column spacing code.
> 
> Yes getting a reasonable human format output would help a lot of things.

Possibly I'll get to work on it during my 11 hour flight from LAX to
Tokyo. I really hope they have an outlet on the plane...

>>>> What would really help here is test cases. (Extract this tarball, cd in
>>>> and ls -h, and the output should look like this textfile...)
>>>
>>> a tarball or function for all tests to use to test with is a great
>>> idea. it would make it easier to have interesting cases available for
>>> all the commands to be tested against. (dangling symlinks, empty and
>>> non-empty directories, non-regular files, ...)
>>
>> Indeed. Right now I've hijacked blkid's filesystem images for a couple
>> other tests that just need a known file, but a tarball with different
>> kinds of stuff in it would be nice.
>>
>> Unfortunately, this opens up the "test as root" can of worms because
>> "device nodes" are an obvious thing to put in there, so possibly _two_
>> tarballs, one "root" and one "not root".
>>
>> (Hmmm, would moving the blkid files into said tarball make sense?
>> Unfortunately, uncompressed they're huge, the f2fs one alone is 128
>> megs. Maybe if I get tar extracting sparse files by default?)
> 
> compressed in a tarball they should be tiny, I made a 129MB ext2 fs with:
> dd if=/dev/zero of=e2fs.img seek=128 bs=1M count=1
> mke2fs e2fs.img
> both the original and the plain tar file compressed to 23K
> if you use the tar -S option it is only 190K without compression
> and then it compresses to 2.2K (using xz)

Yes but when we extract that it takes up 129 megs.

A small tarball that doesn't cause significant I/O or cache
pressure could be made available to all tests as a matter of course.
One that's going to run a significant fraction of a gigabyte through the
I/O subsystem would have to be requested and cleaned up after by the
tests that use it.

(And currently the test suite isn't detecting disk full situations, and
I'd like to run it on VMs that may be running out of initramfs with
qemu's default 256 megs of space...)

It's all doable, it's just "where to park the unavoidable suck inherent
in the problem space" is only a question I focus on when "Can I somehow
make this not suck?" doesn't confess even when I tie it to the comfy
chair and poke it with soft cushions.

(Having our tar implementation able to extract runs of zeros as "sparse"
would fix that. That's why I'm kinda holding off on this until I
implement our tar...)
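
(For reference, a minimal sketch of the usual way to turn runs of zeros into
holes when writing a file: seek over all-zero blocks instead of writing
them, then fix up the length at the end. This is not the toybox tar code,
which doesn't exist yet. write_sparse(), all_zero() and the 4096 byte
SPARSE_BLOCK granularity are assumptions made up for this example, and it
doesn't retry on EINTR.)

```c
#include <sys/types.h>
#include <unistd.h>

#define SPARSE_BLOCK 4096  /* assumed granularity, not taken from toybox */

/* Return 1 if all len bytes at buf are zero, else 0. */
static int all_zero(const char *buf, size_t len)
{
  while (len--) if (*buf++) return 0;

  return 1;
}

/* Write len bytes from buf to fd at the current offset, seeking over
 * all-zero blocks instead of writing them so the filesystem can leave
 * holes there. Returns 0 on success, -1 on error. */
int write_sparse(int fd, const char *buf, size_t len)
{
  size_t chunk, done;
  ssize_t n;
  off_t pos;

  while (len) {
    chunk = len < SPARSE_BLOCK ? len : SPARSE_BLOCK;
    if (all_zero(buf, chunk)) {
      /* Skip the zeros; data written later leaves a hole behind it. */
      if (lseek(fd, (off_t)chunk, SEEK_CUR) < 0) return -1;
    } else {
      /* Real data: loop in case of short writes. */
      for (done = 0; done < chunk; done += n) {
        n = write(fd, buf+done, chunk-done);
        if (n <= 0) return -1;
      }
    }
    buf += chunk;
    len -= chunk;
  }

  /* Seeking past the end doesn't extend the file, so set the final length
   * explicitly in case the data ended with a hole. */
  pos = lseek(fd, 0, SEEK_CUR);
  if (pos < 0 || ftruncate(fd, pos) < 0) return -1;

  return 0;
}
```

The file reads back as the full set of bytes either way. Whether the skipped
blocks actually take up no disk space depends on the filesystem.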

>> Unfortunately "as good as the desktop" is under-specified. Take Red Hat
>> 6-ish circa 1999: that was a desktop, was that good enough? The current
>> commands are much much bigger and more elaborate, but are they enough
>> better to justify the bloat? (Answer: it depends.)
> 
> Hold off until you find out what people actually need ;) e.g. the busybox ps
>  command does not do much of anything but it gracefully ignores my fat 
> fingered usage when I type ps aux or ps -ef... actually it looks like it
>  just prints out the long format at all times on my machine.

Indeed.

But POSIX specifies specific behavior, and I'd like to implement at
least a large subset of that. What busybox is doing here is not good enough.

>> (What might be better in this case was a getline() variant that returned
>> a chunk of mmapped file, start pointer and length. I designed toybox sed
>> to work with start and length so it could handle built-in nulls,
>> although A) I don't think that part's finished, B) it's probably also
>> assuming each segment is null terminated because there's no way to get
>> start/length semantics out of regex() and copying the data into a
> 
> Err, the reference above led me to a nice regex paper talking about the 
> extensions to Thompson's NFA paper for such uses: unanchored
> http://swtch.com/~rsc/regexp/regexp1.html
> I thought the plan9 library looked pretty easy to read, it seems like it 
> would take length & buffer pointer if not several others should

This really sounds like libc's problem. :)

Rob
