[Toybox] Did I mention the release?

Sun Mar 22 14:09:03 PDT 2015

----------------------------------------
> Date: Sat, 21 Mar 2015 14:16:00 -0500
> From: rob at landley.net
> To: enh at google.com
> CC: toybox at lists.landley.net
> Subject: Re: [Toybox] Did I mention the release?

<SNIP>

> I'm ok with implementing "pgrep" and "ps" in the same toys/*/filename.c
> and having the pgrep functionality available to ps. (Would having the
> filter be interpreted as a regex cause problems, do you think?)
>
>> only processes whose COMM or NAME exactly matches are shown. i suspect
>> that this is an exact match means it's not used all that much, but
>> it's hard to know without taking it away and seeing if anyone
>> complains. i might be reduced to that :-(
>
> I like figuring out how to fit special cases into a plausibly deniable
> "I meant to do that" general case. (Sometimes the elegant solution is to
> back up and solve a bigger problem.)

And sometimes the better solution is to punt: rather than solving a
messy problem generally, stuff the mess out of the way in the code that
needs it, and then be able to ship it.

>> it might be worth my while introducing toybox's pgrep as soon as
>> possible and encourage people to switch over.
>
> Which is also in pending.
>
> Lemme merge pgrep into the ps cleanup, and have unrecognized ps
> arguments... oh, that's nasty. Is "ps ax" unrecognized? Hmmm. It's a
> thing people are going to try to do...
>
> (I think I can make this work, I just have to figure out what behavior
> we _want_. I guess start the option string with "?&" and presumably
> lib/args.c will handle it? The fiddly bit is that nondash arguments and
> dash arguments have _different_meanings_ and we don't currently record
> whether or not we got a dash because things like tar don't care. Hmmm...
> I don't think one flagset maps cleanly onto the other either, so
> rejecting _invalid_ flags becomes tricky...)

It is starting to look like you are building general infrastructure for
a single command. The ps command looks like a bad compromise between
the AT&T and BSD camps, now with GNU extensions on top. That gives three
command syntaxes (no-dash BSD, single-dash AT&T, double-dash GNU long
options), with default actions that change depending on whether you have
an option of each class. When the man page explicitly says not to use
some options because it uses a heuristic to guess your intended meaning
of "O", I would not want to follow.

I would also offer busybox as an example: it ignores the arguments and
does a basic ps. It has not been upgraded to a more capable option yet.
The usage on my box claims -T and -o, but it happily ignores both AT&T
and BSD options (but not -U mcmechan).

This looks like adding a potentially large amount of complexity for a
gain in only one command. There may be a second command somewhere with
this kind of special case, but unless it is quickly clear how to do it,
I would suggest a punt.

Just have a "we don't want the common infrastructure processing the
flags" flag, and do the parsing in ps.c instead.

>>> I really want to genericize lib/lib.c:human_readable() so it can
>>> reproduce the variants we need, and factor out the column spacing code.

Yes, getting reasonable human-readable output would help a lot of things.
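
For reference, one possible shape for such output, as a rough sketch.
human_size() is a hypothetical helper, it is NOT the signature or the
behavior of lib/lib.c:human_readable(), and the choices here (1024-based
units, one truncated decimal below 10) are my assumptions about one of
the variants you might want.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper, not toybox's human_readable().
 * Formats num with 1024-based units: "1023", "1.5K", "12M".
 * Digits are truncated, not rounded. Returns snprintf()'s result.
 */
int human_size(char *buf, size_t len, unsigned long long num)
{
  static const char units[] = "KMGTPE";
  int unit = -1;               /* -1 means plain bytes, no suffix */
  unsigned long long rem = 0;  /* remainder from the last division */

  assert(buf != NULL && len > 0);
  while (num >= 1024 && unit < 5) {
    rem = num % 1024;
    num /= 1024;
    unit++;
  }
  if (unit < 0) return snprintf(buf, len, "%llu", num);
  /* One decimal digit for small values, none once we reach 10 units. */
  if (num < 10)
    return snprintf(buf, len, "%llu.%llu%c", num, rem * 10 / 1024,
                    units[unit]);
  return snprintf(buf, len, "%llu%c", num, units[unit]);
}
```

Different commands want different rounding and spacing, which is
presumably why a single genericized function with a few knobs is the
goal.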

>>> What would really help here is test cases. (Extract this tarball, cd in
>>> and ls -h, and the output should look like this textfile...)
>>
>> a tarball or function for all tests to use to test with is a great
>> idea. it would make it easier to have interesting cases available for
>> all the commands to be tested against. (dangling symlinks, empty and
>> non-empty directories, non-regular files, ...)
>
> Indeed. Right now I've hijacked blkid's filesystem images for a couple
> other tests that just need a known file, but a tarball with different
> kinds of stuff in it would be nice.
>
> Unfortunately, this opens up the "test as root" can of worms because
> "device nodes" are an obvious thing to put in there, so possibly _two_
> tarballs, one "root" and one "not root".
>
> (Hmmm, would moving the blkid files into said tarball make sense?
> Unfortunately, uncompressed they're huge, the f2fs one alone is 128
> megs. Maybe if I get tar extracting sparse files by default?)

Compressed in a tarball they should be tiny. I made a 129MB ext2 fs with:

  dd if=/dev/zero of=e2fs.img seek=128 bs=1M count=1
  mke2fs e2fs.img

Both the original and the plain tar file compressed to 23K. If you use
the tar -S option, the archive is only 190K without compression, and
then it compresses to 2.2K (using xz).
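
The reason this works is that the image is mostly a hole. A minimal
sketch of the same trick in C, in case a test helper ever wants to
create such a file without dd: seek past the hole, write a little data,
and compare st_size (apparent size) against st_blocks * 512 (allocated
size). make_sparse() is a hypothetical name, and it writes only a 4096
byte tail instead of the 1M block dd wrote above.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create path with a hole of `hole` bytes followed
 * by 4096 zero bytes of real data, and fill in *st with the result.
 * Returns 0 on success, -1 on error.
 */
int make_sparse(const char *path, off_t hole, struct stat *st)
{
  char block[4096] = {0};
  int fd;

  assert(path != NULL && st != NULL);
  fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
  if (fd < 0) return -1;
  /* Seeking past end of file and then writing leaves a hole behind. */
  if (lseek(fd, hole, SEEK_SET) == (off_t)-1
      || write(fd, block, sizeof(block)) != (ssize_t)sizeof(block)
      || fstat(fd, st) < 0)
  {
    close(fd);
    return -1;
  }
  return close(fd);
}
```

On a filesystem that supports holes, st.st_blocks * 512 should come out
far smaller than st.st_size; on one that does not, the file is simply
written out in full.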

> In case it isn't obvious, the pending directory isn't the only one I
> need to do a cleanup/extend/rewrite pass over. The test suite could use
> several months of fulltime poking.
>
>>>> toolbox/top.c
>>>> output, options, and behavior all differ enough that any switchover
>>>> is going to be interesting. punt for now, worry about these later.
>>>
>>> I would very much like to unify top, htop, and iotop (but haven't looked
>>> into how the third is implemented). Also there's the possibility of
>>> common code with vmstat and ps.
>>>
>>> The other thing is that "top" does terminal control similar to "less".
>>> And "watch" should do the same, and ps does a _little_ of the same.
>>> (They query the terminal size and truncate output at screen width and
>>> screen height.) So my plan was to genericize _that_ code, in a way that
>>> works with shell command line history editing and vi.
>>>
>>> None of which is directly relevant to you, you just want it to work. But
>>> this is why promoting it is harder than it looks...
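
As a small illustration of the "query the terminal size and truncate"
part: a sketch, assuming the usual TIOCGWINSZ ioctl and an 80x24
fallback when the output is not a terminal. term_size() and fit_width()
are hypothetical names, not existing toybox functions, and the
truncation assumes one byte per column (no UTF-8, tabs, or escape
sequences).

```c
#include <assert.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Hypothetical helper: report terminal size for fd, defaulting to 80x24
 * when fd is not a terminal or the kernel reports zeroes.
 */
void term_size(int fd, unsigned *cols, unsigned *rows)
{
  struct winsize ws;

  assert(cols != NULL && rows != NULL);
  *cols = 80;
  *rows = 24;
  if (ioctl(fd, TIOCGWINSZ, &ws) == 0) {
    if (ws.ws_col) *cols = ws.ws_col;
    if (ws.ws_row) *rows = ws.ws_row;
  }
}

/* Hypothetical helper: how many bytes of line fit in cols columns,
 * assuming one byte per column.
 */
size_t fit_width(const char *line, unsigned cols)
{
  size_t len;

  assert(line != NULL);
  len = strlen(line);
  return len > cols ? cols : len;
}
```

A caller would then print with something like
printf("%.*s\n", (int)fit_width(line, cols), line) and stop after rows
lines.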
>>
>> for me "as good as the desktop" is a long-term goal that can take
>> several releases.
>
> Unfortunately "as good as the desktop" is under-specified. Take Red Hat
> 6-ish circa 1999: that was a desktop, was that good enough? The current
> commands are much much bigger and more elaborate, but are they enough
> better to justify the bloat? (Answer: it depends.)

Hold off until you find out what people actually need ;) e.g. the
busybox ps command does not do much of anything, but it gracefully
ignores my fat-fingered usage when I type ps aux or ps -ef... actually
it looks like it just prints out the long format at all times on my
machine.

> I intend to keep improving things after 1.0, but "crazy things gnu did"
> is not an interesting thing to chase. (I also consider interface churn
> to be a cost, just like complexity. Which means some things aren't worth
> changing even if the result would be an improvement because we break
> people's scripts, and the scripts in their heads. For example, I type
> "ps ax" on busybox systems all the time, it doesn't parse that but it
> doesn't complain either.)
>
>> "better than toolbox" is enough for right now. it
>> pains me to waste time on things like WCHAN for toolbox ps when i
>> could have been doing that for toybox ps instead, but until we're
>> switched over...
>
> Working on it. :)
>
>>>> toolbox/renice.c
>>>> ours has nonstandard -r, -t (equivalent to -n?), -g (“get”); do we need these?
>>>
>>> Android is at least as much a standard as gnu. If you've got existing
>>> users who will miss the options, patches welcome.
>>
>> things like renice are tricky because they're not heavily used. i need
>> to work out first whether the extra stuff is useful or just random.
>> (one particular problem is that people have added things that weren't
>> quite what they really wanted because it was easier to do that and
>> good enough for what they needed to do, than it was to implement the
>> full standard option they'd have used if it had been available.)
>
> If you do work it out, please document the hell out of it. All toybox
> commands have complete --help as condition of promotion, I've spent more
> time on help text than on command implementation more than once.

In-command help/usage is great, thank you very much everybody.

<SNIP>

>>> No rush. The "regex|regex" thing with multiple grep -e is something I
>>> need to figure out how to deal with anyway. (The problem is | only means
>>> something in extended regexes. I had a patch to musl to allow \| to work
>>> in non-extended regexes, but then they rewrote their regex plumbing. I
>>> may have to break down and do multiple passes over the data, or maybe
>>> extend and factor out the ghostwheel() function in sed.c.)
>>
>> (BSD just tries them all in order, separately.)
>
> And if they overlap?
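
One way to think about the overlap question, as a sketch only: run every
pattern against the line separately, then keep the leftmost match, and
the longest one on a tie. This is my assumption about a reasonable
policy, not what BSD grep or toybox grep actually does, and
first_match() is a hypothetical name. It uses plain POSIX
regcomp()/regexec(), and the patterns must be compiled without
REG_NOSUB so that offsets are reported.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: try each compiled pattern on line and keep the
 * leftmost match, preferring the longer one when two start at the same
 * offset. Returns the index of the winning pattern (offsets in *best),
 * or -1 if nothing matched.
 */
int first_match(regex_t *res, int count, const char *line,
                regmatch_t *best)
{
  int i, winner = -1;

  assert(res != NULL && line != NULL && best != NULL);
  for (i = 0; i < count; i++) {
    regmatch_t m;

    if (regexec(&res[i], line, 1, &m, 0)) continue;  /* no match */
    if (winner < 0 || m.rm_so < best->rm_so
        || (m.rm_so == best->rm_so && m.rm_eo > best->rm_eo))
    {
      *best = m;
      winner = i;
    }
  }
  return winner;
}
```

This is still one pass over the line per pattern, so it is the
"multiple passes" option rather than a way around it.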
>
> Somebody linked me to a post about the old grep being fast by ignoring
> line breaks and operating on large data blocks instead (ideally
> mmap)... Ah:
>
> https://lists.freebsd.org/pipermail/freebsd-current/2010-August/019310.html

Rather cool, especially the sam editor stuff about reversing the DFA for
reverse searches, though it led me off into the weeds a la xkcd/214 above.

> And my reply after reading that was basically "I'm not doing that".
> Locality of reference and avoiding multiple passes over the data still
> seems like a good idea, but iterating over readline() is a common
> pattern. (What I worry about there is "readline returns a gigabyte".
> There's no length limit on the line, and no obvious way of _adding_ a
> length limit that doesn't boil down to not properly handling long lines...)
>
> *shrug* I did "fast" and "slow" modes for tail (mmap vs stream) because
> the stream mode is _so_ pathologically slow when dealing with large
> files, but you can't have only the mmap mode because not everything is
> mmapable. I could presumably do mmap() and non-mmap() modes for grep
> too. But given that the non-mmap() mode isn't pathological there, I lean
> against bothering until somebody really complains...
>
> (What might be better in this case was a getline() variant that returned
> a chunk of mmapped file, start pointer and length. I designed toybox sed
> to work with start and length so it could handle built-in nulls,
> although A) I don't think that part's finished, B) it's probably also
> assuming each segment is null terminated because there's no way to get
> start/length semantics out of regex() and copying the data into a

Err, the reference above led me to a nice regex paper talking about the
extensions to Thompson's NFA paper for such uses (e.g. unanchored
matches):
http://swtch.com/~rsc/regexp/regexp1.html
I thought the plan9 library looked pretty easy to read. It seems like it
would take a length and buffer pointer; if not, several others should.

> temporary buffer would kill performance. And of course mmap() backing
> store means you don't free() the result _and_ that on a 32 bit system
> you run out of virtual address space when dealing with large files...)
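
For what it is worth, the start/length getline() variant is small if
the data is already in memory. A minimal sketch, assuming the caller has
the whole buffer (from mmap() or anywhere else): next_line() is a
hypothetical name, it copies nothing, and because it works on
pointer-plus-length it does not care about embedded NUL bytes. It does
nothing about the regex() side of the problem or the 32 bit address
space limit.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: getline()-like iteration over an in-memory
 * buffer. *pos is the caller's cursor and starts at 0. On success sets
 * *start and *linelen to the next line (newline not included) and
 * returns 1; returns 0 once the buffer is exhausted.
 */
int next_line(const char *buf, size_t len, size_t *pos,
              const char **start, size_t *linelen)
{
  const char *nl;

  assert(buf != NULL && pos != NULL && start != NULL && linelen != NULL);
  if (*pos >= len) return 0;
  *start = buf + *pos;
  nl = memchr(*start, '\n', len - *pos);
  *linelen = nl ? (size_t)(nl - *start) : len - *pos;
  *pos += *linelen + (nl ? 1 : 0);  /* step over the newline if present */
  return 1;
}
```

The lines it hands back are not NUL terminated, so anything downstream
that expects a C string would still need a copy or a length-aware API.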
>
> Anyway, there's potential things to do here, but after 1.0...
>
> Rob
> _______________________________________________
> Toybox mailing list
> Toybox at lists.landley.net
> http://lists.landley.net/listinfo.cgi/toybox-landley.net
