[Toybox] Did I mention the release?

enh enh at google.com
Tue Apr 7 08:53:04 PDT 2015


On Sat, Mar 21, 2015 at 12:16 PM, Rob Landley <rob at landley.net> wrote:
> On 03/20/2015 12:42 AM, enh wrote:
>> On Thu, Mar 19, 2015 at 6:52 PM, Rob Landley <rob at landley.net> wrote:
>>> On 03/15/2015 08:47 PM, enh wrote:
>>>> as a particularly observant follower of my AOSP gerrit activity might
>>>> discern, from the fact i committed a small improvement to the
>>>> _tool_box ps last week, i'm not expecting to switch to the toybox ps
>>>> in the near future.
>>>
>>> I wouldn't switch to what we've got right now either. That's why I'm
>>> fixing it. :)
>>>
>>> That said, I should look at toolbox ps and see what _it_ does, because
>>> it's at least as much an example as ubuntu.
>>
>> i think the most important thing is to implement -o well.
>
> Yeah, -o is what I'm wrestling with. For most fields, the column name
> and the -o value are case-smashed versions of each other, and "nice/NI"
> and "comm/COMMAND" are abbreviations of each other (in opposite
> directions), but then you get "etime/ELAPSED" and... Grrr.
>
> There is a cleanup to be done here. Making it not ugly is... tricky.
>
> I'm leaning towards just matching "longest common initial segment" with
> an early stop searching at complete matches, but I _can't_ have a
> minimum length requirement because etime/elapsed have _one_ letter in
> common. (Luckily they're the only "e" at the moment, but still.) And
> that means typos can give unexpected results instead of errors which is
> nasty. (I can at least continue searching if it didn't end and complain
> if it was ambiguous, ala "we only matched the initial p and there's 3 of
> those...")

sgtm.

> On the other hand, I still need to special case args/COMMAND and if
> etime is the other special case then requiring one of the strings to
> _end_ to give me a match...
>
> This is why simple things take so long, when there's a better way on the
> tip of my tongue but I can't figure out what it _is_...
>
>> other than
>> -Z which Ubuntu has too, i think the only oddity that we wouldn't be
>> better off without is --abi. right now developers often care about
>> whether a process is 32 or 64 bit.
>
> Is making abi an -o column acceptable?

yes. (i actually wonder whether more useful than what we currently
have ("32" and "64") would be to show the different personalities. so
on x86-64 you could distinguish between x86 and x32, for example.)

>> the main behavioral oddity is that
>> it defaults to showing all processes (because people are usually doing
>> "adb shell ps" rather than "ps")
>
> Is having "alias" commands in /etc/profile or similar a sufficient fix
> for this sort of thing?

i think the only obvious problem with that right now is that "adb
shell ps" and "adb shell" followed by "ps" would behave differently.
but that's probably a bug anyway. it's definitely an approach we
should look into.

>> and the last unrecognized argument
>> that don't start '-' or with a digit is taken as a "name filter" ---
>
> What if you get more than one unrecognized argument that doesn't start
> with a digit?

i wasn't being sloppy when i said "last". i was accurately describing
the sloppy implementation. all but the last will be silently ignored.

> I'm ok with implementing "pgrep" and "ps" in the same toys/*/filename.c
> and having the pgrep functionality available to ps. (Would having the
> filter be interpreted as a regex cause problems, do you think?)

yagni. let's wait and see if there's a good reason for people not to
switch to pgrep before we worry about adding stuff to ps.

>> only processes whose COMM or NAME exactly matches are shown. i suspect
>> that this is an exact match means it's not used all that much, but
>> it's hard to know without taking it away and seeing if anyone
>> complains. i might be reduced to that :-(
>
> I like figuring out how to fit special cases into a plausibly deniable
> "I meant to do that" general case. (Sometimes the elegant solution is to
> back up and solve a bigger problem.)
>
>> it might be worth my while introducing toybox's pgrep as soon as
>> possible and encourage people to switch over.
>
> Which is also in pending.
>
> Lemme merge pgrep into the ps cleanup, and have unrecognized ps
> arguments... oh, that's nasty. Is "ps ax" unreognized? Hmmm. It's a
> thing people are going to try to do...
>
> (I think I can make this work, I just have to figure out what behavior
> we _want_. I guess start the option string with "?&" and presumably
> lib/args.c will handle it? The fiddly bit is that nondash arguments and
> dash arguments have _different_meanings_ and we don't currently record
> whether or not we got a dash because things like tar don't care. Hmmm...
> I don't think one flagset maps cleanly onto the other either, so
> rejecting _invalid_ flags becomes tricky...)
>
>>> I really want to genericize lib/lib.c:human_readable() so it can
>>> reproduce the variants we need, and factor out the column spacing code.
>>>
>>> What would really help here is test cases. (Extract this tarball, cd in
>>> and ls -h, and the output should look like this textfile...)
>>
>> a tarball or function for all tests to use to test with is a great
>> idea. it would make it easier to have interesting cases available for
>> all the commands to be tested against. (dangling symlinks, empty and
>> non-empty directories, non-regular files, ...)
>
> Indeed. Right now I've hijacked blkid's filesystem images for a couple
> other tests that just need a known file, but a tarball with different
> kinds of stuff in it would be nice.
>
> Unfortunately, this opens up the "test as root" can of worms because
> "device nodes" are an obvious thing to put in there, so possibly _two_
> tarballs, one "root" and one "not root".
>
> (Hmmm, would moving the blkid files into said tarball make sense?
> Unfortunately, uncompressed they're huge, the f2fs one alone is 128
> megs. Maybe if I get tar extracting sparse files by default?)
>
> In case it isn't obvious, the pending directory isn't the only one I
> need to do a cleanup/extend/rewrite pass over. The test suite could use
> several months of fulltime poking.
>
>>>> toolbox/top.c
>>>>   output, options, and behavior all differ enough that any switchover
>>>> is going to be interesting. punt for now, worry about these later.
>>>
>>> I would very much like to unify top, htop, and iotop (but haven't looked
>>> into how the third is implemented). Also there's the possibility of
>>> common code with vmstat and ps.
>>>
>>> The other thing is that "top" does terminal control similar to "less".
>>> And "watch" should do the same, and ps does a _little_ of the same.
>>> (They query the terminal size and truncate output at screen width and
>>> screen height.) So my plan was to genericize _that_ code, in a way that
>>> works with shell command line history editing and vi.
>>>
>>> None of which is directly relevant to you, you just want it to work. But
>>> this is why promoting it is harder than it looks...
>>
>> for me "as good as the desktop" is a long-term goal that can take
>> several releases.
>
> Unfortunately "as good as the desktop" is under-specified. Take Red Hat
> 6-ish circa 1999: that was a desktop, was that good enough? The current
> commands are much much bigger and more elaborate, but are they enough
> better to justify the bloat? (Answer: it depends.)
>
> I intend to keep improving things after 1.0, but "crazy things gnu did"
> is not an interesting thing to chase. (I also consider interface churn
> to be a cost, just like complexity. Which means some things aren't worth
> changing even if the result would be an improvement because we break
> people's scripts, and the scripts in their heads. For example, I type
> "ps ax" on busybox systems all the time, it doesn't parse that but it
> doesn't complain either.)
>
>> "better than toolbox" is enough for right now. it
>> pains me to waste time on things like WCHAN for toolbox ps when i
>> could have been doing that for toybox ps instead, but until we're
>> switched over...
>
> Working on it. :)
>
>>>> toolbox/renice.c
>>>>   ours has nonstandard -r, -t (equivalent to -n?), -g (“get”); do we need these?
>>>
>>> Android is as least a much a standard as gnu. If you've got existing
>>> users who will miss the options, patches welcome.
>>
>> things like renice are tricky because they're not heavily used. i need
>> to work out first whether the extra stuff is useful or just random.
>> (one particular problem is that people have added things that weren't
>> quite what they really wanted because it was easier to do that and
>> good enough for what they needed to do, than it was to implement the
>> full standard option they'd have used if it had been available.)
>
> If you do work it out, please document the hell out of it. All toybox
> commands have complete --help as condition of promotion, I've spent more
> time on help text than on command implementation more than once.

btw, you might want to check the tabs versus spaces in the --help
output. try "umount --help" for example. (i could send you a patch
cleaning up the whole tree, but i assume you'll want to get distracted
by writing a script so detect/prevent this in future anyway :-) .)

>>> Possibly I should have a "how to make testcases" section in my toybox
>>> talk on wednesday. A good test suite really helps development. (Alas, as
>>> I've learned, what counts as good is another thing apparently requiring
>>> some explanation...)
>>
>> (my [lame] excuse is that even with my patch the output still isn't
>> _exactly_ the desktop output, and i'm not sure how anal we should be
>> about that.)
>
> I'm way behind on testsuite work too.
>
>>>> toolbox/uptime.c
>>>>   Android's output format is completely different (all about time
>>>> spent idle/sleeping, not concurrent users or processes); not clear
>>>> that merging makes sense.
>
> (I note our uptime is producing what ubuntu produces. Producing what
> toolbox produces isn't hard.)

yeah, the code is easy. the question is where we want to put it. an if
(__ANDROID__) in the existing uptime.c? add support for duplicate
filenames to your build system and have the one most specific to your
target OS win? have an output option like --android (that's maybe
defaulted in main.c based on an if (__ANDROID__))? lack of options is
not the problem here :-)

>>> I should take a look. (Is there documentation on any of the android
>>> commands? Each toybox command at least produces --help output, and the
>>> test suite is gives a gazillion usage examples, or at least _should_...)
>>
>> no. most of the toolbox commands don't/didn't even have _usage_
>> output. the built-in help is one of my favorite things about toybox.
>> (it was when i was trying to show it off this morning that i found the
>> "top --help" bug i sent the patch for. so that was embarrassing.)
>
> I'm tellin ya, toys/pending is there for a reason. :)

yes, but your "no small patches because i want to totally rewrite it"
policy hurts us a bit. half an eye is better than no eye. (and at some
level you must believe that or you wouldn't have a pending directory
at all :-) )

>> Android's uptime looks like this:
>>
>> $ adb shell uptime
>> up time: 04:05:17, idle time: 08:00:53, sleep time: 00:00:00
>>
>> how would you offer both formats?
>
> Add an uptime -a, and use alias in .profile the same way ls --color is
> generally aliased?
>
> /proc/uptime gives uptime and idle time. Easy enough to grab:
>
>   if (readfile("/proc/uptime", toybuf, 4096) < 0 ||
>       2 != sscanf(toybuf, "%f %f", uptime, idle))
>         perror_exit("/proc/uptime");
>
> Not sure how you get sleep time, hmmm...
>
> Your uptime.c is ignoring the uptime from /proc and fetching "elapsed"
> as either clock_gettime(CLOCK_BOOTTIME) or an
> ioctl(ANDROID_ALARM_ELAPSED_REALTIME) on /dev/alarm. Why is it doing
> that? (In fs/proc/uptime.c uptime_proc_show() is doing
> get_monotonic_boottime(), which is what sysinfo is also getting but only
> with second granularity...)
>
> It looks like sleep is uptime from /proc (defined as "wall clock time"
> in Documentation/filesystems/proc.txt) minus gettime(CLOCK_MONOTONIC)
> which is presumably "jiffies spent" or some such.
>
> Hmmm... it's test cases that are hard. I _think_ the vanilla kernel
> should provide an api for all the right clock variants now, so in theory
> I can do this with the if/else stack in the android implementation, but
> since the old one is using android specific apis to fetch stuff I'd have
> to get an android device with an android kernel and put it to sleep for
> a bit (I should be able to wget a static toybox into that champion
> gnuroot thing on my nexus 5, but what counts as "sleep" here? When the
> screen is off?)

yeah. the really weird part is that uptime (and, iirc, sleep) is
measured as wall clock time but idle is measured by core. so idle is
typically many multiples of uptime.

uptime is the least of my worries though. it's an awkward case for the
transition, but it's not hurting anyone. i suspect i could remove it
and it would be a long time before anyone noticed.

>> or were you just thinking that we'd have a big if (ANDROID)?
>
> While I _could_ have a CONFIG entry for UPTIME_DEFAULT_A (so -a is the
> default and supplying it switches it off), and even have a compile time
> probe to set its default value based on whether or not we're building
> against bionic... it wouldn't be my first choice.
>
> If you sit down at toybox, it should behave like toybox, otherwise users
> are confused. Each deviation from that needs to be justified.
>
>> about the only code the two formats would
>> share would be the duration formatting. and Android doesn't have
>> anything like utmp, so it's not obvious we could show the regular form
>> output even if we wanted to.
>
> Eh, half the command's data collection. The output is:
>
> $ ./uptime
> 01:09:18 up 53 days, 43 min,  170 users,  load average: 5.94, 4.04, 2.55
>
> The bit at the start is:
>
> // Time
> xprintf(" %02d:%02d:%02d up ", now->tm_hour, now->tm_min, now->tm_sec);
>
> And doing three variants of that is just a for loop based off 3
> different pointers, with the other output format as an else.
>
>
>>> Toybox was always intended to work alongside other packagesin the same
>>> $PATH, but toybox is also explicitly designed to allow forks to add new
>>> commands unobtrusively. Not only is each command self-contained (a trick
>>> I had to explain in person Denys Vlasenko at the 2010 ELC before he
>>> started doing it in busybox), but the directories are self-contained too.
>>>
>>> So if you want a locally-maintained directory of android commands, all
>>> you have to do is:
>>>
>>> 1) mkdir toys/android
>>> 2) echo "Android" > toys/android/README
>>> 3) cp mycommand.c toys/android
>>> 4) make distclean
>>> 5) make defconfig
>>> 6) make
>>>
>>> (Once upon a time there was a directory list in scripts/genconfig.sh to
>>> manually control the ordering, but I ripped it back out and just did a
>>> "sort -r" to put posix first and examples last and called it good enough.)
>>
>> oh, great. i hadn't actually looked; i'd just assumed there was a
>> hard-coded list.
>
> There was. I decided it was a bad idea and removed it to make life
> easier for exactly this use case. :)
>
> (Sort of the opposite of the busybox approach. I want this code to be
> _useful_.)
>
> Note that the command namespace is actually flat, so if I add some
> random thing like "fstrim" to toys/other and you've got one in
> toys/android, your build will break. (You can locally delete the
> duplicate out of toys/other, but making git not conflict is left as an
> exercise to the user.) So there's still _some_ incentive to get stuff
> upstream, but even if you don't, if you only resync each release the
> cleanup's trivial.
>
> (And as always lib/ may change a bit over the years and break out of
> tree commands. I don't try to do it, but I can't _not_ do it if I can't
> test the command. Fixup's probably pretty obvious after a bisect, though.)
>
>>> As for additions to the lib directory, the build just grabs lib/*.c and
>>> then lets "-ffunction-sections -fdata-sections -Wl,--gc-sections" sort
>>> it out, so you can add a lib/android.c and as long as the names don't
>>> conflict life is good. (I'm aware your build doesn't, but presumably
>>> adjusting your build is easier for you than for me. :)
>>
>> yes, given that i'd prefer not to have differences from upstream
>> (which is why i'm not committing any of my own changes locally; i'm
>> merging them back if/when they make it upstream) it's useful to me
>> that your build will just pick up extra lib/*.c files too. (i usually
>> test on the host first.)
>
> I'm all for it. :)
>
> Sorry I'm slow at merging, as you can see from above just _discussing_
> this stuff spawns design tangents...
>
>>> Adding stuff to lib/lib.h is bad (merge conflicts), but you could
>>> #include "toys/android/lib.h" after the #include "toys.h" in your local
>>> commands, and have the library header live there.
>>>
>>> (You could even build with just the android commands enabled, mv toybox
>>> toolbox, and point symlinks at that during the install if all you wanted
>>> was the toybox infrastructure but wanted to keep it separate. Or use
>>> scripts/single.sh or the new "make change" target, although I dunno how
>>> that translates to your build infrastructure.)
>>
>> no, there's no particular benefit to keeping things separate. although
>> if long term you plan on having a toys/android upstream i might want
>> to go with toys/android-pending or something so i can keep the two
>> sets straight.
>
> If you would like to submit android-specific commands just so they could
> use the toybox infrastructure, I'd be ok with a toys/android directory.
> I would probably have the things in it default "n" in defconfig, though.
>
> I need to set up a local bionic test build, but the half hour I spent
> poking at it just confirmed that bionic hasn't got a makefile that lets
> it build out of tree, and extracting it from the whole android build is
> another one of those problems best explained by linking to xkcd:
>
> https://xkcd.com/214/
>
>>> I am behind on my reading for android.
>>> Possibly these commands have different permissions... selinux extended
>>> attributes? Or maybe setprop is only a debuging command or something?)
>>
>> no. everything in toolbox is in one binary (with the exception of r),
>> so they're all the same command as far as SELinux is concerned.
>> setprop works by talking to the property service over a socket, and
>> that uses SO_PEERCRED, and then there's some kind of SELinux lookup,
>> and there are extra checks for stuff like the "ro.*" properties, and
>> ...
>
> You see my problem!
>
> I need to reread this thread after my california trip and update the
> roadmap. Quite POSSIBLY what I need is a slushpile.html where instead of
> "why this command should go in" I have a list of commands and notes
> about what I plan to do with each command. (Brief implementation details
> like the above, "why this is hard", "what's blocking it", "remember to
> handle these cases"...)

i saw someone at work use xxd(1) the other day, and now i'm thinking i
should change my default hexdump allegiance. i'm surprised i hadn't
come across that before. the reasonable default output and the
bidirectionality was impressive.

>>>> i also plan on sticking with BSD grep for now (we can worry about the
>>>> quality of the BSD regex implementation in bionic and fleshing out the
>>>> missing grep features in toybox when everything else is settled):
>>>
>>> No rush. The "regex|regex" thing with multiple grep -e is something I
>>> need to figure out how to deal with anyway. (The problem is | only means
>>> something in extended regexes. I had a patch to musl to allow \| to work
>>> in non-extended regexes, but then they rewrote their regex plumbing. I
>>> may have to break down and do multiple passes over the data, or maybe
>>> extend and factor out the ghostwheel() function in sed.c.)
>>
>> (BSD just tries them all in order, separately.)
>
> And if they overlap?
>
> Somebody linked me to a post about the old grep being fast by ignoring
> lines breaks and operating on large data blocks instead (ideally
> mmap)... Ah:
>
> https://lists.freebsd.org/pipermail/freebsd-current/2010-August/019310.html
>
> And my reply after reading that was basically "I'm not doing that".
> Locality of reference and avoiding multiple passes over the data still
> seems like a good idea, but iterating over readline() is a common
> pattern. (What I worry about there is "readline returns a gigabyte".
> There's no length limit on the line, and no obvious way of _adding_ a
> length limit that doesn't boil down to not properly handling long lines...)
>
> *shrug* I did "fast" and "slow" modes for tail (mmap vs stream) because
> the stream mode is _so_ pathologically slow when dealing with large
> files, but you can't have only the mmap mode because not everything is
> mmapable. I could presumably do mmap() and non-mmap() modes for grep
> too. But given that the non-mmap() mode isn't pathological there, I lean
> against bothering until somebody really complains...
>
> (What might be better in this case was a getline() variant that returned
> a chunk of mmapped file, start pointer and length. I designed toybox sed
> to work with start and length so it could handle built-in nulls,
> although A) I don't think that part's finished, B) it's probably also
> assuming each segment is null terminated because there's no way to get
> start/length semantics out of regex() and copying the data into a
> temporary buffer would kill performance. And of course mmap() backing
> store means you don't free() the result _and_ that on a 32 bit system
> you run out of virtual address space when dealing with large files...)
>
> Anyway, there's potential things to do here, but after 1.0...
>
> Rob



-- 
Elliott Hughes - http://who/enh - http://jessies.org/~enh/
Android native code/tools questions? Mail me/drop by/add me as a reviewer.

 1428421984.0


More information about the Toybox mailing list