[Toybox] Project progress for 0.8.3.

Rob Landley rob at landley.net
Sun May 24 13:55:21 PDT 2020


Two weeks since the release and Wikipedia[citation needed] hasn't noticed yet,
which is fine and normal (I just don't want them to be actively _wrong_), but
their "Project progress" section explains that "in 2015"... which was 5 years
ago now? And they still link to http://www.landley.net/toybox/todo.txt from 2011
which is PURELY HISTORICAL...

Right. I can at least do a current 2020 analysis. Pulling up
https://landley.net/toybox/status.html which I updated for 0.8.3, here are the
lists. Of the 343 proposed commands it lists:

--- completed (204 commands)
acpi arch ascii base64 basename blkid blockdev bunzip2 bzcat cal cat catv chattr
chgrp chmod chown chroot chrt chvt cksum clear cmp comm count cp cpio crc32 cut
date devmem df dirname dmesg dnsdomainname dos2unix du echo egrep eject env
expand factor fallocate false fgrep file find flock fmt free freeramdisk
fsfreeze fstype fsync ftpget ftpput getconf grep groups gunzip halt head help
hexedit hostname hwclock i2cdetect i2cdump i2cget i2cset iconv id ifconfig
inotifyd insmod install ionice iorenice iotop kill killall killall5 link ln
logger login logname losetup ls lsattr lsmod lspci lsusb makedevs mcookie md5sum
microcom mix mkdir mkfifo mknod mkpasswd mkswap mktemp modinfo mount mountpoint
mv nbd-client nc netcat netstat nice nl nohup nproc nsenter od oneit partprobe
passwd paste patch pgrep pidof ping ping6 pivot_root pkill pmap poweroff
printenv printf prlimit ps pwd pwdx readahead readlink realpath reboot renice
reset rev rfkill rm rmdir rmmod sed seq setfattr setsid sha1sum shred sleep sntp
sort split stat strings su swapoff swapon switch_root sync sysctl tac tail tar
taskset tee test time timeout top touch true truncate tty tunctl ulimit umount
uname uniq unix2dos unlink unshare uptime usleep uudecode uuencode uuidgen
vconfig vmstat w watch wc which who whoami xargs xxd yes zcat

--- pending (68 commands)
addgroup adduser arp arping bash bc bootchartd brctl cd crond crontab dd
deallocvt delgroup deluser dhcp dhcp6 dhcpd diff dumpleases exit expr fdisk fold
fsck getfattr getty groupadd groupdel host init ip ipaddr ipcrm ipcs iplink
iproute iprule iptunnel klogd last lsof man mdev mke2fs modprobe more openvt
route sh stty sulogin syslogd tcpsvd telnet telnetd tftp tftpd toysh tr
traceroute traceroute6 udpsvd useradd userdel vi wget xzcat

--- todo (71 commands)
ar at awk chfn chsh cols compress csplit diff3 dig dir dosfslabel ed fsck.ext2
fsck.vfat ftpd fuser genext2fs getevent groupmod gzip hexdump hostid ipconfig
iwconfig iwlist join kexec kinit less mkfs.vfat newfs_msdos newgrp nfsmount ntpd
pathchk pinky rdate resize2fs resume rpm2cpio rsync runcon sdiff sendmail sfdisk
sha224sum sha256sum sha384sum sha3sum sha512sum shutdown stdbuf sudo sum tabs
tput tracepath tune2fs unexpand unzip usermod users vdir zcmp zdiff zegrep
zfgrep zip zless zmore

--- which means
So a first guess would be more like 70% done, because if you just take 204 plus
half of 68, divided by 343 you get just under 70%. But that's not right for a
BUNCH of reasons. The first of which is those three lists aren't quite
everything, when I run "make distclean defconfig toybox; scripts/mkstatus.py" it
also says:

uncategorized: blkdiscard rtcwake getopt readelf eval exec export shift unset
-sh -toysh -bash

But the uncategorized stuff actually means I need to add a few things I've
already DONE to the roadmap, but half of it's shell builtins (eval, exec,
export, shift, unset, -sh, -toysh, -bash) that don't actually count as separate
commands.

blkdiscard and rtcwake I already promoted, they'd go in the "done" category if
they were listed in the roadmap so the script could find them. Readelf is in
pending but isn't hard, it's just "spend an hour reading 600 lines closely" and
probably staring at specs and test files. I've mostly been waiting because it's
a recent addition and Elliott was still poking at it.

That leaves getopt, which is legitimately stuck in pending due to being a
design-level nightmare: the one that's there works fine but pulls in a whole
second set of option parsing logic from libc that nothing else uses. Do I want
to accept that or try to adapt it to use lib/args.c which may change the
resulting semantics in ways that require a close reading of the spec? (Note that
I've never used this command because it sucks, and the shell's builtin "getopts"
is an UNRELATED command, and collectively I'd really rather not. But alas, it's
in posix, and somebody's script is using it...)
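
For reference, the back-of-envelope number at the top of this section works out
like this. A minimal sketch: the function name is made up for illustration, and
counting each pending command as half done is just the rough weighting described
above, nothing more precise.

```python
# Counts copied from the three status lists above.
completed, pending, todo = 204, 68, 71

def first_guess(done, half_done, total):
    """Count finished commands fully and pending ones as half finished."""
    return (done + half_done / 2) / total

total = completed + pending + todo
estimate = first_guess(completed, pending, total)
print(f"{total} commands, first guess {estimate:.1%} done")
```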

--- mkroot

Remember my whole "self-hosting build" goal? Well I integrated a system builder
into toybox, which Wikipedia[citation needed] will probably never notice. There
are only two more commands missing from toybox defconfig that it needs to create a
usable basic standalone system: "sh" and "route", both of which are being worked
on. The sh that's there is already semi-usable, I'd guess it's 2/3 done maybe?
(Hard to tell, it's difficult to scope work you haven't done yet.)

Route is still in pending because I'm ambitious and want one that does multiple
routing tables, which the existing implementation wasn't designed for (but
neither was debian's; I want to do _better_ than net-tools).

As for creating a self-hosting build environment, the list I have written down
for building the old Linux From Scratch version I was testing with was
(according to scripts/install.sh):

PENDING="dd diff expr ftpd less tr vi wget awk sh sha512sum sha256sum unxz xzcat
bc bison flex make nm ar gzip"

The commands "diff, ftpd, less, vi" are there because humans log into a build
system to debug stuff sometimes, and it's nice to have basic amenities. Not
STRICTLY required (you can use network mounts instead of ftpd, it's just my old
build system used ftpd), but eh. There's been quite a lot of work on vi in
pending, and there's a diff there I'm told works. I did "watch" already and
that's the basic plumbing "less" uses. To be honest that one's held up by "more"
not sharing code with it (quite possibly at a design/conceptual level, I need to
frown at it some more). diff sucks in diff3 and sdiff, and my main complaint is I
wanted to use Bram Cohen's patience algorithm. :P

The commands "bc, bison, flex" are only there because modern linux kernel
development has gone off the deep end into crazytown. Grrr. There's a very large
bc in pending I need to slim down, and I've never used yacc/lex for anything and
need to learn to in order to write replacements.

gzip: I have the start of a deflate implementation, I need to get back to it. I
was trying to be binary identical with what other gzips produced which hit the
"when do you flush the dictionary" question, which doesn't seem to have an
official answer. Test the debian output and match that, I suppose.

ar would be trivial if it wasn't for -s mode, the format of which is completely
undocumented. (Not hard, just annoying.)

nm is basically "readelf with a different output format", it's pending on the
readelf cleanup. (Which is half done, I just got distracted from it.)

I've started to clean up dd something like 5 times and gotten distracted, it's
not _hard_ it's just long and not something I want to start again without a
block of uninterrupted time laid out for it. (At least a week, it's constructed
entirely out of corner cases.)

Last time I tried to clean up expr I hit the problem that posix had suffered a
regression (its html renderer lost grouping information), which is long since
fixed I just never got back to it...

It would be trivial to cleanup tr except I want to teach it utf8 support, which
makes everybody who knows about this scream in pain, AND YET...

awk is probably the largest remaining can of worms I haven't opened. Like most
people I only use it to "cherry pick the 4th word from this list", but I need to
implement the full language. The busybox one was 2800 lines when I was
maintaining it.  (It's probably longer now.)

sha256sum and sha512sum are tangled into the "sha-3" todo item (see also
sha224sum sha384sum and sha3sum). I did my own md5sum and sha1sum
implementations way back when (which are merged and share code) and I want to
see if I can fit the rest in there, but haven't sat down to really focus on it
yet. (It's mathy, there's research. Like the compression algorithms. I do NOT
wanna be distracted from chewing on it halfway through, so big chunka time.)

unxz and xzcat: see "mathy", above. I found public domain implementations of the
decompressor way back, glued them together, and stuck them in pending. I should
make sure there haven't been security fixes or obvious format changes, and then
do to that what I did to bunzip2 ages ago
(https://git.busybox.net/busybox/commit/?id=0d6d88a2058d).

make is a post-1.0 todo item, and really properly belongs in qcc.

--- the rest of the pending/todo items

As for the remaining pending and todo commands, a chunk of them are shell
aliases and shell builtins that are part of the "toysh" todo item: bash sh toysh
cd exit

The "user management" commands (addgroup adduser delgroup deluser groupadd
groupdel useradd userdel chfn chsh groupmod newgrp usermod) have been low
priority because android doesn't use normal user accounts and never will (each
app is installed as a different UID, that's a legacy decision from before
containers happened). I'd like to convince android to create a "posix container"
within which you can have multiple users and run binaries you build, but that's
an ongoing discussion. None of this is hard to clean up and promote, I just
haven't bothered because I didn't have an immediate use case. Now that mkroot's
there using a simple /etc/passwd I should cycle back to them.

bootchartd was an external submission: I've never used it. It's not hard to clean
up, I just have to learn how to use it in order to _test_ it...

I've never been a big user of cron, so "crond crontab" are more "easy to clean
up, but I don't have tests for them". I've used "at" before, but I think it
hooks into crond somehow? (There _is_ a server component...) Really I just
haven't looked at those yet.

I haven't prioritized "deallocvt openvt" because VGA hardware virtual terminals
aren't really a thing anymore, but I should get them out of the way. (It's
_just_ rebooty enough that I've been reluctant to try it on my laptop for fear
of it doing a wobbly and me having to reboot to get my screen back, losing all
my open windows on 8 desktops, but... gotta bite the bullet sometime. It's not
as scary as testing rm -rf for the first time... Ok, done, and email fired off
to see if the original submitter can test it.)

People submitted "tcpsvd udpsvd" (which I don't use) after I already implemented
netcat -l (which I do use, and it's got a UDP mode because of google guys
wanting that, possibly for netconsole) and I'm going "this should be merged
somehow"?

ip ipaddr ipcrm ipcs iplink iproute iprule iptunnel: I don't use the "ip"
command, and its existence annoys me. Refusing to update ifconfig and route to
new APIs and instead throwing them out and replacing them with a giant hairball
that works like git with subcommands is sad. I'm updating the original commands
to do the right thing, and implementing standalone versions of anything that
ONLY this does. I do not consider these part of the 1.0 release, when I'm done
they should be aliases for the standalone commands (because other people prefer
that UI due to familiarity). This is there now because it was an external
contribution and I didn't want to stand in the way.

I started cleaning up "man" once and other people were changing it while I was
changing it, so I deleted my changes and merged theirs, and haven't looked at it
again since. Possibly it's quiet enough now, but I haven't cycled back to it.

There's a pile of network stuff (arp arping brctl dig host traceroute
traceroute6) that's not hard, I just haven't needed it yet. It's all behind
"route" on the todo list, which is the next one I _do_ need. I note that
traceroute/traceroute6/tracepath are elaborate variants of "ping", and host/dig
are the same command with different UI and output.

There's also network client/server stuff (telnet telnetd tftp tftpd wget httpd
dhcp dhcp6 dhcpd dumpleases) which is more "elaborate but not hard". More time
consuming than difficult. (Modulo I need to test ph7 with httpd, and that and
wget need https integration via the command line stuff, see also
http://lists.landley.net/pipermail/toybox-landley.net/2017-September/009158.html
and http://lists.landley.net/pipermail/toybox-landley.net/2016-March/004865.html
which means I need to install bearssl into mkroot for testing which sounds like
SO much fun... Sigh. But that's foisting a difficult bit off onto an external
program. It's a pity dropbear never provided https, but I can see wanting to
avoid the key management part of that.)

The group "mkfs.vfat, newfs_msdos, dosfslabel, fsck.vfat" I've done some work
on, and I also want genfatfs and mtools support, but am not adding them to the
roadmap just now. Got distracted, haven't gotten back to it. The transitions
between fat12, fat16, and fat32 are kind of magic/evil but other than that it's
pretty straightforward...

Similarly, I did about the first half of mkext2fs and genext2fs back in the day
(like 2006) and got REALLY distracted by that whole "leaving busybox" thing,
still haven't gotten back to it and in the mean time ext4 happened which I do
_not_ understand. That impacts fsck.ext2 (what exactly can go _wrong_?) and then
there's tune2fs (trivial once the others are in) and resize2fs (kind of an fsck
variant almost). It's all a group, if I _just_ want ext3 I can probably do it in
a couple months? If I wasn't doing anything else at the time...

fsck itself is just a wrapper around the filesystem-specific fsck commands, I've
got one in pending but until there are other fscks to test it with... (I mean I
_can_ clean it up.)
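
The wrapper idea is roughly this. A hypothetical sketch, not the code in
pending: it assumes the caller already knows the filesystem type (a real one
would probe for it), and both helper names are invented here.

```python
import shutil

def fsck_argv(device, fstype, options=()):
    """Build the command line for the filesystem-specific checker."""
    return [f"fsck.{fstype}", *options, device]

def checker_available(fstype):
    """True if a fsck.<fstype> helper exists somewhere in $PATH."""
    return shutil.which(f"fsck.{fstype}") is not None
```

So fsck_argv("/dev/sda1", "ext2", ["-n"]) hands back the argv for fsck.ext2, and
the wrapper itself has nothing to do until such a helper actually exists.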

fdisk grew a new format when disks hit 2TB: MBR I understood, GPT I do not. It's
on the todo list. I sort of want sfdisk at the same time (scriptable!) but
sfdisk turns out to be really ugly from a UI standpoint and conventional fdisk
is scriptable via "echo | fdisk" so...

screw sendmail, why is it on this list? (Because some other package had it. I'm
not doing it, that whole ecosystem is crypto all the way down now.) And compress
is obsolete (due to historical patents, it got killed by gzip), I don't care
what posix says. The .Z file format is like supporting arj.

iwconfig and iwlist are sort of an ecosystem with wpa_supplicant, I'd declare it
out of scope except the hardware it controls is pretty widely used these days.
I need a wrapper script to associate my laptop with an access point from the
command line, which I've never quite managed to do on _debian_, so... (Because
access keys. There's a tool to turn a wpa_passphrase into what the hardware
consumes and it's crotchety and all handled by magic GUI wrappers people make
that work VERY HARD to hide the details. I know the theory but am always missing
a corner case somewhere. Closest I got was trying to associate with my phone at
UT when I first got this laptop, ala
https://landley.net/notes-2019.html#17-04-2019 and it did not fill me with
confidence...)

lsof and fuser are similar and tricky. They should share code, the PROBLEM is
lsof's command line is horrific and has 8 gazillion weird little corner cases
and mostly I don't care but the one that was submitted doesn't support +D which
is one of the most useful things it does (recursively show all open files under
directory), so... (Plus merging the ipv4 and ipv6 plumbing in its internet
listing. Again, me being picky and insisting on things I don't TECHNICALLY need
to do...)

hexdump is trivial to do, except I've already got od and hexedit and they're not
sharing code and I'm reluctant to ADD A THIRD. Except "hd blah" is the one I
personally use, so... :)

ed is vi, only less so. I really don't want it, but apparently implementing vi
makes it not that bad? Or something? (I implemented _sed_ which should be able
to do everything you need from ed, but no. There are some serious geezers out
there who insist.)

sudo is easy to implement and hard to prove correct. :P

rsync isn't actually that hard (if you only implement -e ssh and ignore that
server nonsense) and is well documented
(https://rsync.samba.org/how-rsync-works.html). They even migrated from md4 to
md5 (for the look of it, it never had a security implication because it wasn't
used for that) so I've already _got_ most of the plumbing. I just haven't sat
down to do it yet.

ntpd shouldn't be in there, I implemented sntp and that covers it. (Removed.)
I'm not sure if that covers rdate or not (I _can_ do an rdate, but when I put
that in there I thought that qemu had an rdate server built in, and it turned
out it was passing through 10.0.2.2 to inetd on the host that had it built in,
and that was a previous distro and devuan hasn't got it. Hmmm...)

pinky is a trivial finger, it's an afternoon's work sometime. Hard part's
finding a finger server to test it with. :)

getfattr: android sent me one, it works, I don't use xattrs.

rpm2cpio is trivial but I'm not sure it's the right approach (there was better
rpm and deb support in busybox 15 years ago, and the trick was instead of
"database" just "directory where the header with the metadata from each
installed package was copied to, under the original package's name", which
worked surprisingly well. Listing installed packages was just an ls of the
directory, for one thing.) Can of worms I have yet to open...
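
The "directory instead of database" trick can be sketched like so. This is an
illustration under assumptions, not the old busybox code: the function names,
the package names, and the header bytes are all made up.

```python
import os
import tempfile

def record_install(db_dir, package_name, header_bytes):
    """Remember a package by saving its metadata header under its own name."""
    os.makedirs(db_dir, exist_ok=True)
    with open(os.path.join(db_dir, package_name), "wb") as f:
        f.write(header_bytes)

def list_installed(db_dir):
    """Listing installed packages is just an ls of the directory."""
    return sorted(os.listdir(db_dir))

def is_installed(db_dir, package_name):
    """A package is installed if its header file is present."""
    return os.path.exists(os.path.join(db_dir, package_name))

def record_remove(db_dir, package_name):
    """Forgetting a package is deleting its header file."""
    os.remove(os.path.join(db_dir, package_name))

# Small demo in a throwaway directory (hypothetical package names).
with tempfile.TemporaryDirectory() as db:
    record_install(db, "foo-1.0", b"placeholder header")
    record_install(db, "bar-2.3", b"placeholder header")
    record_remove(db, "foo-1.0")
    print(list_installed(db))
```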

stty: I need to clean this up. It's fiddly. Not hard, just elaborate and has no
tests. :(

modprobe: I don't use modules much myself. I know the theory, and insmod's
already promoted...

kexec: simple, requires rebooting to test, I was holding off until I had qemu
builds going and now that mkroot's in I should circle back around to this.
(Modulo qemu doesn't always reboot cleanly with the -kernel command line
argument used as a builtin bootloader. I THINK kexec won't care because it
doesn't go back through the bios?)

kinit/init: some horrible klibc thing that was half-assing another command, I
just need a proper init. (now THAT is a can of worms, I need to dig up my notes
from https://landley.net/notes-2015.html#03-06-2015 which are at
http://landley.net/systemd-notes.txt and make sense of them...) A "shutdown"
command is also part of init. (As is resume, more or less.)

getty and sulogin are more early system bringup stuff my systems don't use. I
know how to do it, just haven't had reason to. "users" is kind of in that bucket
too (list logged in users. From the minicomputer days when we all shared
computers!) And "last". But klogd and syslogd are useful-ish (I mostly just
dmesg myself which the kernel does for you), but now that I have mkroot I really
_should_ get to these. (Android has its own and won't care, of course...)

mdev is a thing I created way back when, which other people added a lot of stuff
to later because they were using it, but then devtmpfs was invented and half its
reason for being went away, but if you collapse hotplug into it I guess it's
still useful? And there's also "notify and take action when a device shows up",
which is still useful. Needs design work to figure out what it should look like
now, starting with researching other people's use patterns of it.

dir and vdir are just ls with a different output format, haven't bothered
because I last used dos in something like 1992. On the one hand, it looks
trivial (it's ls flag mapping), on the other... I've never used either one and
would have to read the man page closely to figure out WHAT flag mappings this
is? (And then there's the question of why either SHOULD be there? We have a
perfectly good ls. Both come from tizen, which I haven't heard from in years,
and although both are in coreutils... why?) Come to think of it, dir and vdir
should probably be shell "alias" of ls...

I have a hostid already, it's in toys/example because a 32 bit identifier is no
longer globally unique, so the command's reason for being stopped working. But
that's done, it's just in the todo list because it's not in defconfig because
"example" isn't part of defconfig either.

nfsmount is a trivialish wrapper around mount, it supplies the password so you
don't have to -o it on the command line (making it visible to other users on the
machine while mount runs, of which there should be none). There's an smbmount
too. I haven't gotten to it because I don't use nfs or samba (although I keep
meaning to write a samba server in toybox).

sum is obsolete, I don't care what posix says. It predates sha1sum, md5sum, and
for that matter crc32. Not doing it. It's in the list because some other package
that toybox has in the "when we do all these we replace that" category had it, and
I had it in the "maybe" list for that package. I should take it out... it was in sash.

unexpand "converts spaces to tabs". Haven't gotten around to it yet. :)
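
A minimal sketch of the core of it, assuming tab stops every 8 columns and only
touching leading whitespace (the function name is made up, and the real command
has options this ignores):

```python
def unexpand_leading(line, tabstop=8):
    """Rewrite a line's leading blanks as tabs plus leftover spaces."""
    col = 0   # display column reached by the leading whitespace
    i = 0     # index of the first non-blank character
    while i < len(line) and line[i] in " \t":
        if line[i] == "\t":
            col += tabstop - col % tabstop   # a tab jumps to the next tab stop
        else:
            col += 1
        i += 1
    # Reach the same column with as many tabs as fit, then pad with spaces.
    return "\t" * (col // tabstop) + " " * (col % tabstop) + line[i:]
```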

tabs sets tabstops on a terminal. Is that even still supported?  Oh goddess, yes
it is. That's horrifying. Leave it there for now, figure out what to do about it
later. (I mean, it looks like it's just a wrapper for an ioctl, but _ew_.)

tput does things with the terminfo database that were last relevant in 1976,
because we had physical teletypes, then glass ttys with hardcoded behavior and
buying an IBM TN3270 vs a DEC VT100 made a difference. And NOBODY HAS DONE THAT
FOR 40 YEARS! These devices NO LONGER EXIST (outside of museums), we can STOP
EMULATING THEM ALREADY. Windows never did. The mac doesn't. You have a terminal
with a monospaced font in it and can use bog standard ANSI escape codes (as
implemented in DOS ansi.sys in 1986) and these days EVERYTHING DOES UTF8 ENCODED
UNICODE. (Or should.) But maybe some scripts say "tput clear" instead of "clear"
or "reset"? Then again, those scripts should be easy to adjust shouldn't they?
Sigh, ok "tput cup X Y" becomes the ansi escape code to jump to that location,
fine... wait, they do Y before X? Why? But yeah, I can do a really simple
version of this for some common cases. But HONESTLY. Yeesh. Why is that still in
posix?
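
The "really simple version for some common cases" could look something like
this. A sketch under assumptions: it hardcodes ANSI escapes instead of reading
terminfo, handles only "clear" and "cup", and the function name and error
handling are invented for illustration.

```python
ESC = "\x1b"

def tput(*args):
    """Translate a couple of common tput requests into ANSI escape codes."""
    if args == ("clear",):
        return f"{ESC}[H{ESC}[2J"              # home the cursor, erase the screen
    if len(args) == 3 and args[0] == "cup":
        row, col = int(args[1]), int(args[2])  # row (Y) comes first, 0-based
        return f"{ESC}[{row + 1};{col + 1}H"   # ANSI cursor position is 1-based
    raise ValueError("unsupported: " + " ".join(args))
```

Writing tput("cup", "4", "9") to the terminal should put the cursor on the
fifth row, tenth column.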

fold, cols, csplit, and join are text manipulation commands that take rows of
lines and do things to them. There's a fold in pending, the rest aren't hard.
(Legacy of unix's early history as a typesetting system for the AT&T patent and
licensing department, I expect....) No, "cols" is from something called
"suckless" which contradicts itself in the name, made up lots of random new crap
that didn't take off, and which I last looked at in November 2015. Yanking that...

I've looked at pathchk before and gone "I'm not sure I agree with the premise".
It's in posix, and devuan has it in the standard install, but... why? Define
"portable"? Linux accepts 255 character path components with any char but NUL
and / in them, period. And hasn't got a max length limit otherwise. Portable to
_what_? Why? No, I don't want this one.
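
For comparison, the Linux rule described above fits in a few lines. A sketch
with assumptions spelled out: the function name is made up, the 255 limit is
counted in bytes rather than characters, the empty path is rejected, and
following the text no overall length limit is checked.

```python
def linux_accepts(path, name_max=255):
    """True if no component exceeds name_max bytes and there is no NUL."""
    raw = path.encode("utf-8", "surrogateescape")
    if not raw or b"\0" in raw:
        return False
    # "/" only ever separates components, so any other byte is allowed.
    return all(len(part) <= name_max for part in raw.split(b"/"))
```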

stdbuf is sort of a wrapper that intercepts a command's stdin/stdout and does
reads/writes of different sizes at it. I keep meaning to give it a closer look
to see if it's worth bothering.

runcon is also already implemented. It's more selinux nonsense and I haven't got
the selinux libraries installed on my system so the compile time probes make it
drop out, that's why it's in the list. (It's not in _my_ defconfig, but that's a
false negative. :)

getevent is an android thing: on the one hand elliott hasn't removed it from the
android part of the roadmap (which he maintains at this point), on the other
there isn't one in toybox. *shrug*?

zip and unzip are the next logical step after tar (which is in now), but I'd
like to finish gzip deflate-side first.

The "zcmp zdiff zegrep zfgrep zless zmore" family is just "gzip | command" and
there should be some way to genericize it.  Again, optional: you can just "zcat
| grep" yourself, that's all just a convenience...
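
One way the generic version might look, as a rough sketch. Assumptions: both
function names are invented, the whole file gets decompressed into memory, and
only the single-file case is covered (zcmp and zdiff compare two files, so they
need more than one pipe).

```python
import gzip
import subprocess

def wrapped_command(zname):
    """Map a z* name to the command it wraps: zegrep -> egrep, zless -> less."""
    if not zname.startswith("z") or len(zname) < 2:
        raise ValueError(f"not a z* wrapper name: {zname}")
    return zname[1:]

def zrun(argv, path):
    """Roughly 'zcat path | argv': decompress path, feed it to argv on stdin."""
    with gzip.open(path, "rb") as f:
        data = f.read()               # whole file in memory, fine for a sketch
    result = subprocess.run(argv, input=data, stdout=subprocess.PIPE)
    return result.returncode, result.stdout
```

With that, zegrep is just zrun([wrapped_command("zegrep"), pattern], file), and
the same two functions cover zfgrep, zless, and zmore.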

And that's the list, triaged and explained.

Anyway, taking aaaaaaaal that into account, I think I'm probably somewhere
around 80% of the way towards the 1.0 release? It's not an exact thing, but a
lot of what's left is simple or optional. There's a bunch of stuff there I'm not
sure I _should_ do, and could easily trim them from the list to make a 1.0
release. A lot of other stuff is present in pending, what's in pending works,
and I could in a pinch lower my standards to just accept it. (Not that I'm
likely to, but the amount of cleanup work I need to do on them can vary. There's
a lot more reading code than writing for those, which isn't _easier_ but means
the resulting changes should be smaller.)

The hardest and biggest and most important remaining thing is the shell, which
I'm working on. Beyond that... vi is big but somebody's working on it, bc is
enormous and would take a long time to clean up but I can also trivially patch
it out of the kernel and nothing else uses it anywhere that I've found (you can
thank Peter Anvin for objecting to the removal of perl by adding another
gratuitous build dependency that Linux From Scratch and Gentoo had to add
because they weren't previously building it because nothing anywhere used it and
nobody had asked for a 40 year old desk calculator since most systems just run
python and such if you need to get fancy...) The biggest lumps of work left for
ME in the 1.0 roadmap sound like:

rest of the shell
awk
that pile of networking commands and servers
the init/login/mdev/syslogd pile. (Which includes useradd and friends.)
gzip compression side and zip, plus cleaning up xzcat/lzma.
opening the sha3 can of worms (it's currently building them by linking the
crypto code out of openssl but I don't want the dependency).
everything else

Altogether it's a lot less than what I've already done. :)

(Of course "make" isn't in the 1.0 roadmap, that's qcc. Getting android
self-hosting through a minimal binary-auditable native build environment is the
NEXT big can of worms. It would be nice if AOSP pulled the NDK as a build
prerequisite, but they're not there yet...)

Rob

