[Toybox] Shell Compatibility Reports from Oils - ~800 tests passing
Rob Landley
rob@landley.net
Sun Jun 29 21:59:28 PDT 2025
On 6/28/25 21:31, Andy Chu wrote:
> Hi Rob,
>
> The author of brush was also interested in running our spec tests, so
Documentation:
In toybox, you can run all the tests against the $PATH with:
TEST_HOST=1 make tests
Or run a specific one with:
TEST_HOST=1 make sh
So if you want to test a specific implementation of a host command you
can usually go:
mkdir blah
ln -s $(which bash) blah/sh
PATH="$PWD/blah:$PATH" TEST_HOST=1 make sh
And you can VERBOSE=all to tell it not to stop at the first test but
continue (see the comment at the start of scripts/runtest.sh for all
the VERBOSE and DEBUG options).
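For example, to run all the sh tests against the host shell without
stopping at the first failure:

VERBOSE=all TEST_HOST=1 make sh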
Here's why documentation is hard to write: the above works for bash, but
NOT mksh or busybox ash.
Unfortunately tests/sh.test is tricksy because debian's sh in the $PATH
points to dash not bash (for STUPID REASONS) so for TEST_HOST it uses
the name "bash" instead of "sh" (see
https://github.com/landley/toybox/blob/master/tests/sh.test#L10 ) and
thus you need to either symlink to blah/bash in that second command line
or add SH=sh to that last command line.
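Spelled out, that's either:

ln -s $(which bash) blah/bash
PATH="$PWD/blah:$PATH" TEST_HOST=1 make sh

or keep the blah/sh symlink and go:

PATH="$PWD/blah:$PATH" TEST_HOST=1 SH=sh make sh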
Except of course for the busybox binary distributed by debian: if you do
the first method then the very first test fails with:
FAIL: sh prompt and exit
Expected 'E$ '
Got 'bash: applet not found'
Because the way they configured it, it ONLY understands "sh" not "bash".
(Busybox behavior changes based on how you configured it, sometimes
quite dramatically. In toybox, config symbols are _mostly_ just "include
this command or not" and a toybox command should otherwise always behave
the same way modulo maybe version skew between releases. This is a
design decision.)
And if you do the second method you instead get a different failure:
FAIL: sh prompt and exit
Expected 'E$ '
Got ''
Because busybox ash legitimately fails the first test. (I think -i when
stdin isn't a tty confuses it.)
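If you want to poke at it, the minimal reproduction is presumably
something like:

echo 'echo hello' | busybox ash -i

(an untested guess at the invocation; the test harness drives it through
expect with more setup).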
So does mksh by the way, but for different reasons:
$ PATH=$PWD/blah:$PATH TEST_HOST=1 make test_sh
scripts/test.sh sh
FAIL: sh prompt and exit
Expected 'E$ '
Got 'E:bash: --: unknown option'
make: *** [.singlemake:1313: test_sh] Error 1
And the FUN part is that if you VERBOSE=all to tell it to run all tests
rather than stopping at the first one, half the mksh tests fail because
it still doesn't understand --longopts, which are part of the standard
shell setup stanza in sh.test:
# insulate shell child process to get predictable results
SS="env -i PATH=${PATH at Q} PS1='\\$ ' $SH --noediting --noprofile --norc -is"
Because reading random rc files to change the behavior to who knows what
is bad (tests need to be at least SOMEWHAT portable), and gnu readline
will reach out and interact with /dev/tty when run in interactive mode,
ignoring stdin/stderr which is what "expect" is trying to feed it input
through and read the output of. (The $ prompt goes to stderr, not
stdout. I was surprised too.)
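That's easy to check, assuming a bash in the $PATH:

echo exit | bash --norc --noediting -i > /dev/null
echo exit | bash --norc --noediting -i 2> /dev/null

The first still shows the prompt (redirecting stdout didn't hide it),
the second doesn't.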
So mksh always spams a warning to stderr and half of the tests are
looking at stderr.
Meanwhile, if you use VERBOSE=all, busybox ash manages to SIGSTOP its
terminal on that first failure when it DIDN'T GRAB THE TTY (-i when
stdin isn't a tty is buggy, no I haven't asked Denys about it yet), so
you have to run "fg" repeatedly to resume the script and continue to the
next test.
There's a reason I've mostly just been comparing toysh against bash. I
don't add tests bash won't pass.
> I went ahead and created the "Bashix" repo with discussions and wiki
> pages:
>
> https://github.com/bashix-spec/bashix/discussions
>
> - How to Run Oils spec tests
> - Idea for a common test format
> - a simple way to get started, by pooling some effort
>
> https://github.com/bashix-spec/bashix/wiki/List-of-Shell-Test-Suites
Let's see... https://github.com/bashix-spec/bashix/discussions/2 says:
> Right now it is sorta coupled to the Oils contributing process,
> which dozens of people have done. It should take 5-10 minutes I think
And then links to
https://github.com/oils-for-unix/oils/wiki/Contributing which starts
"Please use Linux to make changes to Oils and test them" then and then
"Please clone oils-for-unix/oils (rather than creating a fork)" and then
"Build Two Complete Oils Interpreters: in Python and in C++" and that's
where I stopped reading and put it back on the todo list.
You do not have to build toybox to use the toybox test suite. I actually
have a packaging script for mkroot that cherry picks the test suite out
and leaves the rest of the source code behind, although it looks like
all I checked IN was
https://github.com/landley/toybox/blob/master/mkroot/packages/tests
which is like half of it. (Well mkroot can't RUN it yet, because toysh
barfed on setting the "trap" on line 7 of scripts/test.sh last I tried;
I hadn't implemented trap yet. I have now done so, and should circle
back around, but I need to get a release out...)
>> I'm usually interested in more tests. And I never claimed 100% bash
>> conformance: even BASH doesn't have 100% bash conformance. (Half my
>> arguments with Chet _cause_ version skew, which I see as a bug and he
>> sees as a feature. I ask him to explain a corner case and he FIXES it,
>> but then what do I put in the test suite if I want to pass TEST_HOST on
>
> I agree with this problem -- I've hit it many times. I added this
> quote to the Bashix README.
>
> I notice that if you ask a question on the bash mailing list about
> bash behavior, sometimes they will just read you the source code.
>
> Meaning they didn't think about that case before you asked.
I often don't think about a case before someone asks about something,
that's life. But you'll notice I don't link to code without an
explanation, because I assume you can dig into the source and read it
yourself if you were sufficiently motivated.
>>> But then I learned there are TWO shells in Rust aiming at bash
>>> compatibility, both started in 2022 - in this thread
>>> https://news.ycombinator.com/item?id=43908368
>>
>> Yup, there you go. Two of them.
>
> At first I thought these projects overlapped a lot with OSH, but when
> I look at the home page
>
> https://brush.sh/
>
> https://github.com/shellgei/rusty_bash/
>
> they are focused on the INTERACTIVE use case. OSH is both an
> interactive shell and a language, like toysh.
They're focused on knocking on people's doors and telling them about
their lord and savior rust. Everything else is a side effect.
(Unfortunately the FSF is like that too, except about licensing. Took me
about 10 years to figure that out, and I'm still sad about it.)
> But I discovered many years ago that interactive scripts can almost be
> considered a different dialect of bash. And there are many features
> that are interactive only -- e.g. the completion and history APIs,
> bind -x, etc. which are tightly coupled to GNU readline.
Busybox doesn't pull in gnu/readline and I don't plan to either.
Toybox's stance on dependencies is explained in three different parts of
https://landley.net/toybox/design.html and I also went over it in
https://www.youtube.com/watch?v=Sk9TatW9ino (and the "trusting trust"
rant I waved at you earlier), but none are concise about it. Maybe I
need a FAQ entry?
Hmmm, the "shared libraries" part of design.html gives the gist of it. I
don't have an anchor tag for it, but go to
https://landley.net/toybox/design.html#license and scroll up a little,
it's right above that.
(Pascal's apology for writing a long letter because he didn't have time
to write a short one applies to most things I do.)
> So I think it may help to view these projects as slightly different,
> as complementary. There is a lot of work to go around.
>
> I would certainly understand if the authors didn't think say toysh
> could be finished, and are trying their own approach. That is valid
I've had a somewhat stressful decade. I honestly thought I'd be done by
now, but I haven't had the focus I anticipated.
>>> So I thought you may be interested in this.
>>
>> Not really. I've never met a rust developer who had any argument for
>> rust other than "I hate C" and "writing more C is a SIN you HEATHEN".
>
> Although memory safety isn't the #1 problem with shells (e.g.
> ShellShock was not a memory safety bug), I think it's a good reason.
Oh sure, that's why I keep messing with ASAN (and why Android builds
everything with hwasan on arm).
Heck, even tinycc had a bounds checker. "Electric fence" was 1990s,
valgrind was 2002...
> (And btw I don't use Rust myself. I don't like long compile times.)
That's a C++ problem, not C. I did a long three part explanation of what
I think is wrong with C++ many moons ago:
https://landley.net/notes-2011.html#16-03-2011
https://landley.net/notes-2011.html#19-03-2011
https://landley.net/notes-2011.html#20-03-2011
> FWIW I contributed some memory safety fixes to toybox over 9 years ago:
>
> http://lists.landley.net/pipermail/toybox-landley.net/2016-March/016106.html
>
> http://lists.landley.net/pipermail/toybox-landley.net/2016-March/016178.html
>
> http://lists.landley.net/pipermail/toybox-landley.net/2016-March/016179.html
I thought your name sounded familiar, but assumed it was from
linux-kernel. :)
> I felt then that shell requires a ton of parsing, and parsing in C
> inevitably leads to memory safety bugs.
Eh, not inevitably. And if you're worried about that, why not use a
scripting langauge? (As described in the above three part rant against
C++...)
If Rust wants to go off and write its own 4 packages (kernel, compiler,
command line, and the big system library tying them all together) it is
100% welcome to do so. Good luck to them.
But they keep trying to hijack C projects because they insist they are
OWED those C projects. I don't care about new rust implementations, I'm
annoyed when they fsck up existing projects that built fine without rust
until now.
> In the last 9 years, the
> world has moved in the direction that this isn't acceptable, so I'm
> glad I didn't write Oils in C or C++.
>
> (It actually started as 3000 lines of C++ in March 2016, but I
> abandoned that approach.)
I looked at rewriting busybox in lua somewhere around 2009, but
"eliminating external dependencies" fought back hard. LUA had a hard
requirement of being extended in C. It had no libc eqivalent! Java had
TWO standard libraries (one for applet, one for application), python had
a standard library FOREST (with wrappers for zlib and friends in
there)... LUA had a stub you'd never implement wget with, let alone
ifconfig. (I think I even needed an additional library to write "ls" or
"find". It's been a while, I forget the details but not the disappointment.)
And if I had to write C anyway (and thus cross compile to every
supported target)... just do it all in C? Lua became the unnecessary
dependency to eliminate. Which was sad, it's an elegant little language.
>> Which doesn't explain why to use _that_ language for kernels instead of
>> go, swift, zig... And that's kernels, why not do userspace in
>> any of the bytecode languages? (I personally like Lua.)
>
> Although Lua is faster than Python, it's not fast enough to implement
> a shell in.
Sure it is. Easily.
Hades and Hades II are written in Lua. World of Warcraft was mostly Lua
under the covers. Heck, Baldur's Gate and Neverwinter Nights back in the
1990s were Lua. (I mean they're all "3d engine with Lua puppeting the
models".) There's a FLOOD of Lua developers who learned it from Roblox.
In 1998 I was doing animated interactive graphics programs in Java
without a JIT, on a 486-DX75 getting about 15 frames/second. The trick
was to allocate all your objects up front and hang on to them so the
garbage collector never introduced a latency spike. I had a fix for that
(ala https://landley.net/notes-2018.html#18-06-2018 ) but never got
around to doing anything with the idea. Oh well...
Anyway, the problem wasn't performance, it was dependencies. If
http://lua-users.org/lists/lua-l/2006-12/msg00522.html had caught on Lua
could have taken over, but alas...
> Oils is written in Python-based DSLs, with a bunch of metaprogramming.
> It uses very fine-grained static types (in the style of algebraic data
> types, checked with MyPy).
The switch from python 2 to python 3... cooled my enthusiasm for the
language.
https://landley.net/notes-2024.html#09-04-2024
When "python" stopped being in the $PATH, I stopped using it. (Well, I
still build 2.7 for some old scripts, but only because they haven't been
worth re-writing in something else. The
https://landley.net/toybox/status.html page is generated by an old
python 2 script, but effort's better put into finishing commands than
tweaking the unfinished category indexer.)
Meanwhile, they're JUST NOW removing support for K&R C circa 1975. I
moved from C99 to C11 fairly recently because it had a couple features I
found useful (__has_include(), typecasting up structs and array
instances as function arguments, and the improved initialization syntax
for the same). I have no immediate plans to move further. 2011 was
14 years ago, but even C99 is unlikely to stop being supported any time
soon. No real reason to.
> That is good for concentrating on ALGORITHMS, which I felt was
> important for long-term project with a clean codebase.
>
> But it's bad for speed.
Oh python is slow, sure. I gave up on trying to do anything performant
in python in 2002 (when I wrote my own md5sum implementation in it and
could only get about 300k/second throughput).
> So we wrote a Python-to-C++ translator, and a small runtime with
> garbage-collected data structures. This took a LONG time, but is now
> done.
Why C++ instead of C?
> And OSH is now FASTER than bash, in both I/O and computation.
Who was it who said premature optimization is the square root of evil?
I haven't really been doing much performance optimization pre-1.0. Well,
Elliott started doing some stdout buffer changes to speed up the AOSP
build, and after a bit of whack-a-mole because those semantic changes
broke stuff, I did https://github.com/landley/toybox/commit/d3cef27b10ec
one night because "if performance is going to be a thing"...
And there were some specific performance fixes for the AOSP build in sed:
https://github.com/landley/toybox/commit/3354319e3d3e
https://github.com/landley/toybox/commit/007af3537d18
And the automatic fixed-string matching and bucket sort stuff for grep:
https://github.com/landley/toybox/commit/a7e49c3c7860
https://github.com/landley/toybox/commit/2611693169c0
https://github.com/landley/toybox/commit/193009855266
But that's all "wait for somebody complain, then improve the identified
bottlenecks".
> e.g. it runs autotools configure scripts in 90-95% the elapsed time of
> bash or dash:
>
> https://oils.pub/release/latest/benchmarks.wwz/osh-runtime/
"Autoconf is useless" can be sung to "every sperm is sacred".
You saw the bit with 20% performance improvement just from static
linking? Using builtin commands without going through $PATH/fork/exec is
quite a significant speedup that I haven't leaned very hard into and
which Android's build disables entirely in the .config.
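You can demonstrate the fork/exec tax from any shell with a builtin
echo, something like:

time for i in $(seq 1 1000); do echo hello; done > /dev/null
time for i in $(seq 1 1000); do /bin/echo hello; done > /dev/null

The second loop pays a $PATH/fork/exec round trip per iteration; exact
numbers vary by machine but the difference is not subtle.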
*shrug* The first question about optimizing is always "for _what_".
> And it runs computations like "Fibonacci" faster than bash.
>
> Actually this was confirmed today by an independent user -
> https://news.ycombinator.com/item?id=44407626
Cool. Do you calculate a lot of fibonacci sequences in shell script?
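(The sort of microbenchmark in question is presumably pure-shell
arithmetic along these lines:

fib() { local a=0 b=1 t i; for ((i=0; i<$1; i++)); do t=$((a+b)); a=$b; b=$t; done; echo $a; }
fib 30

which is a fine stress test of the interpreter, and a workload no init
script has ever had.)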
The reason Ubuntu switched #!/bin/sh to point to the Defective Annoying
SHell was to speed up their init scripts. No really, explicitly:
https://wiki.ubuntu.com/DashAsBinSh
And then when that didn't work they wrote Upstart to parallelize them,
copying the work of some IBM guys who'd used "make -j" as their init to
bring up the system in parallel (there was a linux weekly news article
about it).
Of course in changing /bin/sh Ubuntu 6.06 broke the kernel build (dash
was full of bugs at the time, plus since bash was literally the first
program Linux ever ran, bash syntax was everywhere, so it broke LOTS of
scripts), but they never pointed /bin/sh BACK to bash because white
billionaires from south africa seem categorically reluctant to admit
ever having made a mistake.
>> Bash exists. I'm trying to do a self-contained project so I can get a
>> minimal system down to 4 packages (compiler, kernel, libc, cmdline) so
>> you can build something like tinycc+linux+musl+toybox and then build
>> Linux From Scratch under the result.
>
> Yes, I liked your Aboriginal project! And I think bootstrapping is
> important for a shell.
>
> Rust code is not really bootstrappable, and there's only one
> implementation of the Rust compiler.
They don't have a rust kernel. They're adding ring zero domain crossings
to Linux in the name of "safety", so in order to understand what it's
doing you need intimate knowledge of MULTIPLE languages. They're adding
extra wrapper code to let rust talk to the C rbtree infrastructure. This
is their technical judgement on what counts as "safety": more glue code,
more domain crossings, fewer people who understand the whole data flow,
much safety, wow.
(Did you ever read the Tanenbaum/Torvalds debates, where a microkernel
was somehow supposed to protect you from runaway hardware DMA, a
screaming interrupt softlocking the processor, inappropriate power
sequencing...)
> The OSH tarball is small -- you get about ~115K lines of standard C++
> 11 (with absolutely no dependence on Python)
>
> https://oils.pub/osh.html
>
> So all you need is a C++ compiler, a shell, and ~30 seconds to use it.
> And you get a 2 MB executable, which also has YSH.
I'm trying to bootstrap a system in C, not C++.
I remember a contract at a company that had shipped a library
where they did "extern C" around a function returning a pointer to a C++
object, and were SURPRISED to discover that they'd bound themselves to
the old C++ ABI (we SAID extern C, that means we can pass a pointer to a
class instance, right?) when gcc had some flag day switch, and thus they
could only build compatible binaries with a version of Red Hat
Enterprise that was going out of LTS. And they'd done it in such a way
that their customers made binary plugins that interacted with the thing,
so everyone everywhere would have to rebuild at the same time and it was
a violation of their support contracts.
This was back before the Intel Itanium C++ standard was adopted as
"nobody else has written something up in this much detail, so it doesn't
matter how stupid it is", and thus the common trick of reversing the
order of class member identifiers in deep hierarchies (so strcmp
deviated faster rather than traversing nearly identical
thing.thingy.also.whatever.potato... until it hit the unique bits at the
end) had to be undone in all the other architectures because what Intel
had documented was STUPID but was now STANDARD...
I didn't say I haven't DONE a lot of C++. I said I don't LIKE it.
Aboriginal was building uclibc++ for a while. (Its author Garrett
Kajmowicz sat at the desk next to mine at Timesys for a year, we still
say hi a couple times a year. He's the one who explained to me how C++
is turing complete at compile time and had the two line example that
would take longer than the sun has left to compile. In actuality it
filled up its ELF segments and the compiler aborted with an internal
error.) Yes, I've had to look up the syntax for getting a pointer to a
member function MORE THAN ONCE.
If you're wondering why toybox uses "new", "try", "throw", and "catch"
as local variable names so much, and any time I have to typecast a
pointer I just (void *) it and let the compiler sort it out... One of
the main global variables is "this". Toybox ain't gonna compile as C++
any time soon. More than one static checker that thought C was C++ got
confused and went off into a weird little tailspin when run on toybox.
> YSH is more like Python -- it has garbage-collected data structures and JSON.
>
> https://oils.pub/ysh.html
Not really a fan of json either. There are far worse file formats, but
it's textual without being human friendly. (More so than docbook!)
The last three programs I used that produced "json" output did so in a
way that made looking at it in "less" basically useless.
(Zero line breaks. Can't use grep either. I had to cut it up with "sed"
and INSERT line breaks, and parsing json with sed? Not trivial.)
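The crude version of that cut-up is something like:

sed 's/[{,]/&\n/g' blob.json | less

which is wrong in a dozen ways (commas inside string values, for a
start) but at least makes less and grep usable.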
> My interest was more in moving PAST bash. Like you say, "bash exists".
Fish exists, zsh exists, ysh exists...
Way back when there was bourne shell, korn shell, C shell, and a
half-dozen others. Bash was the first program Linux ran. Linus turned
his terminal program into a kernel specifically so it could run bash.
(He wrote the term program because minix's microkernel design couldn't
keep up with a 2400 baud modem. He made it read/write the minix
filesystem so he could download stuff from usenet. He made it run bash
so he didn't have to reboot into minix to mkdir/mv/rm files around on
his tiny hard drive when downloading more stuff from the university's
microvax. At that point, he was 95% of the way to getting it to run gcc.)
Until dash, bash was THE linux shell. That's why it's interesting.
> I think there is value in having another bash implementation -- OSH
> has precise error messages and so forth. But really bash is a bad
> language that we should move on from -- that is the point of YSH.
The same way qwerty is a bad keyboard layout and thus dvorak was inevitable?
> Obviously bash will exist for decades to come, but I think it's
> important to have a path out of it.
https://xkcd.com/927/
> I don't think having 4 different "sorta bash" projects helps anybody,
> and that is the idea behind "Bashix" -- to SHARE more of the work. At
> least the tests.
Yay tests. I like tests. I sometimes get tests I decided not to pass
though. Or to pass a watered down version of.
Just yesterday, https://github.com/landley/toybox/commit/07a422c55901
for example. The tests contributed to toybox didn't pass TEST_HOST,
because the error messages can vary even without internationalization.
> ----
>
> I wonder if OSH does everything you want ?
Which of the 4 base packages is it?
> I think it would be easier to say write a C++ to C translator, or a
> Python to C translator, than to finish toysh.
There already is a C++ to C translator:
https://github.com/JuliaHubOSS/llvm-cbe
My vague bootstrapping plans involve something like that (probably
running llvm itself through it) to add C++ the same way "building perl"
adds perl.
Inspecting the entire C++ compiler source to not have trusting trust
issues I leave as an exercise for the reader; personally I'd just avoid
ever using C++ for anything, but that's not my call. But at least you
don't have to inspect the BINARY for trusting trust issues. (Modulo the
C++ converter could itself be suborned, and checking its generated
output can't be fun. But that's C++ for you.)
> Yes toysh has a very small binary, but it's also the least compatible
> shell, and I thought you said you've worked on it since ~2007 or so.
I started working on a new shell at the tail end of my busybox tenure:
https://lists.uclibc.org/pipermail/busybox/2006-September/058418.html
Shortly before Bruce happened:
https://lwn.net/Articles/202106/
I left busybox and started toybox in 2006, and was busy for the next few
years A) working on aboriginal linux, B) redoing toybox versions of the
commands I'd already finished in busybox (since I knew what was entirely
my code, and how to write a new one if I needed to). Toybox kind of
rolled to a halt around 2009 because "busybox exists" and there was a
certain amount of "meh" ala
https://landley.net/notes-2009.html#07-08-2009 but I perked up again in
2011 when Tim Bird suggested a new use case for it (non-gpl licensing
domain) which, along with "I can do a better job starting fresh", combined
to give it a reason to exist.
A quick redo of basically the bbsh code I'd already done was in toysh
from the beginning, but I didn't look at it at all until I restarted
work (deleting the old one and just doing a fresh one) in 2019.
By that point, I'd ridden down another bankrupt startup (something I
swore I'd never do again, and yet) and acquired some health issues from
the stress of senile Boomers electing a local New Jersey con man who'd
bankrupted multiple casinos in Atlantic City.

(My family moved to New Jersey when I was 10 and I spent 13 years there
through no fault of my own before I could escape: that clown was a local
who was REGULARLY MOCKED. Back in the 90's bankruptcy was part of his
money laundering for the Russian Mob: the regulators and tax authorities
didn't look at LOSSES, so oligarch du jour would "loan" the launderer a
pile of money, who would spend it on the oligarch's other businesses and
then declare bankruptcy to bury the paper trail. That's why he could go
bankrupt a half-dozen times but a friendly oligarch was always there to
"loan" him the next round. Real estate transactions were another common
way to launder money: sell a million dollar condo to X to get cash, buy
another one from X's other business to give back the cash, pocket the
difference. Rinse and repeat in the other direction with the SAME
PROPERTIES sometimes; if you do it right there's no tax because you're
buying/selling at the same cost basis so you made no capital gains. He
did that stuff ever since he finished spending his senile father's
estate, after making himself the executor so he could siphon off all the
money from everyone else the will named.)

Then as president that idiot turned a disease into a pandemic (countries
that tested didn't have lockdowns, he was telling people to inject
bleach and snort horse paste and somehow made NOT masking a culture war
thing), with incoming shipping containers piling up 8 high at the port
of los angeles and empties not being returned, so of COURSE there were
1974 oil shock style supply constraints leading to inflation (after TEN
YEARS of everybody TRYING to create inflation and being stuck at the
zero lower bound, how do you screw up THAT BADLY). Anyway, I had to
return from Japan back to Texas because of all that, and couldn't get
back to Japan before my residence card expired.

Then the Senile Boomers voted for an even OLDER guy to inherit that mess
(the oldest president in history, and his speaker of the house was 80 at
the start of the term) and I kept remembering how Hindenburg signed the
Reichstag Fire Decree when he was 85, and DREADING what was coming
because they spent 4 years NOT CONVICTING HIM after an outright
insurrection. When my wife defended her doctoral thesis and got a job
here in Minneapolis (I couldn't convince her to move out of the country)
I sold the house in Austin and moved in with her (packing, moving, and
selling a house is EXHAUSTING) and basically spent the next year after
that curled up into a ball hoping I WASN'T right...
I've had a shortage of spoons recently. Sorry about that.
> https://pages.oils.pub/spec-compat/2025-06-26/renamed-tmp/binary-sizes.txt
>
> OSH has been funded by https://nlnet.nl since 2022, and for the next
> 6-12 months, we are going to make a push towards showing you can build
> a distro like Alpine Linux with OSH.
Good luck.
> We need some more people to participate in that effort, e.g. to reduce
> bugs into isolated test cases.
>
> So if anyone is interested, you can be paid for that work!
>
> ---
>
> It's fine to work on code "for no reason".
I have reasons. And I have been paid to work on it, on and off. (I
really need to recover my patreon login. Some stuff's still packed from
the move...)
> But if you're going to say
> the Rust projects have no reason for existing,
No, I just think their reasons are bad.
They hate C++, and think that C is the same thing as C++ because that's
been C++'s entire marketing strategy since 1986. (This language contains
the whole of C and thus just as good a way to write programs, the same
way this mud pie contains an entire glass of water and is thus just as
good a beverage. We've added even MORE simplicity this release! Piled
layers and layers of it on top of each other! We're simpler than ever!
Next time we'll have TWICE the simplicity, you won't even be able to
read it all!)
That said, if they want to write a new system in Rust and make a go of
it, good luck to them. I've never heard a package written in Zig or Swift
_trumpet_ that it's written in that language, but when something's
written in rust they always claim its main advantage is being in rust.
I strongly suspected your shell wasn't written in rust because you were
3-4 emails in before you mentioned what language it was written in. No
rust project would EVER wait that long.
> then what's the reason
> for toysh existing?
Our story so far: https://landley.net/aboriginal/history.html
What and why is toybox: https://landley.net/toybox/about.html
The design of toybox: https://landley.net/toybox/design.html
2013 talk: http://www.youtube.com/watch?v=SGmtP5Lg_t0
2017 jetlagged ramble: https://www.youtube.com/watch?v=Sk9TatW9ino
2019 talk: https://www.youtube.com/watch?v=MkJkyMuBm3g#t=1m18s
Source code walkthrough: https://landley.net/toybox/code.html
I have TRIED to answer that question. The "trusting trust" thing is also
an attempt to explain what I'm trying to accomplish.
To be honest, most of https://landley.net/aboriginal/about.html still
applies but "4 projects instead of 7", "moving off 'the GPL' now that
there's no longer such a thing", "I can do a cleaner job in the new
codebase" (although after
https://lists.busybox.net/pipermail/busybox/2010-March/071783.html and
my meeting with Denys at CELF the following month he DID start to
implement some toybox infrastructure in busybox). Plus "gcc has
metastasized from 2 to like 7 packages now, dude..." And, of course, I
figured out how to do most of what it was doing in a single 500 line
bash script, so throwing out the old infrastructure and starting over
made sense.
> If you take into account what I just mentioned:
>
> - That OSH is by far the most bash-compatible shell
No, bash is the most bash-compatible shell.
> - It's bootstrappable, requiring only a C++ compiler
Last I checked bash was still C, not C++. So in theory I could build it
with something like tinycc or https://landley.net/qcc/ or
https://github.com/PortableCC/pcc or
https://github.com/EtchedPixels/Fuzix-Compiler-Kit or
https://github.com/libfirm/cparser or zigcc or...
I am focusing on one of the four base packages. Rich Felker's done a
reasonable job with musl, although there are a couple others (and if you
squint, maybe the kernel's nolibc is worth a look). I maintained my own
tinycc fork for a few years but didn't have the bandwidth to do that AND
aboriginal linux AND toybox (and these days I wouldn't do qcc=tcc+tcg,
I'd start over from scratch).
There are a number of potential kernels. Alan Cox has his own of course,
and I'm sad that tilck's build system was insane; it seemed quite a good
start otherwise:
https://github.com/vvaltchev/tilck
https://www.youtube.com/watch?v=Ce1pMlZO_mI
What was the new one I saw last week...
https://en.wikipedia.org/wiki/ToaruOS
> - It's faster than bash
Older versions of bash were faster than current versions of bash.
> - It has less source code than bash (~64K lines vs. 162K lines)
$ wc -l toys/*/sh.c
5113 toys/pending/sh.c
(Which is too big, I'd like to slim it down later.)
$ cat main.c lib/*.c toys/*/*.c | wc -l
79832
So the whole of toybox is only 25% larger than your shell.
> - It's memory safe
>
> Then I think toysh looks less appealing.
So you're explicitly trying to talk me out of working on my project, and
into working on yours.
Not interested.
> If you are not interested in USING OSH,
Not anymore, no.
> then I would say it's still
> beneficial to just COPY the algorithms into C. When I look at the
> blog, e.g. about alias, there is a lot "language reverse engineering"
> that we've ALREADY done, that you are repeating.
That Chet Ramey already did, and which the busybox guys presumably did
for ash.
I'm trying to figure out what bash is supposed to do. Reading the man
page didn't explain in sufficient detail.
Reading your code would show me what YOU did, not what the right thing
to do is. And for me, it's not always faster than just THINKING about
the problem. What SHOULD it do in this case? (This isn't the case for
things like xz, where there's a specific algorithm I'm reverse
engineering. But when there's a problem to be solved, I want to
understand the PROBLEM not just somebody else's solution to it.)
And reading someone else's code when I'm producing 0BSD code (a public
domain equivalent license, whole 'nother can of worms) is generally not
a good idea.
> As another example, Koichi Murase is a contributor to OSH, and he
> wrote the biggest shell program in the world (ble.sh), and overhauled
> our array implementation based on his extensive knowledge:
>
> https://oils.pub/blog/2025/05/release-0.27.0.html#complete-overhaul-of-bash-arrays-koichi-murase
Ooh, cool. https://github.com/akinomyoga/ble.sh
Good to know.
I haven't started on bash array support yet because it's not a feature I
ever used in my own shell programming, but I have a bunch of TODOs about
it. Mostly it's genericizing the "$@" plumbing (except there are also
associative arrays), but there are some syntax parsing areas I've stubbed
out. Needs a whole second pass. And some builtin magics are arrays that
have to auto-update...
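(The overlap is easy to see: positional parameters and arrays slice with
the same syntax, just 1-indexed vs 0-indexed, e.g.:

set -- one two three; echo "${@:2:1}"
a=(one two three); echo "${a[@]:1:1}"

both print "two", so most of the expansion plumbing generalizes once the
indexing difference and the associative case are dealt with.)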
> At the current rate, it seems like it will be years, if not a DECADE,
> before you can even start thinking about that stuff.
If the orange idiot gets a third term I doubt it will happen at all.
> ---
>
> So I think "Bashix" could be about sharing some more work.
Your recruitment pitch talked me out of it. I was ok collaborating on
test suites, but "stop work on your useless project, you've already
failed, come work on my superior one" kinda killed my interest.
> My first
> reaction is that "Bashix" should just be OSH, because we wrote it as
> an "executable spec".
Do you know why reading code is harder than writing code? Because when
you're writing, the code on the screen trails the mental model you've
constructed so everything makes sense and you know why you did it. When
you're reading code, you're randomly seeking around assembling an
incomplete mental model from a choose your own adventure book where
everything is the edges of a jigsaw puzzle and you have to hunt down
each new piece and THEN try to figure out why from what.
That's why newbie programmers always think what they wrote is superior
BECAUSE they wrote it, and thus it makes more sense to them. Every
programmer has a bit of that, it's something you have to constantly
correct for.
I'm trying to remember where I first read that, years ago. I thought it
was in https://www.oreilly.com/openbook/opensources/book/tiemans.html
but apparently not. (Might have been Robert Young's "Under the Radar"? I
don't THINK it was Dennis Ritchie's website...)
> But there are a few places where that is not true -- the reports show
> that sush and brush pass some tests that OSH doesn't:
>
> https://pages.oils.pub/spec-compat/2025-06-26/renamed-tmp/spec/compat/DELTA-osh.html#t:DELTA-osh=9d
>
> So we are going to work on that.
Good luck.
> toysh passes 5 tests that OSH doesn't -
> https://pages.oils.pub/spec-compat/2025-06-26/renamed-tmp/spec/compat/DELTA-osh.html#t:DELTA-osh=7d
>
> Some of those are "intentional strictness" that is trivial to relax.
> e.g. we disallowed 'FOO=bar break' on purpose, but it's easy to allow
> it.
I dunno why bash does stuff, and every time I ask Chet there's the
danger of him changing it.
Also, I tend to ask "why" questions and he answers with "what", and it's
easy to talk past each other if I don't spot that quickly.
>> You may have seen the long threads on here with Chet Ramey, the bash
>> maintainer. I've also had discussions about standards with Elliott
>> Hughes the ANdroid base OS maintainer.
>>
>> Alas, neither of us wants to maintain a standard because it's a TON of work.
>
> Yeah I think the "default outcome" is that there are going to be 4
> "sort of bash" projects, which is not going to do users any favors.
Unix was first published in 1974. Posix started in 1988, 14 years later.
It's easier to spot the compatible subset when you've got multiple
interacting implementations over a long period of time.
This was the theory behind the IETF "bakeoffs", and the rule for a while
that no standard was real until there were two unrelated interoperating
implementations.
https://datatracker.ietf.org/doc/html/rfc1025
I'm trying to make simple and understandable versions that run in low
resources (up to and including NOMMU embedded systems) and are easily
read and audited especially by students new to programming. It sounds
like you're generating C++ from Python 3.
> Because there will be a lack of coordination, and sharing of labor.
Welcome to the internet.
> But at least if you want to run Oils tests, the "Bashix" forum is now
> the place to discuss that.
>
> OSH is not running upstream bash tests yet, so I hope to import those.
> And the sush project is running the bash-completion test suite.
>
> So it probably makes sense to at least share the effort of TESTING the
> shells, if not the implementation.
I have "make tests" using the same infrastructure to test every toybox
command.
You're saying I should also run a second set of tests just on the shell.
Which I might do, but I'm pretty sure bash's OWN tests would come first?
https://cgit.git.savannah.gnu.org/cgit/bash.git/tree/tests?h=devel
> And using the same test suite makes it easier to do comparisons.
You could push tests upstream to Chet. You're not doing that.
>> I'm usually interested in more tests. And I never claimed 100% bash
>> conformance: even BASH doesn't have 100% bash conformance. (Half my
>
> At first, I thought that OSH could be a "subset of bash".
>
> But that is pretty much useless to users like Alpine Linux, the Nix distro, etc.
I implement stuff and wait for people to complain, then fix it when a
real user shows up, meaning they have a real test and can explain their
use case.
That's how I've handled every command.
> They're not interested in writing in a subset -- they're interested in
> running the big mess that they ALREADY have.
People are interested in running their existing scripts, yes. That's how
Aboriginal Linux extended busybox to build Linux From Scratch, I had to
teach sed --version to say "This is not gnu sed 9.0" because fscking
autoconf was explicitly testing for "gnu sed %d.%d" and requiring a
minimum version, and otherwise producing different build output. It
wasn't the right behavior, but it was the needed behavior.
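From memory, the shape of the probe was something like (a
reconstruction, not a quote from autoconf):

if sed --version 2>/dev/null | grep -q 'GNU sed'; then
  : # gnu detected, use the extensions
else
  : # assume a fossil and go down different codepaths
fi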
I was confirming line by line of output that autoconf announced the same
decisions and the compiler was being called with the same command line
flags, and tracking down each deviation to figure out WHY and if at all
possible fix it. Diff the build with the gnu tool vs the build with the
busybox tool for any change I could spot, one tool at a time.
> The people maintaining the distros now did NOT write the bash scripts!
> So they don't even have the knowledge to fix the bash scripts, if
> they wanted to.
Yes, see the link on loss of institutional knowledge in previous email.
(It's a GREAT story. Posted anonymously on some pastebin site, it made
the rounds virally, then got forgotten. Which may actually be ironic,
it's hard to tell in a post Alanis Morissette universe.)
> ---
>
> Anyway, that's long enough for now. I hope you'll take some time kick
> the tires on OSH, and see if it does what you want toysh to do.
Bash already does what I want toysh to do. Just not how I want it to do it.
> And to participate in the Bashix discussion! At the very least, you
> can get some more tests out of it.
I have no shortage of tests. There's TODO notes in the source, multiple
text files of notes to self, the $BROKEN tests already in the test suite...
$ BROKEN= VERBOSE=all make test_sh | grep FAIL | wc -l
...
67
I have some existing users (yes of toysh), who poke me when they hit a
thing, and I fix that.
I'm knocking out existing todo items for things like running toybox's
own scripts/make.sh and scripts/install.sh and scripts/mkroot.sh and
scripts/mcm/buildall.sh which calls musl-cross-make. There's a BUNCH of
those. Those are "immediately at hand" tests. (Elliott sends me patches
whenever I use a bash feature that mksh doesn't have, and I apply them
for him but keep them here as todo items to put BACK when I can run the
toybox test suite with toysh, under mkroot. I need to be able to do that
to run root tests in a known environment, ala testing insmod and mount
and so on.)
I also have a dozen or so old dirty tree forks I've set aside and
restarted clean (usually when I need to put out a release), and I trawl
through the diff of them to find things to fish out and finish.
When I get tired of that, I run "help" in bash and look through the
command list, and pick one that looks manageable.
When I get tired of that, I read through the bash man page line by line
(from the beginning, yet again) until I hit "oh, I need THAT" and then I
go off and work it into the design.
In theory before 1.0 I need to re-read what posix has to say about
shells to make sure I didn't miss anything (I think my last full rules
lawyering while taking notes start to finish was when SUSv3 came out,
and they just dropped SUSv5), but I point out that dash is proudly and
fully posix compliant so it CAN'T mean much.
> Although honestly there are already more tests than we have time to
> fix (the bottleneck is CONTRIBUTORS, not tests), I think the common
> tests are useful because they enable comparisons like the one I just
> sent.
I treat tests as todo items. Gotta catch 'em all, but I can't have them
all immediately in my party.
> My hope is that some shell authors will see that they are duplicating
> a huge amount of work that OSH has already done. (And that OSH is
> BOTH bootstrappable, and memory safe.)
Why would they?
You want people who didn't join bash, didn't join busybox ash, didn't
join mksh, and didn't join toysh, to join your project so that your
project can become the One True Project. So far, you've got people doing
stuff in rust to use your test suite.
Ok.
Toybox has always interoperated. I started toybox development by
replacing busybox commands in aboriginal's build $PATH one at a time,
just as I made firmware linux (aboriginal's predecessor) by replacing
gnu commands one by one in the path with their busybox equivalent and
fixing what broke. (The "airlock" came from LFS 3.0 chapter 5, they
called it the /tools directory. It was to avoid leaking dependencies
between contexts.)
Having another implementation of everything out there isn't a threat,
it's reality. Look at the roadmap, I compared a dozen implementations
when I STARTED. I was comparing each new command I did against the
behavior of the existing debian command and the existing busybox
command, and Android was using the BSD command for a lot of them.
Every toybox command can be switched off, so you can run an alternative
implementation. (Mostly you could just put the other one first in the
$PATH even if you didn't switch it off.) Interoperating with other
implementations is a central design idea, but providing a complete base
capable of building itself under itself that does not REQUIRE
supplementation is also a central design idea. (As described in the
roadmap.)
For example I'm not implementing dropbear, because dropbear exists and
I'm happy to have Matt do it. It's out of scope for toybox because it's
optional for a build/development container. (It's really NICE, but my
mkroot systems running under qemu provide a console via an emulated
serial port. You can have a build container that does NOT have ssh, and
if you need to edit a file in there you could ssh to the host and lxrun
a shell in the container from there.)
But a shell is a command line utility necessary for a self-bootstrapping
system: even cmake and such are calling command line utilities using
shell wildcard expansion. It's not a separate category like toolchain
binaries (targeting different hardware architectures, so the MacOS set
outputting mach-o binaries is inherently different than the Linux set
outputting ELF binaries, and one doing both would fundamentally have code
that's not used on a given architecture). It's as generic a tool as sed
or ls.
Busybox has always had a shell. Red hat's NASH started as a shell with
built-in commands glued to it. (Busybox started life as debian's boot
disk utility, Nash was Red Hat's. Red hat dropped nash for busybox years
ago.) It's not "oh we can't compete", it's "the judgement of what this
sort of general utility package needs is unanimous that 'shell' is part
of the set". As says posix. You need a shell to run the other command
line utilities. That's why toybox needs a shell.
On the external dependency front: toybox builds with C11 and libc, and
that's it. My build doesn't even require python (the one remaining
scripts/mkstatus.py command just generates the status web page). You
should be able to build mkroot and then rebuild it UNDER mkroot, and if
it needed python to do that toybox would have to IMPLEMENT PYTHON. (Or
suck it in as part of the toolchain, which would be wildly
unconstrained.) Right now the TOOLCHAIN= binaries for building a patched
kernel (removing bc and gcc) are just cc, ld, as, and objdump. (Because
the kernel needs objdump for some reason, I forget what.)
The four packages making up the circular dependencies of the base OS are
conceptually distinct:
The kernel is obviously its own thing: bare metal, ring 0, not just
manages contention and allocates resources but creates abstractions like
"filesystems" and "processes" from block devices and a CPU with a timer
interrupt...
The C library is an interface layer translating between ALL THREE of the
other packages. It makes kernel syscalls with kernel structures, takes
calls from userspace with (often DIFFERENT) structures, and translates
between the two. And it provides a bunch of standard
functions like printf() that are generic-ish but which toybox would need
to provide as a library for OTHER things to link against if toybox
provided them (and we're not a library). And it's all sprayed down with
a lot of thread locking nonsense I don't want to touch (toybox isn't
threaded, we fork() and talk via pipes or something when we want
parallelism). And it installs a bunch of header files and .o files into
magic paths rather than living in one big executable with symlinks
somewhere in the $PATH. Toybox is (usually) a single binary, and it
reads OPTIONAL config files (and is fine if they're not there, the same
way /etc/profile doesn't HAVE to exist).
The toolchain being separate from the command line is subtler, and maybe
a bit squishy, but it boils down to hardware dependencies. Things like
crt1.o are usually written in assembly, per-architecture. It has to care
about dynamic linking, and static PIE binaries, and whether this
architecture is RELA or what. It has to know what assembly mnemonics to
generate and how to translate them to machine code, do register
allocation, keep the calling conventions straight when calling into
objects produced by other compilers (such as linux-vdso.so.1 provided at
runtime BY THE KERNEL), and that's ignoring the optimizer entirely (you
NEED constant propagation and dead code elimination to build toybox and
the kernel, I was adding those to my tinycc fork). Lots of what a
toolchain does is target-dependent. And don't get me started on the
header and library search paths, and it exports more .h files like
varargs.h, AND it provides .o files and .so/.a files installed into
magic search paths... (Where DO crt1.o and friends live anyway?) And the
horror that is "multilib"...
My fork of tinycc was (among other things) trying to turn it into a
multiplexer so it could provide symlinked cc/ld/as/strip/cpp/nm and so
on all in one binary, acting appropriately. But still its OWN binary,
not part of toybox or busybox (license aside). I've been peeling bits
like "readelf" and "ar" out to do in toybox, but that's tentative and
squishy. Maybe nm and strip. I used to think "make" belonged there too,
but that command really isn't architecture dependent, and it HAS
historically been used for things other than builds...
I would definitely have said strace belongs in the toolchain pile, but
Elliott sent me an implementation because he needed it, and the syscalls
it translates are mostly the same, just the syscall numbers and struct
layouts vary a bit. And endianness and word size but that's always the
case. So that's a "Linux" tool not a compiler tool, a bit like readelf
vs macos mach-o. To be honest, strace is less trouble than getconf.
But cc/ld/cpp/as NEED stuff installed in /lib and /usr/include and so on
(not just optional, hard requirements), producing specific output for
m68k vs sparc and having to care about register allocation and branch
delay slots... Heck, just the -dM predefined macros are a good reason
why the toolchain package is not part of base toybox: I'd be
whack-a-moling architectures there too. If "as" is toolchain objdump -d
(disassembly) is toolchain.
Toybox tries not to care about that sort of thing (except as a passive
consumer), which is why the toolchain package is separate from cmdline.
Rob