[Toybox] [PATCH] Implement mv -n / cp -n (no clobber).

Rob Landley rob at landley.net
Sun Mar 27 23:43:43 PDT 2016


On 03/27/2016 01:25 PM, Andy Chu wrote:
>> So I threw out CP_MORE as a bad idea, and almost all commands just have
>> the "include it or not" option now. There are a few global options, but
>> not many, and I may even eliminate some of those (I18N: the world has
>> utf8 now, deal with it).
> 
> I agree utf-8 is the right choice... The expr.c code from mksh has a
> bunch of multibyte character support at the end, which makes you
> appreciate the simplicity of utf-8:
> 
> https://github.com/MirBSD/mksh/blob/master/expr.c

If MirBSD had a public domain equivalent license I'd just suck the whole
thing in _now_ as a starting point for toysh, but alas it's using one of
the many, many conflicting "copy this license text verbatim into derived
works" clauses that leads to dozens of concatenated copies of the same
license text. (The "kindle paperwhite" had over 300 pages of this in
about->licenses. In comparison, toolbox's ~30 copies of
https://github.com/android/platform_system_core/blob/master/toolbox/NOTICE
is outright _restrained_. Heck, buildroot created _infrastructure_
("make legal-info") to concatenate all those license files together for
you.)

Not opening that can of worms. Sorry, but no.

> bash seems to talk with some regret over support for multibyte
> characters: http://aosabook.org/en/bash.html

Eh, do it right from the start and it's not that hard.

Did I mention I broke bash's command history with invalid utf8 sequences,
so that cursor up/down advanced the cursor a couple spaces each time
because the redraw wasn't calculating the font metrics consistently? Fun
times...

Yes, I have executable names (well, symlinks to sleep) that are invalid
utf8 sequences. Because top needed to display them. I also have a
command with a ) and a newline in the middle of it to confirm we're
doing the /proc/$$/stat field 2 parsing properly, _and_ that the result
doesn't corrupt the display. I may not have worked out automated
regression testing for this stuff yet, but I test it during development
and write down what the tests _were_...
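The /proc/$$/stat parsing trick is to split on the *last* ")" rather than
on whitespace, since field 2 (comm) can itself contain spaces, parens,
and newlines. A sketch in shell with a made-up stat line (illustrative
only, not toybox code):

```shell
# /proc/$PID/stat: field 2 (comm) is parenthesized and may contain
# spaces and ")", so split on the LAST ")" instead of on whitespace.
line='1234 (evil ) name) S 1 1234 1234 0 -1'
rest="${line##*) }"        # everything after the last ") " -> fields 3+
comm="${line#* (}"         # drop the pid and the opening "("
comm="${comm%)*}"          # drop from the last ")" to end of line
state="${rest%% *}"        # field 3: process state
echo "comm=$comm state=$state"
```

Pure parameter expansion, so it copes with a comm of "evil ) name"
without any external tools.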

And yes, those symlinks to sleep led to the todo item that if your
argv[0] is a symlink with a name we don't understand, we should resolve
the symlink to see if it's a name we _do_ understand, and repeat until
we've reached a non-symlink or done 99 resolves and assumed it's a loop.
I haven't decided whether or not to actually implement it yet, but it's
on the todo list.
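A sketch of that resolve loop in shell, with readlink doing the
single-step resolution (purely illustrative, demonstrated against a
throwaway symlink chain):

```shell
# Resolve a chain of symlinks by hand, giving up after 99 hops
# (at which point assume it's a loop). Demo chain in a scratch dir.
cd "$(mktemp -d)"
echo real > target
ln -s target link1
ln -s link1 link2

name=link2 n=0
while [ -h "$name" ] && [ "$n" -lt 99 ]; do
  name="$(readlink "$name")"
  n=$((n+1))
done
echo "resolved to $name in $n hops"
```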

Haven't decided yet because the scripts/single.sh builds don't have this
problem, but the fact they don't suck in the multiplexer logic also
means I can beef up the multiplexer infrastructure a bit without
worrying about "true" getting bigger when built standalone. (Fun trick:
make defconfig, make baseline, make menuconfig and switch off a command,
make bloatcheck, and that should show you what that command added to the
build. Kinda verbose to stick in the 'make help' output but I hope it's
obvious. I really need to add a FAQ page...)

>> The lfs-bootstrap.hdc build image (which I'm 2/3 done updating to 7.8, I
>> really need to get back to that) does a "find / -xdev | cpio" trick to
>> copy the root filesystem into a subdirectory under /home and then chroot
>> into that, so your builds are internally persistent but run in a
>> disposable environment.
>>
>> All this predates vanilla containers, I should probably add some
>> namespace stuff to it but haven't gotten around to it yet...
> 
> I'll have to look at Aboriginal again...

I tried hard to make http://landley.net/aboriginal/about.html explain
everything, and before that I did a big long presentation at
https://speakerdeck.com/landley/developing-for-non-x86-targets-using-qemu trying
to explain everything. (There were a couple earlier attempts to explain
everything but they bit-rotted badly over the years.)

> but for builds, don't you
> just need chroots rather than full fledged containers?  (i.e. you
> don't really care about network namespaces, etc.)

A) requires root access on the host.

B) testing ifconfig will still screw up your system to the point of
requiring a reboot.

C) Yes, I created infrastructure that created custom chroots to test
stuff, way back in the busybox days (I linked to it a few posts back,
and it's still in
https://github.com/landley/toybox/blob/master/scripts/runtest.sh#L115 )
and in _theory_ I can use
https://github.com/landley/aboriginal/blob/master/sources/root-filesystem/sbin/zapchroot
to clean up after it when testing "mount" (I really want to add -R to
toybox umount to recursively unmount all the mount points under this
directory, it's on the todo list), but the problem at the time was if I
wanted to use a squashfs image to test mount how did I know if squashfs
was in the kernel? And some annoying distro (probably fedora) refused to
put sbin in the path for normal users so mke2fs wasn't there when doing
a single mount test...

Really, when testing the root stuff you need a known environment. It's
really, really brittle otherwise.

> Oh one interesting thing I just found out is that you can use user
> namespaces to fake root (compare with Debian's LD_PRELOAD fakeroot
> solution)

Yes, but:

http://lists.linuxfoundation.org/pipermail/containers/2016-March/036690.html

Did I mention I worked a contract at Parallels in 2010 where I did
things like add container support to smbfs and expand the lxc web
documentation and help run the OpenVZ table at Scale explaining to
people why containers scale better than virtual machines do? (If you've
ever heard somebody describe a container as "chroot on steroids", it's
probably somebody who heard my Scale pitch. :)

Alas, said contract didn't last that long because they gave me a list of
a half-dozen things I could work on and I said "anything but NFS" and
they assigned me to work exclusively on NFS, which was very much not
fun. (Since I was working for a russian company, I did my work blogging
on livejournal, ala http://landley.livejournal.com/55534.html and
http://landley.livejournal.com/55727.html and
http://landley.livejournal.com/56285.html and... oh there's at least 3
dozen posts like that from that time period.)

On the bright side I learned a lot, from http://landley.net/lxc to the
recent
http://lists.landley.net/pipermail/toybox-landley.net/2016-March/004790.html
linking to one of my old livejournal entries from the period.

> Last year, I was using this setuid root executable
> (https://git.gnome.org/browse/linux-user-chroot/commit/), which is a
> nice primitive for reproducible builds (i.e. not running lots of stuff
> as root just because you need to chroot).

I think I've talked here before about wanting to add a "contain" command
to Linux that would do the whole chroot-on-steroids container setup
thing. I want to do a reasonable lxc replacement _not_ requiring funky
config files just because it was designed by IBM mainframe guys.

Although under the covers instead of doing chroot it would use
pivot_root because http://landley.net/notes-2011.html#02-06-2011

However, that's on my "after the 1.0 release" todo list, along with
adding support for "screen" and so on.

(There are a lot of interesting commands I'm pretty sure I could
implement cleanly in less than 1000 lines of code which are reasonable
to add to toybox. But the 1.0 goal is to make Android self-hosting so
that when the PC goes away we have some control over the process.)

That said, when I asked people about this at the Linux Plumber's
conference containers BOF last year, they basically said that Rocket is
based on systemd's container support, and everything else (including
Docker) is based on Rocket.

(The systemd maintainer has a very interesting talk about systemd's
containers at LinuxCon Japan in 2015, but I only caught the first five
minutes of it because it was scheduled opposite another talk I wanted to
see and I STUPIDLY assumed that just because ELC records every panel and
posts it to youtube, LinuxCon might record SOME of them. But no, ELC
used to be http://landley.net/notes-2010.html#27-10-2010 and LinuxCon
_started_ under the Linux Foundation rather than being acquired by it,
so of course LinuxCon won't let you see its panels without paying the
Linux Foundation a lot of money (or volunteering to give them content
people can only see if they pay the Linux Foundation a lot of money).)

But anyway, toybox implementing something basically compatible with what
the systemd container commands do looks reasonably straightforward. It's
just probably post-1.0 on the todo list.

(I sat down with Lennart Pottering for an hour at that convention,
telling him outright I wanted toybox to clone the 20% of systemd that
gave people the option of not using it, and he did indeed walk me
through it. I hope that someday I get time to go through my notes and
that they make sense when I do. My main objection to systemd was always
https://news.ycombinator.com/item?id=7728692 and Lennart actually
pointed out there's a python implementation of it in the systemd git for
exactly that reason, but nobody seems to know about or use it.)

Often when you sit down with people you disagree with, what they're
doing makes sense in _their_ heads...

> And I see in their README they are pointing to a Bazel (google build
> system) tool that has an option to fake root with user namespaces.
> Although I'm not sure you want to make that executable setuid root.

Right now, Android isn't likely to install toybox suid. (I have a todo
item to add "make suid" and "make nosuid" targets that are defconfig
filtered for the suid binaries and the non-suid binaries, so you can
install toybox as two binaries, only one of which needs the suid bit.)

Android predates containers, and instead hijacked user ids to mean
something else. Mixing container support in with that (and/or migrating
any of what they're currently doing _to_ containers) would have to
involve long talks with a bunch of android guys, probably in person, to
work out what the result should look like.

My impression from the one time I met Elliott in person was that he's
drinking from a firehose of contributions from large corporations and
that they're in a serious red queen's race which doesn't really give
them any time for long-term design planning. The major design
initiatives I've seen since are things like "try to fix the build
system". (The GUI gets love because it's what people see.)

Add to that a BILLION seats to maintain backwards compatibility with,
and you can see why I've bumped dealing with containers to after a 1.0
release that's good enough to build AOSP under Android.

>> A) Elaborate on "oddly conflates" please? I saw it as 'ensure this path
>> is there'.
>>
>> B) [ ! -d "$DIR" ] && mkdir "$DIR"
> 
> It says this right in the help:
> 
>  -p, --parents     no error if existing, make parent directories as needed
> 
> I guess you can think of the two things as related, but it's easy to
> imagine situations where you only want to create a direct descendant
> and it's OK if it exists.

Yes, it's ok if it exists? That's what mkdir -p does? (Unless what
exists is a file?) I still don't understand: what behavior did you
_prefer_?

  mkdir -p "$(dirname "$DIR")" && mkdir -p "$DIR"

maybe?
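For reference, the behaviors being argued about, demonstrated against a
scratch directory:

```shell
# "mkdir -p" is idempotent: running it twice (or on an existing path)
# succeeds, while plain mkdir fails on the second try.
cd "$(mktemp -d)"
mkdir -p a/b/c && mkdir -p a/b/c && echo "mkdir -p: ok twice"
mkdir solo
mkdir solo 2>/dev/null || echo "plain mkdir: fails the second time"
# ...but -p still errors out if a path component is a regular file:
touch file
mkdir -p file/sub 2>/dev/null || echo "mkdir -p: fails when 'exists' is a file"
```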

> B) has a race condition whereas checking errno doesn't, and mkdir $DIR
> || true has the problem that it would ignore other errors.

There's still a race condition; it being internal to mkdir doesn't
change anything. Somebody can rmdir the intermediate directories mkdir
creates before it can create the next one in sequence (in which case
mkdir -p can indeed return an error).

Heck, somebody could come along and unlink your "mkdir onedir" right
after you create it. The -p isn't required for a race condition here...

(One of the reasons I use openat() and friends so heavily is I can get
the race conditions down to a dull roar.)

>>> # likewise rm can't be run twice; the second time it will fail because
>>> the file doesn't exist.  --force conflates the behavior of ignoring
>>> missing arguments with not prompting for non-writable files
>>
>> -f means "make sure this file is not there".
> 
> The help also describes the two different things it does:
> 
> -f, --force           ignore nonexistent files and arguments, never prompt
> 
> The first behavior makes it idempotent... the second disables the
> check when writing over read-only files, which is unrelated to
> idempotency (and yes I get that you're modifying the directory and not
> the file, but that's the behavior rm already has)

It's not writing over them, it's removing a link to them. You can have
multiple links to the same file in a given filesystem; the storage is
only released when the last reference goes away. Open files count as a
reference, and there's a special case hack in the kernel that lets "ln
/proc/self/fd/3 filename" work if the target's on the right filesystem
(you still can't do a cross-filesystem hardlink, but those files aren't
_really_ symlinks), just so you CAN create a filesystem entry for a file
that was deleted but which you still have open.
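The link-count half of that is easy to demonstrate in shell (the
/proc/self/fd relink trick itself is linkat(2) with AT_SYMLINK_FOLLOW
under the covers):

```shell
# The storage behind a file isn't freed until the last link goes away:
cd "$(mktemp -d)"
echo hello > f
ln f g                       # second hard link to the same inode
rm f                         # removes one directory entry, not the data
cat g                        # the data is still reachable via g
```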

(Checkpoint and restore in userspace needs to be able to do that.
Containers again. I've vaguely wondered if CRIU is a good thing to add
to toybox when I get around to proper container support, but I was
mostly waiting for it to stop being a moving target. It's on the todo list!)

>>> # behavior depends on whether bar is an existing directory, -T /
>>> --no-target-directory fixes this I believe
>>> $ cp foo bar
>>
>> I do a lot of "cp/mv/rsync fromdir/. todir/." just to make this sort of
>> behavior shut up, but it's what posix said to do.

I left out the -r, and for mv it has to be "mv fromdir/* todir/" because
there's no -r there.

> What does this do?  It doesn't seem to do quite what -T does:

mv works differently, but for cp and rsync it means never winding up
with todir/todir and two copies of rsynced files instead of actually
properly rsynced.
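The difference is easy to see in a scratch directory (a sketch; cp -T
is the GNU spelling of "never nest"):

```shell
# With an existing destination directory, "cp -r from to" nests from
# _under_ to; "cp -r from/. to/." copies the contents instead.
cd "$(mktemp -d)"
mkdir from to nested
echo x > from/file
cp -r from nested        # -> nested/from/file
cp -r from/. to/.        # -> to/file, no extra directory level
ls to
```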

> $ ls
> bar  foo  # empty dirs

A test which defeats the purpose of a syntax trying to ensure the
contents of one directory gets moved to the contents of another
directory, rather than creating a "from" directory under "to".

> $ mv foo/. bar/.
> mv: cannot move ‘foo/.’ to ‘bar/./.’: Device or resource busy
> $ mv -T foo bar  # now foo is moved over the empty dir bar
> 
>> Yes and no. I've seen a lot of people try to "fix" unix and go off into
>> the weeds of MacOS X or GoboLinux. Any time a course of action can be
>> refuted by an XKCD strip, I try to pay attention. In this case:
>>
>> https://xkcd.com/927/
>>
>> Unix has survived almost half a century now for a _reason_. A corollary
>> to Moore's Law I noticed years ago is that 50% of what you know is
>> obsolete every 18 months. The great thing about unix is it's mostly the
>> same 50% cycling out over and over.
> 
> Definitely agreed -- but that's why I'm not creating an alternative,
> but starting with existing behavior and adding to it.

I'm starting with existing behavior and trying to get it right. Given
how bloated the GNU versions are, and that original motivation of
busybox and toybox was to create stripped-down versions, adding behavior
is something I try not to take lightly.

> That's one of
> the reasons I am interested in toybox... to puzzle through all the
> details of existing practice and standards where relevant, to make
> sure I'm not inventing something worse :)

Good luck with that. I'm still not there myself. :)

> The motivation the idempotency is a long story... but suffice to say
> that people are not really using Unix itself for distributed systems.
> They are building non-composable abstractions ON TOP of Unix as the
> node OS (new languages and data formats -- Chef/Puppet being an
> example; Hadoop/HDFS; and tons of Google internal stuff).  AWS is a
> distributed operating system; Google has a few distributed operating
> systems as well.  It's still the early days and I think they are
> missing some lessons from Unix.

Mike Gancarz wrote a great book called "The Unix Philosophy".

Back when I was still on speaking terms with Eric Raymond I spent
several months crashing on the couch in his basement doing an "editing
pass" on The Art of Unix Programming that grew the book from 9 to 20
chapters. (According to the introduction he almost made me a co-author.)

We tried to cover a lot of what we thought the unix philosophy was in
there too, except he dropped out of college to become a Vax admin the
same year my family got a Commodore 64 for Christmas (which was in my
room by New Year's). He was an ex-math prodigy who burned out young and
dropped out of college, and I got a math minor for the same reason an
acrophobic person would go skydiving (I refused to be beaten by it).

We came at everything from opposite poles (8 bit vs minicomputer,
mathematician vs engineer), and the book got so much bigger because I
argued with him about everything (and then made him write up the results
of our arguments so the book would have a consistent voice).

> Sure I could just go change coreutils and bash ... I've been puzzling
> through the bash source code and considering that.

I try not to look at FSF source code. Life's too short for that.

> If one of your goals is to support Debian, I think you should be
> really *happy* that they went through all the trouble of porting their
> shell scripts to dash, because that means all the shell scripts use
> some common subset of bash and dash.

Allow me to refer you to an earlier rant on that topic, rather than
repeating it here:

http://lists.landley.net/pipermail/toybox-landley.net/2015-January/003831.html

Not happy doesn't begin to cover it.

>  Portability means that the
> scripts don't go into all the dark corners of each particular
> implementation.

There's a little more to it than that. :)

> bash is like 175K+ lines of code, and If you wanted to support all of
> it,

Who said I wanted to support all of it? I don't even know what all of it
is. I'm not supporting all of the sed extensions the gnu/dammit
implementation added, largely because I haven't seen anything using
them. (I have a todo item to add + to ranges, but don't remember what
that means at the moment because the note's over a year old. I should
look it up...)

The point is I want to support the parts people actually _use_.

> I think you would end up with at least 50K LOC in the shell...

I mentioned in http://landley.net/aboriginal/history.html that looking
at the gnu implementation of cat and seeing it was 833 lines of C was
what started my looking at busybox source in the first place. At the
time the busybox version was 65 lines, half of which was license
boilerplate.
That's MORE than a factor of 10 difference, and that's not at all
unusual for gnu bloatware.

The only two toybox commands that wc -l puts over 1000 lines are sed.c
(1062 lines, 148 of which are help text), and ps.c (which is currently
implementing 5 commands, ps, top, iotop, pgrep, and pkill, only because
at the time I didn't know how to factor out the common infrastructure
into /lib and now I do). Once I break top.c and pgrep.c out from ps.c
(and factor the common infrastructure out into lib/), the result should
be _well_ under 1000 lines.

Yes, there's common infrastructure in lib. Currently totalling 4380
lines per "wc -l lib/*". Add in main.c at the top level (228 lines) and
that means toybox's shared infrastructure is 4608 lines, for all of it
combined.

No, I don't think implementing a proper bash replacement will take
50,000 lines. I expect to keep it below 1/10th of that, but we'll have
to see...

> which is almost the size of everything in toybox to date.

Which should be a hint that something is off, yes.

Possibly my idea of "reasonable bash replacement" differs from yours?

> If on the
> other hand you want a reasonable and compatible shell, rather than an
> "extremely compatible" shell,

Look at the cp I did. Fully _half_ the command line options aren't
listed in posix. The result is 492 lines implementing cp, mv, and
install in one file. (Admittedly heavily leveraging 193 lines of dirtree.c.)

I have no idea how big the gnu cp implementation is (didn't look at it,
not gonna), but I'm guessing bigger than that.

> it would probably be a lot less code...
> hopefully less than 20K LOC (busybox ash is a 13K LOC IIRC, but it's
> probably too bare)

Busybox ash is craptacular. When I was working on busybox there was
lash, hush, msh, and ash, and my response was to throw 'em all out and
start over with bbsh (which became toysh; the ONLY one of those four I
considered worth even trying to base bbsh on was lash, and that acronym
expanded to lame-ass shell.)

Admittedly they've patched it fairly extensively since then (a process I
haven't really been following), but there was never anything approaching
a clean design.

>> I've come to despise declarative languages. In college I took a language
>> survey course that covered prolog, and the first prolog program I wrote
>> locked the prolog interpreter into a CPU-eating loop for an hour, in
>> about 5 lines. The professor looked at it for a bit, and then basically
>> said to write a prolog program that DIDN'T do that, I had to understand
>> how the prolog interpreter was implemented. And this has pretty much
>> been my experience with declarative languages ever since, ESPECIALLY make.
> 
> This is a long conversation, but I think you need an "escape hatch"
> for declarative ones, and make has one -- the shell.

Mixing imperative and declarative contexts does not IMPROVE matters. It
means you need to control the order in which things happen, and can't.

You know how people work around that in make? By using it recursively,
DEFEATING THE PURPOSE OF MAKE. (http://aegis.sourceforge.net/auug97.pdf
was a lovely paper on that.)

> If you don't
> have an escape hatch, you end up with tortured programs that work
> around the straightjacket of the declarative language.

The escape hatch IS working around the straightjacket. Mixing them
doesn't make things any less tortuous.

You'll notice that all my make wrapper is doing is calling a script.
Either make.sh, single.sh, or install.sh. And you can install all those
yourself if you want to.

I am using make's target syntax to determine whether or not a
prerequisite changed and thus we should call the build script, but I
could just as easily use find -newer (and internally the scripts do to
determine which files need rebuilding).
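The find -newer version of make's staleness check can be sketched like
this (scratch directory; test's -nt operator is the shell spelling of
the same comparison):

```shell
# Make's "is the prerequisite newer than the target" check, done with
# plain shell and find instead of a Makefile rule.
cd "$(mktemp -d)"
touch out
sleep 1                       # ensure distinct mtimes on 1-second fs
touch src                     # now src is newer than out
[ src -nt out ] && echo "rebuild: src is newer"
find . -name src -newer out   # lists the files needing a rebuild
```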

> (But this is
> not really related what I was suggesting with idempotency; this is
> more of a semantic overload of "declarative")
> 
> Unfortunately GNU make's solution was not to rely on the escape hatch
> of the shell, but to implement a tortured shell within Make (it has
> looping, conditionals, functions, variables, string library functions,
> etc. -- an entirely separate Turing complete language)

Android decided to replace make entirely, AOSP now uses something else.
("Ninja" or some such, Elliott mentioned it here a while ago and I need
to add a note to the roadmap.)

I think I still need a "make" just because so much else uses make. But I
have an existing "make rant" about how ./configure; make; make install
EACH need a "cvs->git" style complete throw out and rethink. The most
recent time I linked to it here seems to be:

https://www.mail-archive.com/toybox@lists.landley.net/msg01915.html

The rant itself being these two posts:

http://lists.landley.net/pipermail/aboriginal-landley.net/2011-June/000859.html
http://lists.landley.net/pipermail/aboriginal-landley.net/2011-June/000860.html

Honestly, if I could work all this detail into design.html and friends I
would, but there's just too much to cover. At best I could get a link
collection...

> Make's abstraction of lazy computation is useful (although it needs to
> be updated to support directory trees and stuff like that).  But most
> people are breaking the model and using it for "actions" -- as
> mentioned, the arguments to make should be *data* on the file system,
> and not actions; otherwise you're using it for the wrong job and
> semantics are confused (e.g. .PHONY pretty much tells you it's a hack)

One of the first programs I wrote for OS/2 circa 1992 was "bake", i.e.
"better make". The first line of its data file was the output file to
make (with any compiler options appended), the remaining lines were the
source files, and then IT parsed them to figure out how they fit
together and when what needed rebuilding.

Alas, the code is lost to history. There's the occasional trace of it
online
(https://fidonet.ozzmosis.com/echomail.php/os2prog/1b4f877b1d235864.html
for example) but I went off to other things...

>> I do this kind of thing ALL THE TIME. I have a black belt in "sticking
>> printfs into things" because I BREAK DEBUGGERS. (I'm quite fond of
>> strace, though, largely because it's survived everything I've thrown at
>> it and is basically sticking a printf into the syscall entry for me so I
>> don't have to run the code under User Mode Linux anymore, where yes I
>> literally did that.)
> 
> I think the problem is that you expect things to actually work!  :)

I expect to have to MAKE things work. With a spoon.


> A lot of programmers have high expectations of software; users generally
> have low expectations.
> 
> http://blog.regehr.org/archives/861 -- "How have software bugs trained
> us? The core lesson that most of us have learned is to stay in the
> well-tested regime and stay out of corner cases. Specifically, we will
> ... "

Scour down to the bare metal with fire and build back up from there.

> Another hacker who has the same experience:
> http://zedshaw.com/2015/07/08/i-can-kill-any-computer/
> 
> I was definitely like this until I learned to stop changing defaults.
> Nobody tests anything by the default configuration.

Yes, pretty much why toybox "defconfig" is what I expect people to use,
and I'm eliminating command sub-options.

However, _I_ test my code. I turn my ability to break everything on the
stuff I write, and go down weird little ratholes nobody else is ever
going to notice because they bother me. All the time.

> Want to switch
> window managers in Ubuntu?  Nope, I got subtle drawing bugs related to
> my video card.  As penance for my lowered expectations, I try to work
> on quality software...

I just upgraded my netbook from xubuntu 12.04 to xubuntu 14.04. This ate
3 days and I'm still not done (of _course_ the upgrade didn't work and
turned into a complete reinstall. Of course it was still broken after a
reinstall. Of course I tweeted an xkcd strip at somebody
(https://twitter.com/landley/status/714273505592750081) to explain the
current status of it a few hours ago).

But on the bright side when I'm done I may be able to move my email to
the machine I actually carry with me most of the time, which would be
nice. (Of COURSE the version of thunderbird in 12.04 can't read the data
files from the version in 14.04.)

>> Oh, and when $(commands) produce NUL bytes in the output, different
>> shells do different things with them. (Bash edits them out but retains
>> the data afterwards.)
>>
>> I was apparently pretty deep into this stuff in mid-2006:
> 
> Yeah hence my warning about trying to be too compatible with bash ...

Oh, I know what I'm in for. It's just one of those "I'm not locked in
with bash, bash is locked in with me" sorta things.

http://www.schlockmercenary.com/2006-07-13

However, I don't have space to open that can of worms _yet_.
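For the record, the NUL behavior described above can be checked from any
shell, assuming bash is installed (newer bash also prints a warning to
stderr, suppressed here):

```shell
# Bash drops NUL bytes from $(command) output but keeps the bytes that
# follow; other shells differ, which is exactly the compatibility trap.
bash -c 'x="$(printf "a\0b")"; echo "$x"' 2>/dev/null
```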

> Reading the aosabook bash article and referring to the source code
> opened my eyes a lot.

Which article? (URL?)

> sh does have a POSIX grammar, but it's not
> entirely useful,

Yeah, I noticed. I _have_ read the posix shell spec all the way through
(although it was the previous release, susv3). I also printed out the
bash man page and read through _that_ (although it was bash 2.x)... You
may notice a theme here. I did extensive research on this... circa 2006.

> as he points out, and I see what he means when he
> says that using yacc was a mistake (top-down parsing fits the shell
> more than bottom-up).

I don't intend to use yacc.

> On the other hand, writing a shell parser and lexer by hand is a
> nightmare too (at least if you care about bugs, which most people seem
> not to).

Meh, I'm a huge fan of Fabrice Bellard's original tinycc which took
exactly that approach to C. (Circa 2006. I maintained my own fork for a
while, http://landley.net/code/tinycc explains why I stopped. The
current stuff is just _nuts_.)

> I'm experimenting with using 're2c' for my shell lexer,
> which seems promising.

I'm going with hand-written parser. It's not that hard to do it right.

Also keep in mind I have incentive from $DAYJOB to make it work nicely
on nommu systems. :)

> Reading the commit logs of bash is interesting... all of its features
> seem to be highly coupled.  There are lots of lists like this where
> one feature is compared against lots of other features:
> http://git.savannah.gnu.org/cgit/bash.git/tree/RBASH .  The test
> matrix would be insane.

Yup.

I intend to do more or less what posix wants first (with my standard
"document where I deviate from posix being stupid and move on"), and
then add lots of things like curly bracket filenames and <(subshell)
arguments and so on.

If somebody wants -r, they can poke me on the list. Containers exist now.

>>> and
>>> toybox/busybox are obvious complements to a shell.  Though it's
>>> interesting that busybox has two shells and toybox has zero, I think
>>> my design space is a little different in that I want it to be sh/bash
>>> compatible but also have significant new functionality.)
>>
>> Other than "loop", what are you missing?
> 
> At a high level, I would say:
> 
> 1) People keep saying to avoid shell scripts for serious "software
> engineering" and distributed systems.

People keep saying to avoid C for the same reason. Those people are wrong.

> I know a lot of the corner
> cases and a lot of people don't, so that could be a defensible
> position.  You can imagine a shell and set of tools that were a lot
> more robust (e.g. pedantically correct quoting is hard and looks ugly,
> but also more than that)
> 
> 2) Related: being able to teach shell to novices with a straight face.
> Shell really could be an ideal first computing language, and it was
> for many years.  Python or even JavaScript is more favored now
> (probably rightly).

Python 3.0 eliminated my interest in python 2.0 _and_ 3.0.

Javascript has the advantage of it being in every web browser. It's a
giant mess of a language other than that, and not really designed for
use outside its initial framework (node.js notwithstanding).

> But honestly shell has an advantage in that to
> *DO* anything, you need to talk to a specific operating system, and
> Python and JavaScript have this barrier of portability.  But the bar
> has been raised in terms of usability -- e.g. memorizing all these
> single letter flag names is not really something people are up to.

Education is a can of worms I'm not going into right now. (I say that
having taught night courses at the local community college for a couple
years way back when. This email is long enough as it is...)

> 3) Security features for distributed systems ... sh is obviously not
> designed for untrusted input (including what's on the file system).
> 
> I could get into a lot of details but I guess my first task is to come
> up with something "reasonably" compatible with sh/bash, but with a
> code structure that's extensible.

Shells are interesting because there's a giant pile of existing shell
scripts that they can run, and a lot of people with knowledge of how to
write shell scripts in the current syntax. Your new syntax benefits from
neither of those.

> FWIW toybox code is definitely way cleaner than bash, though I
> wouldn't necessarily call it extensible.  You seem to figure out the
> exact set of semantics you want, and then find some *global* minimum
> in terms of implementation complexity,

Yeah, sort of the point.

> which may make it harder to add
> big features in the future (you would have to explode and compress
> everything again).

I've done it several times already over the course of the project.

I'm sure I have an existing rant on this; the phrase I normally use is
"infrastructure in search of a user", so lemme google for that...

And of course the first hit is
http://lists.landley.net/pipermail/toybox-landley.net/2013-April/000882.html
which is the first message in the ifconfig cleanup series from
http://landley.net/toybox/cleanup.html

> I suppose that is why this silly -n patch requires
> recalling everything else about cp/mv, like --remove-destination :)

Understanding all the possible combinations so you can test them and get
the interactions right also requires that.

> But I definitely learned something from this style even though I'm not
> sure I would use it for most projects!

Oh I don't use it for all projects either.

I should have called toybox "dorodango" (ala
http://www.dorodango.com/about.html) because it's ALL about incessant
polishing.

Usable versions of these command line utilities already exist. Busybox
already existed when I started Toybox. The gnu tools existed when Erik
Andersen started modern busybox development. The BSD tools existed when
gnu started. The System V tools existed when BSD started.

Along the way there's been a dozen independent implementations that
forked off that aren't in that list. The first full from-scratch unix
clone was Coherent from the Mark Williams Company in 1980, which
included its own kernel, compiler, libc, and command line utilities.
Linux forked off of Minix, another clean room clone of the entire OS
including kernel, compiler, libc, and command line utilities.

I'm trying to do a _BETTER_ job.

> Andy

Rob


