[Toybox] [PATCH] Implement mv -n / cp -n (no clobber).

Rob Landley rob at landley.net
Sat Mar 26 00:00:01 PDT 2016



On 03/26/2016 12:57 AM, Andy Chu wrote:
>> So that's why I was thinking of adding cp --big-long-option-name to
>> the toybox install.sh, but I just went with "rm -f oldname && cp new oldname"
>> in the recent cleanup instead, because I can't trust the host's cp to have
>> that option _before_ toybox is installed, and we can't use the toybox
>> cp to install itself when cross compiling.
> 
> OK thanks for the test case -- I ran it and it helps me understand
> what --remove-destination does.  Although I'm still wondering what is
> wrong with rm && cp?  As you say yourself, having it in one command
> doesn't remove any race conditions.

I remembered it as I was typing, and it was the next paragraph of the
previous email. On a system where cp is currently a symlink to toybox,
and you're installing a new toybox over that:

  rm /usr/bin/toybox
  cp toybox /usr/bin/toybox #fails because cp is dangling symlink

It's not a race condition, it's a "you can keep running a binary after
it's deleted but can't launch new instances" problem.
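A quick way to see what --remove-destination buys you (a sketch, assuming GNU cp; the filenames are made up): plain cp follows an existing symlink and writes *through* it, while --remove-destination unlinks the name first and creates a fresh file:

```shell
cd "$(mktemp -d)"
echo real > target
ln -s target link
echo new > src

cp src link                        # follows the symlink: writes THROUGH it
cat target                         # now contains "new"

echo real > target                 # reset
cp --remove-destination src link   # unlinks "link" first, then copies
cat target                         # still "real"; "link" is now a regular file
```

And because the single already-exec'd cp process does the unlink itself, it keeps working even when the old destination was the binary (or a symlink to the binary) that cp was launched through.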

> I guess you're saying you can get by with -f alone with toybox,
> because you can control whether it is writable. But that doesn't
> solve the problem with busybox or other multiplexer binaries which you
> don't control?  Did I paraphrase right?

More or less, yes.

> Either way I still don't see what's wrong with rm && cp.  I thought
> Aboriginal Linux was supposed to be the minimal set of things needed
> to build itself... and if toybox followed that philosophy then it
> would also leave out what can be accomplished by sequential
> composition in the shell :)

Aboriginal Linux lets you build linux from scratch under it. I added a
number of commands that aren't needed to bootstrap aboriginal, but _are_
necessary to bootstrap LFS.

Thus _it_ can get away with not having lex because you can build and
install lex natively after the fact. You can piecemeal supplement what I
provide, without replacing the existing stuff if you don't want to.

With toybox, it's an all-or-nothing thing. If you really need some
option toybox doesn't provide _within_ a command, you basically have to
install another implementation and stop using the toybox implementation
of that command. You can't supplement with more options after the fact.

This doesn't mean I implement every option, but it does mean I have to
consider them and decide whether or not to.

> Not that I am really arguing against adding --remove-destination --
> just curious.

Honestly, the main reason I haven't added it so far is it doesn't have a
short option. It's only a line or two of code to implement, but the
command line interface is ugly, and I'm not sure how much it's really
used out there. It _does_ solve some specific problems you can't solve
without it, but they're not very common problems.

Until recently I had CP_MORE so you could configure a cp with only the
posix options, but one of the philosophical differences I've developed
since leaving busybox is that all that extra configuration granularity
is nuts. The space savings aren't worth the costs: you no longer get
consistent debugging coverage (you may have configurations that don't
BUILD, although switching from #ifdef to if (CFG_BLAH) cuts that down a
lot), there's increased cognitive load on the package's users to
_configure_ the thing (deciding what commands to include is hard
enough), and a deployed system provides inconsistent behavior depending
on how it's configured, so you can't say "this is how toybox behaves".

So I threw out CP_MORE as a bad idea, and almost all commands just have
the "include it or not" option now. There are a few global options, but
not many, and I may even eliminate some of those (I18N: the world has
utf8 now, deal with it).

> Honestly this entire discussion is reminding me a Unix deficiency I've
> noticed.  For background, in the "cloud" world (as opposed to the
> embedded world), people tend to set up their base image with something
> like Chef, Puppet, or Ansible, which are basically horrible
> Ruby/Python DSLs with embedded shell snippets (and of course nobody
> knows how to quote correctly when shell is embedded in yet another
> language...).
> 
> My reaction is: why don't you just use shell scripts to configure your
> servers?  (And others have had the same thought, e.g.
> https://github.com/brandonhilkert/fucking_shell_scripts , although
> ironically it depends on Ruby ...)

Toybox has no external dependencies. You build a static binary, drop it
on the machine, and it should work. Any config files for stuff like mdev
or dhcp are always _optional_, and there should be sane default behavior
when they're not there.

That's an explicit design goal.

> Well the one good argument is that those systems are supposed to be
> idempotent, whereas shell is not idempotent.  To be idempotent, you
> would basically describe a final state, without regard to the existing
> state -- which is not really possible with shell.

Aboriginal Linux is idempotent (modulo the /home mount
dev-environment.sh provides, which is intentionally persistent scratch
space), and it's driven by shell scripts. How? Simple: the root
filesystem is an initmpfs initialized from a read-only gzipped cpio, and
the native compiler is a squashfs filesystem (also read only). If you
add a build control image, that's another squashfs mounted on /mnt.

The lfs-bootstrap.hdc build image (which I'm 2/3 done updating to 7.8, I
really need to get back to that) does a "find / -xdev | cpio" trick to
copy the root filesystem into a subdirectory under /home and then chroot
into that, so your builds are internally persistent but run in a
disposable environment.

All this predates vanilla containers, I should probably add some
namespace stuff to it but haven't gotten around to it yet...

> In distributed systems, safe retries are essential.    Perhaps a more
> immediate example is that you would want to be able to Ctrl-C your
> shell script at an *arbitrary* point in time and have it work
> correctly the second time (without resetting the state back to what it
> was the first time.)
> 
> Examples:
> 
> # mkdir can't be run twice; it fails the second time because the dir
> exists.  mkdir -p oddly conflates the behavior of ignoring existing
> dirs with creating intermediate dirs

A) Elaborate on "oddly conflates" please? I saw it as 'ensure this path
is there'.

B) [ ! -d "$DIR" ] && mkdir "$DIR"

> $ mkdir dir
> 
> # likewise rm can't be run twice; the second time it will fail because
> the file doesn't exist.  --force conflates the behavior of ignoring
> missing arguments with not prompting for non-writable files

-f means "make sure this file is not there".

And you're not writing to the file's contents, you're unlinking it from
this directory. There could be twelve other hardlinks to the same inode
out there; rm doesn't care.
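Which is easy to demonstrate; a minimal sketch:

```shell
cd "$(mktemp -d)"
echo data > a
ln a b        # hard link: two directory entries, one inode
rm a          # removes one name, not the data
cat b         # data
```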

I admit "zero arguments are ok" is a really WEIRD thing posix asked for,
dunno why they did that, but I implemented it. (You can tell I didn't
write tests/rm.test because it hasn't got a test for "rm -f" with no
arguments. It should. That should totally be "tests/rm.pending"...)

> $ rm foo
> 
> # behavior depends on whether bar is an existing directory, -T /
> --no-target-directory fixes this I believe
> $ cp foo bar

I do a lot of "cp/mv/rsync fromdir/. todir/." just to make this sort of
behavior shut up, but it's what posix said to do.
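For reference, the trailing /. pins both sides down to "the contents of this directory", so the result no longer depends on whether the destination already existed:

```shell
cd "$(mktemp -d)"
mkdir -p from to
echo x > from/file

# "cp -r from to" would create to/from here, because to already exists.
# The /. suffix copies the contents instead:
cp -r from/. to/.
cat to/file                 # x
```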

> Anecdotally, it seems like a lot of shell script issues are caused by
> unexpected existing state, but in a lot of cases you don't CARE about
> the existing state -- you just want a final state (e.g. a bunch of
> symlinks to toybox).

"ln -sf" actually works pretty predictably. :)
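i.e. it's safe to rerun (the one caveat is that if the link name resolves to an existing directory, GNU ln needs -n or -T to replace the link rather than descend into it). A quick check, using an arbitrary target path:

```shell
cd "$(mktemp -d)"
ln -sf /usr/bin/env mylink
ln -sf /usr/bin/env mylink   # second run: same result, no error
readlink mylink              # /usr/bin/env
```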

> That seems to be a common thread in a lot of
> situations you're describing if I'm not mistaken.

Yes and no. I've seen a lot of people try to "fix" unix and go off into
the weeds of MacOS X or GoboLinux. Any time a course of action can be
refuted by an XKCD strip, I try to pay attention. In this case:

https://xkcd.com/927/

Unix has survived almost half a century now for a _reason_. A corollary
to Moore's Law I noticed years ago is that 50% of what you know is
obsolete every 18 months. The great thing about unix is it's mostly the
same 50% cycling out over and over.

Sure, it's crotchety and idiosyncratic. So's the qwerty keyboard.

> So if Unix tools had flags that made them behave in an idempotent
> manner, there would be less objection to using them for cloud server
> management.  They would be more "declarative" and less imperative.

I've come to despise declarative languages. In college I took a language
survey course that covered prolog, and the first prolog program I wrote
locked the prolog interpreter into a CPU-eating loop for an hour, in
about 5 lines. The professor looked at it for a bit, and then basically
said that to write a prolog program that DIDN'T do that, I had to
understand how the prolog interpreter was implemented. And this has
pretty much been my experience with declarative languages ever since,
ESPECIALLY make.

I wound up in embedded programming because I break everything. I broke
the command line tools, I broke libc, I broke the kernel, broke the
compiler and linker, and I debugged my way down through all of it. If
you have a one instruction race, I'll hit it. No really:

http://lkml.iu.edu/hypermail/linux/kernel/0407.3/0027.html

I do this kind of thing ALL THE TIME. I have a black belt in "sticking
printfs into things" because I BREAK DEBUGGERS. (I'm quite fond of
strace, though, largely because it's survived everything I've thrown at
it and is basically sticking a printf into the syscall entry for me so I
don't have to run the code under User Mode Linux anymore, where yes I
literally did that.)

Last month I posted links here to me debugging my way into the kernel
and back out again to find a bug in an entirely different process that
had already stopped running which was affecting my build. The only thing
unusual about that instance is I did a longish writeup of it.

So when a declarative language offers to make life easier by automating
away all the complexity and doing it for me? Yeah... I'll let other
people who aren't me get right on that.

> Anyway, that's a bit of a tangent... (One reason I'm interested in
> toybox is that I've had a longstanding plan to write my own shell

Heh. Me too. I spent rather a lot of 2006 and 2007 working out why you
can't ctrl-z out of the bash "read" builtin, for example, or why, when
sourcing a file, you _can_ pipe the output somewhere but a ctrl-z during
that aborts the "source" and the resume resumes the host shell. (Maybe
they fixed it?)

Oh, and when $(commands) produce NUL bytes in the output, different
shells do different things with them. (Bash strips the NUL bytes but
keeps the data that follows them.)
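In bash, for example (a quick check; other shells differ, and newer bash versions also print a warning to stderr):

```shell
out=$(printf 'a\0b')    # command substitution sees bytes a, NUL, b
echo "$out"             # ab  (the NUL is gone, the byte after it survives)
echo "${#out}"          # 2
```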

I was apparently pretty deep into this stuff in mid-2006:

http://landley.net/notes-2006.html#14-09-2006

But by december I was just _angry_ about anything fsf-related, which
seriously shortened my patience when prodding at bash to see what it did:

http://landley.net/notes-2006#25-12-2006

Of course between those, there was a bad case of Bruce:

http://lwn.net/Articles/202106/
http://lwn.net/Articles/202120/

Anyway, I'm trying to get back to the shell now but there are so many
other things that keep interrupting... :)

> and
> toybox/busybox are obvious complements to a shell.  Though it's
> interesting that busybox has two shells and toybox has zero, I think
> my design space is a little different in that I want it to be sh/bash
> compatible but also have significant new functionality.)

Other than "loop", what are you missing?

(For YEARS I've wanted to pipe the output of a command line back into
the input of that same command line. Turns out to be hard, and no you
can't do it as an elf binary because "loop thingy | thingy" would have
to have everything after loop quoted and spawn another shell instance
via /bin/sh, and it's just annoying. I've made it work with FIFOs, but
that requires writable space (where?), and cleaning up after a badly
timed ctrl-c is awkward, and it turns into this BIG THING...)
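For the record, the FIFO version can be made deterministic by holding one read-write descriptor open on the pipe, so the reader never sees a spurious EOF between writes. A sketch of the idea, with a trivial made-up increment filter standing in for the "thingy" (and still needing that writable directory for the FIFO):

```shell
cd "$(mktemp -d)"
mkfifo loop

exec 3<>loop     # read-write open: doesn't block, and keeps a writer alive
echo 0 >&3       # seed the feedback loop

while read -r n <&3; do
  echo "$n"
  [ "$n" -ge 3 ] && break
  echo $((n+1)) >&3     # feed the output back into the input
done                    # prints 0 1 2 3, one per line
exec 3>&-
```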

(The first time I wanted it was while trying to make PPPOE work, using a
binary that expected to be run as a child of a process I didn't have and
was trying to fake with a long command line. The second time was when
doing a shell implementation of expect for a test suite, because I'm not
installing tcl, if expect is the only thing keeping an entire
programming language alive it should die already.)

> Andy

Rob
