[Toybox] [PATCH] Implement mv -n / cp -n (no clobber).

enh enh at google.com
Fri Apr 15 10:16:00 PDT 2016


On Fri, Apr 15, 2016 at 3:51 AM, Andy Chu <andychup at gmail.com> wrote:
>> If MirBSD had a public domain equivalent license I'd just suck the whole
>> thing in _now_ as a starting point for toysh, but alas it's using one of
>> the many, many conflicting "copy this license text verbatim into derived
>
> I dunno, mksh seems good in some ways (I've been reading the code as
> mentioned).  But it's also 31K lines of code... That doesn't seem like
> your style :)

things to avoid/reasons why i'd happily get rid of mksh in Android:

* having lots of builtins that don't behave like the real things.
* not letting me #define to disable those builtins (i think printf
might be an exception, but i'd like to disable everything that doesn't
have to be built-in).
* requiring perl to run the integration tests.
* zero unit tests.
* breaking clang builds to save 200 bytes with manual string sharing
(this is why i'm a release behind right now).
* having a mailing list that rejects mail from gmail (this is why
upstream doesn't know).
* not using git.
* not distinguishing interactive login shells (as far as i can tell)
to allow optimizing the system rc file.
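
(for the last point, a minimal sketch of the checks a shell could make before deciding which rc files to read. this is not mksh's code: the function names are made up for illustration, and it only encodes the usual conventions, i.e. a login shell is started with a leading '-' in argv[0], and POSIX treats a shell with no script operand whose stdin and stderr are both terminals as interactive.)

```c
#include <unistd.h>

/* Hypothetical helper: by convention, login(1) and friends start a
 * login shell with a leading '-' in argv[0] (e.g. "-sh"). */
int is_login_shell(const char *argv0)
{
  return argv0 && argv0[0] == '-';
}

/* Hypothetical helper: with no script operand, a shell is interactive
 * when both stdin and stderr are terminals. An explicit -i flag is not
 * handled in this sketch. */
int is_interactive(int have_script_operand)
{
  return !have_script_operand && isatty(STDIN_FILENO)
    && isatty(STDERR_FILENO);
}
```

a shell that tracks both answers separately can skip the expensive parts of the system rc file for non-interactive or non-login invocations.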

probably other stuff, but those are the ones that spring to mind. the
first and last are the ones that users notice. the rest are things
that affect maintenance.

>>> bash seems to talk with some regret over support for multibyte
>>> characters: http://aosabook.org/en/bash.html
>>
>> Eh, do it right from the start and it's not that hard.
>
> I'm not a unicode expert, but when I say multibyte characters I mean
> 16 or 32-bit wide characters in memory.  I think that is the
> complication with mksh $(( )) and with bash.  utf-8 is mostly trivial
> to support if you use the SAME representation in memory.  char* IS
> utf-8!!!
>
> I didn't quite understand this until some rants from the Go/Plan 9
> guys at work, actually talking about Python's unicode strategy.  The
> whole point of utf-8 is that you don't have to modify your existing C
> code to use it -- unless you require O(1) code point access, i.e.
> indexing, which surprisingly little real code does.
>
> Go uses a rune library for unicode, which is quite different than
> Python's unicode vs. str or C/C++ wchar_t and all that nonsense.
>
> So my hope is that I don't have to do anything special in my shell for
> Unicode -- I will just support utf-8, which doesn't really require any
> changes.  So far I don't see why that isn't viable.
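
To illustrate the point about keeping utf-8 in a plain char*, here is a minimal sketch in C. It is not code from toybox, mksh, or Andy's shell; the function names are hypothetical and it assumes the input is already valid utf-8. Counting code points only needs to skip continuation bytes (10xxxxxx), and indexing by code point becomes a linear walk instead of an O(1) array access.

```c
#include <stddef.h>
#include <string.h>

/* Count code points in a NUL-terminated UTF-8 string by counting every
 * byte that is not a continuation byte (10xxxxxx).
 * Assumption: the input is valid UTF-8. */
size_t utf8_count(const char *s)
{
  size_t n = 0;

  for (; *s; s++)
    if (((unsigned char)*s & 0xC0) != 0x80) n++;

  return n;
}

/* Return a pointer to the start of code point number idx (0-based),
 * or to the terminating NUL if idx is out of range. This O(n) walk is
 * the price of not storing 16 or 32-bit wide characters in memory. */
const char *utf8_index(const char *s, size_t idx)
{
  for (; *s; s++) {
    if (((unsigned char)*s & 0xC0) != 0x80) {
      if (!idx) return s;
      idx--;
    }
  }

  return s;
}
```

For the 6-byte string "h\xc3\xa9llo" (an "e" with an acute accent in second position), strlen() reports 6 bytes while utf8_count() reports 5 code points, and utf8_index() with idx 2 points at "llo". Everything else that treats the string as opaque bytes (copying, concatenation, comparing for equality) works unchanged.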
>
>
>> But anyway, toybox implementing something basically compatible with what
>> the systemd container commands do looks reasonably straightforward. It's
>> just probably post-1.0 on the todo list.
>>
>> (I sat down with Lennart Poettering for an hour at that convention,
>> telling him outright I wanted toybox to clone the 20% of systemd that
>> gave people the option of not using it, and he did indeed walk me
>> through it. I hope that someday I get time to go through my notes and
>> that they make sense when I do. My main objection to systemd was always
>> https://news.ycombinator.com/item?id=7728692 and Lennart actually
>> pointed out there's a python implementation of it in the systemd git for
>> exactly that reason, but nobody seems to know about or use it.)
>
> That link seems to have broken, found it here:
> http://www.landley.net/notes-2014.html#23-04-2014
>
> Anyway the short summary is I don't like systemd's lack of modularity
> either.  And in addition I don't agree with their goal of getting rid
> of shell scripts in the bootstrap, although I understand the
> motivation.
>
> That is one of the motivations for my shell... to make it so shell
> scripts in the boot process *aren't* a problem.  There are lots of
> excuses (and valid reasons) not to write shell now, and I think those
> issues are fixable.
>
>> Add to that a BILLION seats to maintain backwards compatibility with,
>> and you can see why I've bumped dealing with containers to after a 1.0
>> release that's good enough to build AOSP under Android.
>
> Yes, I definitely agree with that goal... IMO containers need to
> settle down before toybox has something to implement.  I think rocket
> is still nascent, and if they can extract the good parts of systemd's
> container support that will be good for toybox.
>
>>> The motivation for the idempotency is a long story... but suffice to say
>>> that people are not really using Unix itself for distributed systems.
>>> They are building non-composable abstractions ON TOP of Unix as the
>>> node OS (new languages and data formats -- Chef/Puppet being an
>>> example; Hadoop/HDFS; and tons of Google internal stuff).  AWS is a
>>> distributed operating system; Google has a few distributed operating
>>> systems as well.  It's still the early days and I think they are
>>> missing some lessons from Unix.
>>
>> Mike Gancarz wrote a great book called "The Unix Philosophy".
>>
>> Back when I was still on speaking terms with Eric Raymond I spent
>> several months crashing on the couch in his basement doing an "editing
>> pass" on The Art of Unix Programming that expanded the book from 9 to 20
>> chapters. (According to the introduction he almost made me a co-author.)
>
> Oh yes I see that!  I've recommended this book to lots of people and
> I've returned to it many times.
>
> As I said, all these nascent Cloud OSes are ostensibly using Unix, but
> are missing some lessons of architecture from Unix and Internet
> protocols.
>
> Though an even better example is Android.  Android is Linux but NOT
> Unix.  Android is an operating system meant to run only Java programs.
> You don't fork() and read the shebang line in Android; you spawn a new
> thread in the Java VM.  You don't use pipes or sockets or file
> descriptors; you use intents and activities, which are Java-specific
> messaging APIs.
>
> Now I won't go as far as to say that's wrong, since Unix GUIs have not
> been a great success... but I do think this architecture has caused
> some growing pains in the ecosystem (i.e. due to lack of language
> heterogeneity).
>
>
>> Who said I wanted to support all of it? I don't even know what all of it
>> is. I'm not supporting all of the sed extensions the gnu/dammit
>> implementation added, largely because I haven't seen anything using
>> them. (I have a todo item to add + to ranges, but don't remember what
>> that means at the moment because the note's over a year old. I should
>> look it up...)
>>
>> The point is I want to support the parts people actually _use_.
>
> Well you have to define what the use cases are ... it seems like your
> goals have encompassed "everything", and are also growing.  This might
> require reimplementing all of bash and coreutils, etc.  But if the
> goal is just Aboriginal -- e.g. implement enough of a shell to rebuild
> bash, make, etc. and enough of make to rebuild bash, make, etc.  Then
> that task seems a lot more feasible and well-defined.
>
> Bash is so widely deployed that to a first approximation every feature
> is used *somewhere*...
>
>
>> No, I don't think implementing a proper bash replacement will take
>> 50,000 lines. I expect to keep it below 1/10th of that, but we'll have
>> to see...
>>
>> Possibly my idea of "reasonable bash replacement" differs from yours?
>
> Yeah we'll have to see... I'm working on it :)  I think it's probably
> possible to write a shell that can rebuild basic GNU packages (which
> tend to use autotools) in 7-10K lines of code (but no less than that).
> But interactivity adds a lot of requirements.  The biggest file in
> mksh is "edit.c", which is 5500 lines by itself!
>
> FWIW I have read most of the POSIX spec.  And also I realized I have a
> printed-out copy of it in the Apress book "Portable Shell Scripting"
> by Seebach, which I bought in 2010 or so ... I have been wanting to
> write a shell for at least that long!  And as mentioned, I've been
> testing bash and dash against it, and it's pretty accurate and
> valuable.  I'm guessing you will occasionally see bashisms like [[
> in build scripts.  I actually didn't realize that $(()) was in POSIX.
>
>
>
>> Busybox ash is craptacular. When I was working on busybox there was
>> lash, hush, msh, and ash, and my response was to throw 'em all out and
>> start over with bbsh (which became toysh; the ONLY one of those four I
>> considered worth even trying to base bbsh on was lash, and that acronym
>> expanded to lame-ass shell.)
>
> As mentioned, there does seem to be a great propensity to use gotos,
> macros, globals, and long functions (approaching 1000 lines) in many
> shell implementations...
>
>
>> Android decided to replace make entirely, AOSP now uses something else.
>> ("Ninja" or some such, elliott mentioned it here a while ago and I need
>> to add a note to the roadmap.)
>
> Ninja is very nice -- its lexer uses re2c, which inspired me to use
> it.  Search for re2c here:
>
> https://github.com/ninja-build/ninja/blob/master/src/lexer.in.cc
>
> I actually started with this for my shell lexer, and a tiny bit of the
> Ninja code is left.
>
> I built AOSP and Cyanogen from scratch like 18 months ago with GNU
> make.  My understanding is that it still uses the same Android.mk
> makefile fragments, but they are compiled to Ninja files by Kati:
>
> https://github.com/google/kati
>
> Kati has a single-threaded executor, but also a compiler to Ninja text
> format, which does fast parallel execution and fast incremental
> rebuilds.
>
> Your resistance to C++ is understandable since in some ways it's
> anti-Unix.  I felt the same way for a long time -- I actually sold my
> C++ books when starting at Google, hoping not to write in it...
>
> But the Google style helps a lot (some C++ purists deride it as "C
> with classes").  Ninja, Kati, and my shell are all written in
> Google-style C++ (no exceptions, StringPiece, unit test framework,
> etc.)  Also, C++ has evolved and gotten better.  People didn't really
> *know* how to write C++ in the 90's ... it took the industry awhile to
> collectively learn it!
>
> It's basically like Git.  It doesn't make any global sense, and it has
> a crapton of features, but it does have features you can ONLY get
> there.  And that makes it legitimately the best tool for many jobs.
>
> At least there is goodness there... there are plenty of things that
> are bloated, and widely used, but at the same time are NOT the best
> tool for any job :)
>
>
>>> I'm experimenting with using 're2c' for my shell lexer,
>>> which seems promising.
>>
>> I'm going with hand-written parser. It's not that hard to do it right.
>
> As mentioned, my parser is hand-written, but I'm basically porting it
> from a machine-checked ANTLR grammar, which is based on the POSIX
> grammar.
>
> The lexer is partially generated with re2c, which is absolutely the
> right choice IMO... I'll publish the code within a few weeks and I
> think you'll see what I mean.  It's completely different than lex/flex
> -- the Ninja code gives a good idea of how it works.  I think you're
> being too dismissive without seeing the code :)  There is something to
> learn here.
>
>>> 1) People keep saying to avoid shell scripts for serious "software
>>> engineering" and distributed systems.
>>
>> People keep saying to avoid C for the same reason. Those people are wrong.
>
> Yeah but that's not really a useful attitude... as mentioned, systemd
> has the goal of eliminating shell scripts from the boot process.
> Engineers at Google mostly don't use shell scripts, and sometimes end
> up with 1000 lines of C++ instead of 20 lines of shell... Most cloud
> stuff tries to avoid shell, but still ends up with something even
> worse: shell embedded in config files like YAML and JSON and whatnot.
>
> There ARE valid reasons not to use shell; I think the language needs
> to be updated a bit.
>
>> Shells are interesting because there's a giant pile of existing shell
>> scripts that they can run, and a lot of people with knowledge of how to
>> write shell scripts in the current syntax. Your new syntax benefits from
>> neither of those.
>
> As mentioned I'm starting with POSIX; I haven't started implementing
> the new language.  I think it will actually be easy to write an
> auto-converter, although I don't want to get too far ahead of myself.
> I will publish the code soon.
>
>
>> I should have called toybox "dorodango" (ala
>> http://www.dorodango.com/about.html) because it's ALL about incessant
>> polishing.
>
> Yes I found it quite easy to hack on the code and understand it (well
> except for the sed variable names :) )  I can tell you put a lot of
> care into many parts of the code, and that is definitely refreshing
> and is paying dividends.
>
> But I also think you're being a little too precious about some code
> that is in bad shape... a lot of the contributed C code and shell
> scripts are frankly bad, and buggy (not just stuff in pending/
> either).  As mentioned, I have no doubt that I can find dozens of bugs
> in there if I were to have continued on my path with the test harness.
>
> I was planning to add a whole bunch of tests, clean things up, and do
> stuff like make tests a subprocess and define the test environment as
> we discussed.  But I don't mind that I got derailed at all, because I
> got to work on the shell that I've been wanting to do for years :)
>
> But yeah I am worried that you have signed yourself up for decades
> worth of work, without a good parallelization strategy :)  I think
> tests will help other developers contribute, and also make code
> reviews faster.  When I do code reviews, I look at the tests first.
> If I don't understand something, I ask the person to write a test, and
> then I read that.  That's often way faster than a lot of back and
> forth through e-mail.  Show me the running proof!
>
> After there are tests, it's super easy to refactor the code for style,
> and to do aggressive performance or size optimizations.  I don't feel
> comfortable rewriting other people's code without good tests.  It's
> very easy to introduce bugs.
>
> It's too early to say but I think you will probably find some of my
> shell work useful.  I would think of it this way: even if someone else
> implements shell and make for you, you STILL have many years of work
> left on toybox :)  I would say that a compatible shell and make are
> not less than a year's work *each*.
>
> Andy



-- 
Elliott Hughes - http://who/enh - http://jessies.org/~enh/
Android native code/tools questions? Mail me/drop by/add me as a reviewer.

