[Toybox] Imported toysh blog test cases into OSH

Fri Jun 26 12:52:29 PDT 2020

(contributor from 4 years ago here, no longer on the list)

Hi Rob, I noticed that you've again been trying to figure out bash on
the blog: http://landley.net/notes.html

I imported 28 shell snippets as test cases into my shell test framework:

http://travis-ci.oilshell.org/jobs/2020-06-26__18-40-53.wwz/_tmp/spec/survey/toysh.html
(7 cases running against 3 shells)

http://travis-ci.oilshell.org/jobs/2020-06-26__18-40-53.wwz/_tmp/spec/survey/toysh-posix.html
(21 cases running against 6 shells)

(these specific links will go away but they're always 2 clicks away
from http://travis-ci.oilshell.org/jobs/ -- ovm-tarball -> spec
tests)

Summary:

- OSH passes 22 cases, more than any other shell
- The cases found 3 divergences/bugs in OSH, 2 unimplemented features,
and 3 (documented) known differences

Results: https://github.com/oilshell/oil/issues/780

So thanks for posting these cases.

For background, OSH is able to run many real bash scripts -- in fact
running Aboriginal Linux and other Linux distro scripts in January 2018
was a significant milestone.  Now it can run neofetch (a 10K line bash
script) and Lisp, brainfuck, and JSON implementations written in bash.

http://www.oilshell.org/blog/

-----

While I'm grateful for the test cases, reading the blog is a bit
painful because it's clear you will never finish your shell with the
current strategy (and this is an informed opinion, after doing it
myself).

I think you're less than 10% done at 3K lines of code.  I think you
will agree with my assessment if you scroll through this issue:

https://github.com/oilshell/oil/issues/653

Your test cases are good, but there are so many features you haven't
even begun to think about.

A couple things I would suggest:

1) Learn about grammars and parsing, and look at the POSIX shell
grammar, which all shells obey.  There are a few test cases that are
completely non-controversial and answered by POSIX that you were
confused by -- e.g. test case #12 on for loop parsing, and #5 and #21
on pipelines, all run under 6 shells.  It's POSIX, and shells follow
the POSIX grammar.  POSIX is necessary but not sufficient (I would say
it covers ~25% of bash's syntax).

Parsing and grammars are a compact way of reasoning about corner cases
-- more compact than C code.  Run, say,

osh -n -c 'echo ${@@Q}'

to see that in action.
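
As a rough illustration of the kind of question the POSIX grammar
settles, here is a minimal sketch.  The function names are made up for
this example, and these are not the numbered cases from the blog.  The
for_clause and pipeline rules decide where "in" may be omitted, where
newlines may appear, and what "!" applies to:

```sh
# for_clause: the "in wordlist" part is optional.  Without it, the loop
# iterates over the positional parameters ("$@").
loop_over_args() {
  for arg
  do
    printf '%s\n' "$arg"
  done
}

# for_clause: newlines are allowed between the name and "in".
loop_with_linebreak() {
  for word
  in a b c
  do
    printf '%s\n' "$word"
  done
}

# pipe_sequence: a newline may follow '|', so a pipeline can continue
# on the next line without a backslash.
pipeline_with_linebreak() {
  printf 'x\ny\n' |
    wc -l | tr -d ' '
}

# pipeline: a leading "!" negates the status of the whole pipeline,
# not just its first command.  "true | false" fails, so this succeeds.
negated_pipeline() {
  ! true | false
}
```

Each of these is answered by a grammar rule rather than by a special
case in the code, which is the sense in which a grammar is more compact.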

2) Use some of the test cases and framework I developed and automated:
http://travis-ci.oilshell.org/jobs/2020-06-26__18-40-53.wwz/_tmp/spec/survey/osh.html

This test suite includes knowledge from other people, e.g. the author
of what I believe is the biggest bash program in the world (ble.sh at
30K+ lines).  Another shell semantic for you to ponder:

https://github.com/oilshell/oil/issues/706#issuecomment-615578349

Summary: bash is inconsistent with itself regarding "unset".  After
much analysis, OSH chose a simpler behavior that can run the biggest
bash programs in the world.
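
A minimal sketch of one such asymmetry, assuming it is the kind of case
the linked comment is about (the function names are hypothetical, and
the comments describe bash with default options, i.e. without
"shopt -s localvar_unset"):

```bash
x=global

# Helper that unsets x from one call level further down.
unset_x() { unset x; }

# unset in the same function that declared the local: in bash the
# variable stays local but becomes unset, so the global stays hidden.
same_scope_unset() {
  local x=local
  unset x
  echo "${x-unset}"
}

# unset from a called function: in bash the caller's local is removed,
# which reveals the global again.
caller_scope_unset() {
  local x=local
  unset_x
  echo "${x-unset}"
}

same_scope_unset     # bash: unset
caller_scope_unset   # bash: global
```

Other shells do not necessarily agree with bash on the second case, so
a shell has to pick one rule and check it against real scripts.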

Andy