[Toybox] ls (/-b/-q) (/on a tty)

enh enh at google.com
Sun Dec 10 11:25:41 PST 2017


On Fri, Dec 8, 2017 at 6:12 PM, Rob Landley <rob at landley.net> wrote:
> On 12/08/2017 12:58 PM, enh wrote:
>> the Android Studio folks noticed that `ls foo\ bar` changed from
>> returning "foo bar" to "foo\ bar" recently.
>
> In theory -b escapes anything that prevents the shell from seeing
> a file as a single argument. In PRACTICE bash's parsing logic here
> is sad enough it doesn't help:
>
>   $ mkdir sub && cd sub
>   $ touch "$(echo -e 'aa1  aa2')"
>   $ for i in $(ls -b); do echo $i; done
>   aa1\
>   \
>   aa2
>
>> testing a little bit, -b and -q seem to be wrong, and it seems like a
>> tty should imply -q rather than -b.
>
> I dunno about "wrong", but I agree it's not what ubuntu's does.
> Posix-2008 is ambivalent:
>
>   -q  Force each instance of non-printable filename characters and
>       <tab> characters to be written as the <question-mark> ( '?' )
>       character. Implementations may provide this option by default
>       if the output is to a terminal device.
>
> But posix doesn't mention -b at all, so adding that to the "may"
> pile doesn't contradict posix.
>
>> (`info ls` agrees about the
>> latter, but doesn't distinguish between the two demonstrably different
>> sets of "non-graphic" characters for -b and -q.)
>
> Info is the gnu proprietary documentation format nobody else has ever
> or will ever use. (Back when I was helping on doclifter they said they
> were going to convert all the info stuff to html, but then changed
> their minds because they _like_ having a proprietary format nobody
> else uses.) I think the only time I've ever consulted an info page was
> when I was writing busybox sed and got curious what all the various
> nonstandard sed extensions _were_. So like... 2004.

(if i'm referencing info pages, it's shorthand for "--help was
useless, man was useless, but they had hidden something where no-one
will ever see it...")

> I had my reasons for changing it (the default output should be an
> unambiguous representation that doesn't cause collisions), but can
> put it back if you guys have something real it's breaking or feel
> strongly enough about matching ubuntu.

it was reported to us as a bug, and certainly "doesn't behave like the
desktop" is a reasonable working definition of "bug" unless the
desktop is obviously wrong.

>> the attached patch includes tests and a fix, though it's not obvious
>> from the git history why you've gone back and forth here over time,
>
> I try to either blog or have a mailing list message about the logic of
> that sort of thing, which was it this time... Looks like:
>
> http://lists.landley.net/pipermail/toybox-landley.net/2017-April/008937.html
> http://lists.landley.net/pipermail/toybox-landley.net/2017-April/008938.html

a code comment and/or a checkin comment are more useful though. tests
better still.

(although this is another case where i don't know how to capture the
"when on a tty" part of "-q does the wrong thing for ' ' when on a
tty" without proper unit tests.)

> I.E. better utf8 support, and generally handling "touch 'yy1  yy2'"
> so it doesn't look like one file in -c mode, etc. The -b escaping of
> spaces at least makes touch "aaa  bbb" look like one file rather than
> two. (Doesn't mean it's right, but neither is -q...)
>
> And I still have a diff in my tree to remind me to finish implementing
> the test:
>
> $ git diff tests/ls.test
> diff --git a/tests/ls.test b/tests/ls.test
> index 56f7403..3cc30db 100755
> --- a/tests/ls.test
> +++ b/tests/ls.test
> @@ -50,5 +50,8 @@ rm -rf lstest/* && touch lstest/file1.txt && INODE=`stat -c %
>  testing "with -i" "$IN && ls -i 2>/dev/null; $OUT" "$INODE file1.txt\n" "" ""
>  unset INODE
>
> +# "$(echo -e "$(X=0;while [ $X -lt 255 ];do X=$(($X+1));[ $X -eq 47 ]&& contin
> +# echo -e '\\e' | sed 's/\\e/\\033/g'
> +
>  # Removing test dir for cleanup purpose
>  rm -rf lstest
>
>
> This is linked to the "I know why ubuntu's ls has two spaces between
> columns instead of one now" issue, and need to add a test for that. *
>
>> nor are there any existing tests to capture whatever it was. (the
>
> I try to have ubuntu and toybox both pass my tests, hard to do
> that when you're testing a place they diverge.

in the test runner i use for android, i check whether something is a
symlink to toybox or not. i only use it to print a "note: this isn't
the toybox tool", but you could use it to explicitly express the
different expectations, and make it clear when it's deliberate.

> And "default output
> should provide an unambiguous representation of filenames with no
> collisions" is hard to encapsulate in a test.

even just including representative examples (like ' ' and ^G) helps,
and gives you a handy place to link to any mailing list discussion.

anyway, yeah, i don't know... if we hadn't already shipped this in O
i'd be more strongly against. as it is, everyone's expectations are
now broken under some circumstances anyway. and there's definitely
something to be said for "default output should provide an unambiguous
representation of filenames with no collisions", though that ought to
be documented, especially because it's a deviation from tradition.

the -q ' ' behavior seems more questionable, but if your actual
argument is "-q is useless", why not remove it? current debian busybox
ls doesn't seem to have it, and having poked at it i'm really
struggling to see when anyone would _want_ it. and then your choice of
-b over -q becomes "obvious". and the documentation becomes just
another line in your usual "deviations from posix" section (that's at
least readable to folks working _on_ toybox).

> Rob.
>
> * It's because trailing combining characters can eat one space:
>
> $ cd ~/toybox
> $ mkdir sub
> $ cd sub
> $ touch "$(cat ../tests/files/utf8/test1.txt)"
> $ touch zzz
> $ ls



-- 
Elliott Hughes - http://who/enh - http://jessies.org/~enh/
Android native code/tools questions? Mail me/drop by/add me as a reviewer.



More information about the Toybox mailing list