[Toybox] [PATCH] ls: Ensure file names are separated by 2 spaces

enh enh at google.com
Fri Oct 18 14:33:47 PDT 2019


(if you check in a failing test, i can't update Android's copy of
toybox, so i'm strongly in favor of any other order of commits :-) )

On Fri, Oct 18, 2019 at 1:04 PM Andrew Ilijic <ilijic.andrew at gmail.com> wrote:
>
> Hi Enh,
>
> (Below are the new tests and the failure output)
> I created 2 test cases to do a basic test of `-x` and `-C`. These
> tests pass for the host `ls` but not for Toybox's `ls`. You cant see
> the issue in `diff` but I recreated the test and `hexdump -C
> <test_file>` shows a trailing 1 - 2 spaces, depending on if you are on
> `master` or my patch. I need to modify my patch so that it does not
> add the 2 trailing spaces.
>
> How should I submit these changes, all at once or multiple patches?
> These changes being the two new tests and the changes to `ls.c` to
> bring it into conformance.
>
> Test Case: ls.test
> testing "no argument" "$IN && ls; $OUT"
> "dir1\ndir2\nfile1.txt\nfile2.txt\n" "" ""
> testing "with -C: test column spacing equals 2" "$IN && ls -C; $OUT"
> "dir1  dir2  file1.txt  file2.txt\n" "" ""
> testing "with -x: test column spacing equals 2" "$IN && ls -x; $OUT"
> "dir1  dir2  file1.txt  file2.txt\n" "" ""
> testing "with wild char" "$IN && ls file*; $OUT" "file1.txt\nfile2.txt\n" "" ""
>
> Test Output:
> PASS: ls no argument
> FAIL: ls with -C: test column spacing equals 2
> echo -ne '' | cd lstest && ls -C; cd ..
> --- expected    2019-10-18 15:40:36.158000000 -0400
> +++ actual      2019-10-18 15:40:36.159000000 -0400
> @@ -1 +1 @@
> -dir1  dir2  file1.txt  file2.txt
> +dir1  dir2  file1.txt  file2.txt
> FAIL: ls with -x: test column spacing equals 2
> echo -ne '' | cd lstest && ls -x; cd ..
> --- expected    2019-10-18 15:40:36.163000000 -0400
> +++ actual      2019-10-18 15:40:36.165000000 -0400
> @@ -1 +1 @@
> -dir1  dir2  file1.txt  file2.txt
> +dir1  dir2  file1.txt  file2.txt
> PASS: ls with wild char
>
>
> ~Andrew
>
> On Fri, Oct 18, 2019 at 1:39 PM Andrew Ilijic <ilijic.andrew at gmail.com> wrote:
> >
> > Hi Rob,
> >
> > After the third reading of your email, I see that we were saying the same thing.
> > >
> > > What _is_ a good test is "-C is utf8 fontmetrics, not a byte count". So if you do...
> > >
> > I will look into fontmetrics and a modified `wc` to test the combining
> > characters.
> >
> > ~Andrew
> >
> > On Fri, Oct 18, 2019 at 12:02 PM Andrew Ilijic <ilijic.andrew at gmail.com> wrote:
> > >
> > > Thank you Rob,
> > >
> > > I will add one or more tests to make sure that the two spaces are
> > > present. For the first test or two, I plan to use ASCII characters.
> > > The rationale being that if someone decides to compile the program on
> > > an old system or an embedded system with a minimal operating system
> > > the tests will work, but we include <locale.h> so that argument is
> > > probably moot.
> > >
> > > My first thought is to have either the program or the test detect its
> > > context. Then adjust or reinterpret the output depending on whether or
> > > not we expect the combining characters to eat the space. I looked into
> > > how other systems test this sort of thing. They take the input, output
> > > SVG, and then check that it matches the expected SVG.  Do any chance
> > > terminal emulators have a canonical representation for glyphs that we
> > > can examine?
> > >
> > > For the tests involving the combining characters, it seems like we are
> > > testing the terminal emulator UTF-8 compatibility more than the `ls`
> > > program. Rob, can you give an example of when a filename would have
> > > capturing characters at the end of it? The only thing I can think of
> > > is a language where, if two words are next to each other, they should
> > > modify each other. That seems to violate my idea of what a filename
> > > is.  Don't get me wrong, reading up on this issue and those that
> > > surround it is fascinating, and it is something that I will keep at
> > > the top of my mind.
> > >
> > > Rob, thank you for all of the work items you suggested. I am
> > > interested in working on all of them. I want to get my first patch
> > > accepted and merged and go from there. I don't want to commit to a
> > > bunch of things and then be unable to deliver.
> > >
> > > ~Andrew
> > >
> > > On Thu, Oct 17, 2019 at 5:22 PM Rob Landley <rob at landley.net> wrote:
> > > >
> > > > I have various torture test utf8 in test/files/utf8. To see a trailing combining
> > > > character eat a following space, you can do:
> > > >
> > > > $ head -c 5 tests/files/utf8/test3.txt; echo ' '
> > > >
> > > > Or
> > > >
> > > > $ head -c 5 tests/files/utf8/test1.txt; echo 'X'
> > > >
> > > > The problem is, it's easy to see visually, but I'm not quite sure how an
> > > > automated test is expected to care? (Other than going "it produced the right
> > > > output"...
> > > >
> > > > The OTHER fun thing is that -C and -x care about terminal size, or default to 80
> > > > columns when they can't determine it. In theory there's an ls -w option that
> > > > lets you set the width, but I never needed/implemented it. (Andrew: you're
> > > > welcome to if you like. I could walk you through it or you could read
> > > > http://landley.net/toybox/code.html#lib_args and try to work it out yourself.)
> > > >
> > > > So if you did want to test this, you'd want to "touch" a test file with a funky
> > > > utf8 name, make another test file with a long enough name to force minimal
> > > > spacing, and then... do what? You'd just be proving they did or didn't fit on
> > > > the same line. (Basically reproducing known/expected output. If they had one
> > > > space between, these two would fit, with a minimum of two spaces, they'll wrap.)
> > > >
> > > > What _is_ a good test is "-C is utf8 fontmetrics, not a byte count". So if you do:
> > > >
> > > > $ mkdir sub
> > > > $ cd sub
> > > > $ touch $(cat ../tests/files/utf8/test3.txt)
> > > > $ toybox ls
> > > >
> > > > The result fits on one line, when the file was 198 bytes long. (Note: if you
> > > > don't put quotes around the "$()" it'll split it into 5 words because whitespace
> > > > in the sentence.
> > > >
> > > > That said:
> > > > $ wc -m ../tests/files/utf8/test3.txt
> > > > 110 ../tests/files/utf8/test3.txt
> > > >
> > > > And I _visually_ count that as 22 characters. (That's the debian wc, not toybox
> > > > saying that. Toybox matches debian here, but it seems conceptually wrong. It's
> > > > like we want another option measuring combined characters. Maybe -M or something?)
> > > >
> > > > But testing how your terminal displays the result for "ls bridged the space or
> > > > they're still visually distinct"...? I have no idea how to automate short of
> > > > doing a screen capture in a known font and comparing the bitmap, which isn't
> > > > remotely realistic. :)
> > > >
> > > > (There are, sadly, a lot of tests I know how to _perform_ but not _automate_.
> > > > And the pending mkroot stuff merely _reduces_ that gap, not eliminate it.)
> > > >
> > > > Rob
> > > >
> > > > On 10/17/19 10:44 AM, enh via Toybox wrote:
> > > > > new tests? (including the difficult combining-character case?)
> > > > >
> > > > > On Thu, Oct 17, 2019 at 8:35 AM Andrew Ilijic <ilijic.andrew at gmail.com> wrote:
> > > > >>
> > > > >> From d4c03b672d8b18f7ce021e4fcb02e8bb86f38f5f Mon Sep 17 00:00:00 2001
> > > > >> From: Andrew Ilijic <ilijic.andrew at gmail.com>
> > > > >> Date: Thu, 17 Oct 2019 11:03:19 -0400
> > > > >> Subject: [PATCH] ls: Ensure file names are separated by 2 spaces
> > > > >>
> > > > >> We need two spaces between filenames because that is the convention
> > > > >> followed by other implementations. More importantly, if we do not have
> > > > >> two spaces, certain Unicode file names cause filenames to run together.
> > > > >> In Unicode, combining characters come before the character they modify.
> > > > >> If a filename ends in a combining character, the combining character
> > > > >> attaches to the space that follows it, causing the space not to be
> > > > >> visible. Having a two-space gap stops the above issue from happening.
> > > > >>
> > > > >> For context and a bit more information, see mailing list link below.
> > > > >> https://www.mail-archive.com/toybox@lists.landley.net/msg05986.html
> > > > >> ---
> > > > >>  toys/posix/ls.c | 3 ++-
> > > > >>  1 file changed, 2 insertions(+), 1 deletion(-)
> > > > >>
> > > > >> diff --git a/toys/posix/ls.c b/toys/posix/ls.c
> > > > >> index 0956902f..d4c0211a 100644
> > > > >> --- a/toys/posix/ls.c
> > > > >> +++ b/toys/posix/ls.c
> > > > >> @@ -384,7 +384,8 @@ static void listfiles(int dirfd, struct dirtree *indir)
> > > > >>        memset(colsizes, 0, columns*sizeof(unsigned));
> > > > >>        for (ul=0; ul<dtlen; ul++) {
> > > > >>          entrylen(sort[next_column(ul, dtlen, columns, &c)], len);
> > > > >> -        *len += totpad+1;
> > > > >> +        // Add `2` to `totpad` to ensure two spaces between filenames
> > > > >> +        *len += totpad+2;
> > > > >>          if (c == columns) break;
> > > > >>          // Expand this column if necessary, break if that puts us over budget
> > > > >>          if (*len > colsizes[c]) {
> > > > >> --
> > > > >> 2.11.0
> > > > >> _______________________________________________
> > > > >> Toybox mailing list
> > > > >> Toybox at lists.landley.net
> > > > >> http://lists.landley.net/listinfo.cgi/toybox-landley.net
> > > > > _______________________________________________
> > > > > Toybox mailing list
> > > > > Toybox at lists.landley.net
> > > > > http://lists.landley.net/listinfo.cgi/toybox-landley.net
> > > > >



More information about the Toybox mailing list