[Toybox] [PATCH] ls: Ensure file names are separated by 2 spaces

Andrew Ilijic ilijic.andrew at gmail.com
Fri Oct 18 09:02:40 PDT 2019


Thank you Rob,

I will add one or more tests to make sure that the two spaces are
present. For the first test or two, I plan to use ASCII characters.
The rationale being that if someone decides to compile the program on
an old system or an embedded system with a minimal operating system
the tests will work, but we include <locale.h> so that argument is
probably moot.

My first thought is to have either the program or the test detect its
context. Then adjust or reinterpret the output depending on whether or
not we expect the combining characters to eat the space. I looked into
how other systems test this sort of thing. They take the input, output
SVG, and then check that it matches the expected SVG.  Do any chance
terminal emulators have a canonical representation for glyphs that we
can examine?

For the tests involving the combining characters, it seems like we are
testing the terminal emulator UTF-8 compatibility more than the `ls`
program. Rob, can you give an example of when a filename would have
capturing characters at the end of it? The only thing I can think of
is a language where, if two words are next to each other, they should
modify each other. That seems to violate my idea of what a filename
is.  Don't get me wrong, reading up on this issue and those that
surround it is fascinating, and it is something that I will keep at
the top of my mind.

Rob, thank you for all of the work items you suggested. I am
interested in working on all of them. I want to get my first patch
accepted and merged and go from there. I don't want to commit to a
bunch of things and then be unable to deliver.

~Andrew

On Thu, Oct 17, 2019 at 5:22 PM Rob Landley <rob at landley.net> wrote:
>
> I have various torture test utf8 in test/files/utf8. To see a trailing combining
> character eat a following space, you can do:
>
> $ head -c 5 tests/files/utf8/test3.txt; echo ' '
>
> Or
>
> $ head -c 5 tests/files/utf8/test1.txt; echo 'X'
>
> The problem is, it's easy to see visually, but I'm not quite sure how an
> automated test is expected to care? (Other than going "it produced the right
> output"...
>
> The OTHER fun thing is that -C and -x care about terminal size, or default to 80
> columns when they can't determine it. In theory there's an ls -w option that
> lets you set the width, but I never needed/implemented it. (Andrew: you're
> welcome to if you like. I could walk you through it or you could read
> http://landley.net/toybox/code.html#lib_args and try to work it out yourself.)
>
> So if you did want to test this, you'd want to "touch" a test file with a funky
> utf8 name, make another test file with a long enough name to force minimal
> spacing, and then... do what? You'd just be proving they did or didn't fit on
> the same line. (Basically reproducing known/expected output. If they had one
> space between, these two would fit, with a minimum of two spaces, they'll wrap.)
>
> What _is_ a good test is "-C is utf8 fontmetrics, not a byte count". So if you do:
>
> $ mkdir sub
> $ cd sub
> $ touch $(cat ../tests/files/utf8/test3.txt)
> $ toybox ls
>
> The result fits on one line, when the file was 198 bytes long. (Note: if you
> don't put quotes around the "$()" it'll split it into 5 words because whitespace
> in the sentence.
>
> That said:
> $ wc -m ../tests/files/utf8/test3.txt
> 110 ../tests/files/utf8/test3.txt
>
> And I _visually_ count that as 22 characters. (That's the debian wc, not toybox
> saying that. Toybox matches debian here, but it seems conceptually wrong. It's
> like we want another option measuring combined characters. Maybe -M or something?)
>
> But testing how your terminal displays the result for "ls bridged the space or
> they're still visually distinct"...? I have no idea how to automate short of
> doing a screen capture in a known font and comparing the bitmap, which isn't
> remotely realistic. :)
>
> (There are, sadly, a lot of tests I know how to _perform_ but not _automate_.
> And the pending mkroot stuff merely _reduces_ that gap, not eliminate it.)
>
> Rob
>
> On 10/17/19 10:44 AM, enh via Toybox wrote:
> > new tests? (including the difficult combining-character case?)
> >
> > On Thu, Oct 17, 2019 at 8:35 AM Andrew Ilijic <ilijic.andrew at gmail.com> wrote:
> >>
> >> From d4c03b672d8b18f7ce021e4fcb02e8bb86f38f5f Mon Sep 17 00:00:00 2001
> >> From: Andrew Ilijic <ilijic.andrew at gmail.com>
> >> Date: Thu, 17 Oct 2019 11:03:19 -0400
> >> Subject: [PATCH] ls: Ensure file names are separated by 2 spaces
> >>
> >> We need two spaces between filenames because that is the convention
> >> followed by other implementations. More importantly, if we do not have
> >> two spaces, certain Unicode file names cause filenames to run together.
> >> In Unicode, combining characters come before the character they modify.
> >> If a filename ends in a combining character, the combining character
> >> attaches to the space that follows it, causing the space not to be
> >> visible. Having a two-space gap stops the above issue from happening.
> >>
> >> For context and a bit more information, see mailing list link below.
> >> https://www.mail-archive.com/toybox@lists.landley.net/msg05986.html
> >> ---
> >>  toys/posix/ls.c | 3 ++-
> >>  1 file changed, 2 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/toys/posix/ls.c b/toys/posix/ls.c
> >> index 0956902f..d4c0211a 100644
> >> --- a/toys/posix/ls.c
> >> +++ b/toys/posix/ls.c
> >> @@ -384,7 +384,8 @@ static void listfiles(int dirfd, struct dirtree *indir)
> >>        memset(colsizes, 0, columns*sizeof(unsigned));
> >>        for (ul=0; ul<dtlen; ul++) {
> >>          entrylen(sort[next_column(ul, dtlen, columns, &c)], len);
> >> -        *len += totpad+1;
> >> +        // Add `2` to `totpad` to ensure two spaces between filenames
> >> +        *len += totpad+2;
> >>          if (c == columns) break;
> >>          // Expand this column if necessary, break if that puts us over budget
> >>          if (*len > colsizes[c]) {
> >> --
> >> 2.11.0
> >> _______________________________________________
> >> Toybox mailing list
> >> Toybox at lists.landley.net
> >> http://lists.landley.net/listinfo.cgi/toybox-landley.net
> > _______________________________________________
> > Toybox mailing list
> > Toybox at lists.landley.net
> > http://lists.landley.net/listinfo.cgi/toybox-landley.net
> >



More information about the Toybox mailing list