[Toybox] [PATCH] ls: Ensure file names are separated by 2 spaces

Rob Landley rob at landley.net
Thu Oct 17 14:24:06 PDT 2019


I have various torture test utf8 in test/files/utf8. To see a trailing combining
character eat a following space, you can do:

$ head -c 5 tests/files/utf8/test3.txt; echo ' '

Or

$ head -c 5 tests/files/utf8/test1.txt; echo 'X'

The problem is, it's easy to see visually, but I'm not quite sure how an
automated test is expected to care? (Other than going "it produced the right
output"...

The OTHER fun thing is that -C and -x care about terminal size, or default to 80
columns when they can't determine it. In theory there's an ls -w option that
lets you set the width, but I never needed/implemented it. (Andrew: you're
welcome to if you like. I could walk you through it or you could read
http://landley.net/toybox/code.html#lib_args and try to work it out yourself.)

So if you did want to test this, you'd want to "touch" a test file with a funky
utf8 name, make another test file with a long enough name to force minimal
spacing, and then... do what? You'd just be proving they did or didn't fit on
the same line. (Basically reproducing known/expected output. If they had one
space between, these two would fit, with a minimum of two spaces, they'll wrap.)

What _is_ a good test is "-C is utf8 fontmetrics, not a byte count". So if you do:

$ mkdir sub
$ cd sub
$ touch $(cat ../tests/files/utf8/test3.txt)
$ toybox ls

The result fits on one line, when the file was 198 bytes long. (Note: if you
don't put quotes around the "$()" it'll split it into 5 words because whitespace
in the sentence.

That said:
$ wc -m ../tests/files/utf8/test3.txt
110 ../tests/files/utf8/test3.txt

And I _visually_ count that as 22 characters. (That's the debian wc, not toybox
saying that. Toybox matches debian here, but it seems conceptually wrong. It's
like we want another option measuring combined characters. Maybe -M or something?)

But testing how your terminal displays the result for "ls bridged the space or
they're still visually distinct"...? I have no idea how to automate short of
doing a screen capture in a known font and comparing the bitmap, which isn't
remotely realistic. :)

(There are, sadly, a lot of tests I know how to _perform_ but not _automate_.
And the pending mkroot stuff merely _reduces_ that gap, not eliminate it.)

Rob

On 10/17/19 10:44 AM, enh via Toybox wrote:
> new tests? (including the difficult combining-character case?)
> 
> On Thu, Oct 17, 2019 at 8:35 AM Andrew Ilijic <ilijic.andrew at gmail.com> wrote:
>>
>> From d4c03b672d8b18f7ce021e4fcb02e8bb86f38f5f Mon Sep 17 00:00:00 2001
>> From: Andrew Ilijic <ilijic.andrew at gmail.com>
>> Date: Thu, 17 Oct 2019 11:03:19 -0400
>> Subject: [PATCH] ls: Ensure file names are separated by 2 spaces
>>
>> We need two spaces between filenames because that is the convention
>> followed by other implementations. More importantly, if we do not have
>> two spaces, certain Unicode file names cause filenames to run together.
>> In Unicode, combining characters come before the character they modify.
>> If a filename ends in a combining character, the combining character
>> attaches to the space that follows it, causing the space not to be
>> visible. Having a two-space gap stops the above issue from happening.
>>
>> For context and a bit more information, see mailing list link below.
>> https://www.mail-archive.com/toybox@lists.landley.net/msg05986.html
>> ---
>>  toys/posix/ls.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/toys/posix/ls.c b/toys/posix/ls.c
>> index 0956902f..d4c0211a 100644
>> --- a/toys/posix/ls.c
>> +++ b/toys/posix/ls.c
>> @@ -384,7 +384,8 @@ static void listfiles(int dirfd, struct dirtree *indir)
>>        memset(colsizes, 0, columns*sizeof(unsigned));
>>        for (ul=0; ul<dtlen; ul++) {
>>          entrylen(sort[next_column(ul, dtlen, columns, &c)], len);
>> -        *len += totpad+1;
>> +        // Add `2` to `totpad` to ensure two spaces between filenames
>> +        *len += totpad+2;
>>          if (c == columns) break;
>>          // Expand this column if necessary, break if that puts us over budget
>>          if (*len > colsizes[c]) {
>> --
>> 2.11.0
>> _______________________________________________
>> Toybox mailing list
>> Toybox at lists.landley.net
>> http://lists.landley.net/listinfo.cgi/toybox-landley.net
> _______________________________________________
> Toybox mailing list
> Toybox at lists.landley.net
> http://lists.landley.net/listinfo.cgi/toybox-landley.net
> 



More information about the Toybox mailing list