[Toybox] [PATCH] ls new option : b

Rob Landley rob at landley.net
Thu Mar 10 22:57:05 PST 2016


On 03/09/2016 10:40 PM, Sameer Pradhan wrote:
> Thanks for your comment Isaac.
> 
> I have modified the patch by addressing to your comments.
> Please find the modified patch as attachment.

I'm already poking at ls to fix the date format posix thing mentioned
earlier, but assuming this patch supersedes your first attempt, let's
take a look...

You're not modifying strwidth, and ls with no arguments acts like ls -C
when we have a tty, so "ls -b" is going to get column sizes wrong.

The heart of this patch is just 3 lines. I have more comments than lines.

+        for (b = sort[next]->name; *b; b++)
+            if (isgraph(*b)) fputc(*b, stdout);
+         else printf("\\%3hho", *b);

1) You need to say %03 or you'll have "\ 52" with a space in it.

2) You're not escaping \ so you can't distinguish between a file literally
named "\033" and a file whose name contains an actual escape character.

3) What does the "hh" accomplish exactly?

In C99, varargs promotes anything shorter than int to int, so that before
the rise of 64 bit systems all your arguments were the same size on the stack.
(On 64 bit systems it _didn't_ expand that to "long" because it didn't want
to waste stack space, and thus the horrible need to typecast (void *)0 in
varargs and keep int/long passing straight with %d vs %ld vs %lld...)

Anyway, my point is %o should work fine, is there a reason for the hh here?

4) No UTF-8 support? I tested ls -b on a directory with japanese and arabic
text and the ubuntu one didn't escape those.

You haven't really defined what "unprintable" means with regard to UTF8.
Are combining characters printable? (They're zero length, but they do
stuff.) How about the direction-switching sequences?
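
To make points 1 and 3 concrete, here's a minimal sketch. octal_escape() is
a hypothetical helper of mine, not something in the patch or in toybox. It
zero-pads with %03o, and it casts through unsigned char instead of using
"hh". The one thing I'd expect the hh to change is the case where plain
char is signed: a byte like 0301 gets promoted to a negative int, and a
bare %o would then print the whole sign-extended value instead of three
digits. The cast covers that case too.

```c
#include <stdio.h>

/* Hypothetical helper (not from the patch): write one byte as a backslash
 * followed by exactly three octal digits. out needs room for 5 bytes. */
static int octal_escape(char *out, size_t len, char c)
{
  /* %03o zero-pads, so '*' (octal 52) becomes \052 rather than "\ 52".
   * Going through unsigned char keeps high bytes in the 0-0377 range
   * even where plain char is signed, so no "hh" is needed. */
  return snprintf(out, len, "\\%03o", (unsigned)(unsigned char)c);
}
```

Calling octal_escape(buf, sizeof(buf), '*') leaves \052 in buf, where the
patch's %3 width would have produced "\ 52".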

Time to run some tests! I have various utf8 sample files checked in at
tests/files/utf8 and you can touch "$(cat filename)" to create
files with these names. (Do it in a subdirectory, deleting them can
be a pain otherwise.)

Let's see, 0xabad1dea.txt is based on @0xabad1dea's old twitter username,
which used direction reversing characters although xfce's terminal program
doesn't seem to honor them. Anyway, they went through fine although the
spaces before and after became "\ " with -b.

Japanese text: displayed same with -b and without -b.

test1.txt is a combining character abuse test (another cut and paste from
twitter) and that went through fine although it seems to very
slightly confuse ls's column measurements. But it does so the same with
and without -b.

bad.txt is three different types of invalid character sequences
(low ascii, invalid utf8 parse, and unused unicode point), and THAT
finally gets us some results from -b. Without it, ubuntu's ls prints
three "???", with it I get "\001\301\357\277\277".

So yeah, -b needs to be utf8 aware. As does -q more than I've done so
far. But octal escapes for everything isn't the ubuntu ls behavior,
and in the absence of a standard...
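
As a rough illustration of what "utf8 aware" could mean here, the sketch
below passes well-formed multibyte sequences through untouched and octal
escapes everything else. Both function names are hypothetical, and the
policy is my assumption rather than ubuntu's actual rule: C1 controls and
unicode noncharacters count as unprintable, space gets \040 the way the
patch's isgraph() test would do it (not the "\ " ubuntu prints), and
combining or direction-switching characters are passed through without
any judgment about them.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical: length (2-4) of a multibyte UTF-8 sequence at s that this
 * sketch is willing to print as-is, or 0 if the bytes should be escaped. */
static int utf8_printable_len(const unsigned char *s)
{
  unsigned cp;
  int len, i;

  if (s[0] >= 0xc2 && s[0] <= 0xdf) {
    len = 2;
    cp = s[0] & 0x1f;
  } else if ((s[0] & 0xf0) == 0xe0) {
    len = 3;
    cp = s[0] & 0x0f;
  } else if (s[0] >= 0xf0 && s[0] <= 0xf4) {
    len = 4;
    cp = s[0] & 0x07;
  } else return 0;

  for (i = 1; i < len; i++) {
    /* A NUL terminator fails this test, so we never read past the end. */
    if ((s[i] & 0xc0) != 0x80) return 0;
    cp = (cp << 6) | (s[i] & 0x3f);
  }

  /* Overlong encodings, surrogates, and values past U+10FFFF. */
  if (len == 3 && (cp < 0x800 || (cp >= 0xd800 && cp <= 0xdfff))) return 0;
  if (len == 4 && (cp < 0x10000 || cp > 0x10ffff)) return 0;
  /* Assumed policy: C1 controls and noncharacters are unprintable. */
  if (cp < 0xa0) return 0;
  if ((cp & 0xfffe) == 0xfffe || (cp >= 0xfdd0 && cp <= 0xfdef)) return 0;

  return len;
}

/* Hypothetical: copy name into out, escaping bytes that are neither
 * graphic ASCII nor part of an accepted UTF-8 sequence. Output is
 * truncated at a whole character or escape if out is too small. */
static void escape_name(char *out, size_t outlen, const char *name)
{
  const unsigned char *s = (const unsigned char *)name;
  size_t used = 0;

  if (!outlen) return;
  /* Each step writes at most 4 bytes, plus room for the trailing NUL. */
  while (*s && used + 5 <= outlen) {
    int len = utf8_printable_len(s);

    if (len) {
      memcpy(out + used, s, len);
      used += len;
      s += len;
    } else if (*s > 0x20 && *s < 0x7f) out[used++] = *s++;
    else used += sprintf(out + used, "\\%03o", (unsigned)*s++);
  }
  out[used] = 0;
}
```

Fed the five bytes I got out of bad.txt, this produces the same
"\001\301\357\277\277" shown above, and japanese text comes out unchanged.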

Meanwhile, looking at the existing ls code, the help text wasn't consistent.
It was using tab indents in the first column and two space indents on the
right. I switched it all to just two space indents, which let me cut the gap
between the two columns.

(It's still more text overall, at some point I want to do the "gzip the
help text and have show_help() zcat it and chop out the right bit
to display" trick, but that's on the todo list. I need to go back and
finish deflate compression-side, and make decompression not suck speed-wise.)

While looking at this, I also noticed that strwidth() and the actual
printing logic don't match when CFG_TOYBOX_I18N is enabled. In strwidth()
utf8 conversion failures are also turned into ? but in printing only
isprint() failures get converted. (And then there's "it converted to
an unknown utf8 codepoint"...) Needless to say, posix covers NONE of this
(that I've noticed).

A big tension in ls is that lib/linestack.c has its own utf8 escaping logic
in crunch_str(), but posix ls specifies utf8-ignorant -q logic which
destructively replaces things instead of escaping them in a recoverable way.
(Modulo ^ and <> and U+ not being escaped, so you can have false positives.
I get around that with ansi escapes to reverse video them, which is a hack.
I'm open to suggestions...)

If you explicitly want something that doesn't care about utf8 and just does
a poor man's uuencode to convert this into an unambiguously recoverable
format, you need to turn '\' into an octal escape too. (Otherwise you get
false positives if somebody actually calls a file "\033" with a literal
backslash and three digits.)
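
A minimal sketch of that utf8-ignorant variant, assuming hypothetical
encode_name() and decode_name() helpers (neither exists in toybox). Every
byte that isn't graphic ASCII becomes a three digit octal escape, and so
does backslash itself, which is what makes the output decodable without
guessing.

```c
#include <stdio.h>

/* Hypothetical: escape every non-graphic byte AND backslash as \ooo.
 * Output is truncated at a whole unit if out is too small. */
static void encode_name(char *out, size_t outlen, const char *name)
{
  const unsigned char *s = (const unsigned char *)name;
  size_t used = 0;

  if (!outlen) return;
  /* Each step writes at most 4 bytes, plus room for the trailing NUL. */
  while (*s && used + 5 <= outlen) {
    if (*s > 0x20 && *s < 0x7f && *s != '\\') out[used++] = *s++;
    else used += sprintf(out + used, "\\%03o", (unsigned)*s++);
  }
  out[used] = 0;
}

/* Hypothetical inverse. Assumes enc came from encode_name(), so every
 * backslash is followed by exactly three octal digits. out must be at
 * least as large as enc. */
static void decode_name(char *out, const char *enc)
{
  while (*enc) {
    if (*enc == '\\') {
      *out++ = (char)(((enc[1] - '0') << 6) | ((enc[2] - '0') << 3)
                      | (enc[3] - '0'));
      enc += 4;
    } else *out++ = *enc++;
  }
  *out = 0;
}
```

With this, a file whose name is a single escape character encodes as \033,
while a file literally named backslash-0-3-3 encodes as \134033, so the two
can't be confused and decode_name() gets back the original bytes.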

Anyway, I was poking at ls already because I need to fix the date format
posix compliance thing. I'll take a stab at fixing this up too while
I'm here.

Rob
