[Toybox] [PATCH] tr.c: added -t option and cleaned up formatting

Rob Landley rob at landley.net
Sat Oct 21 03:04:49 PDT 2023


On 10/20/23 23:22, Oliver Webb via Toybox wrote:
> Heya, I noticed that tr was in pending, taking a look at the source code.

Yeah, it's one of the big remaining todo items to get Linux From Scratch
building, I was looking at it briefly last week...

> It doesn't look very unclean, nor does it fail any test cases.

I have a redesign to make it handle utf-8 encoded unicode, both in the input and
in the patterns. Took me forever to work out how, but I _think_ I understand it
now? Just haven't done it yet.

Well, I think I've figured out how to handle unicode (with combining characters)
and the [:class:] specifiers. Still don't understand what [=CHAR=] equivalency
classes mean, exactly, other than "strip combining characters"? Except there's a
lot of À Á Â Ã in the base set that... the man page says that equivalence
classes are defined by LC_COLLATE but everybody seems to punt on the specifics.
(Or maybe this is just a symptom of Google having a harder time finding stuff
these days? Section 3.1.3.6 of http://unicode.org/L2/L2001/01487-14652w25.pdf is
not very illuminating.)

Anyway, hadn't dug into that part yet. Vaguely planning to punt and wait for a
complaint, because the OTHER thing that comes up a lot when you search for this
is "it doesn't work". Although I am highly amused by the database error at:

https://www.unix.com/shell-programming-and-scripting/283373-equivalence-classes-dont-work.html

Which is saying that the page talking about how equivalence classes don't work
itself does not work.

This guy went into detail, but I have not opened that particular can of worms yet:

http://databasearchitects.blogspot.com/2016/08/equivalence-of-unicode-strings-is.html

> The only 2 things
> in the TODO are -t and -a. Neither POSIX nor GNU tr specifies a -a[scii] option.
> The name gives a general idea of what it's supposed to do
> (Stop acting utf-8 safe and treat everything as extended ASCII?)

It's a note-to-self that there should probably be a way to disable the unicode
support I haven't added yet, and that -a isn't currently used anywhere I could find.
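
To make the distinction concrete, here is a minimal sketch of what "utf-8 safe" versus "treat everything as bytes" could mean for how input gets split into units to translate. Nothing here is from tr.c: utf8_decode(), count_units(), and the ascii parameter are hypothetical names for illustration, and -a itself is only a note-to-self so far. The decoder is deliberately simple and does not reject overlong encodings or surrogates.

```c
#include <stddef.h>

/* Decode one UTF-8 sequence starting at s. Stores the code point in *cp and
 * returns the number of bytes consumed. A malformed or truncated sequence
 * consumes one byte and stores that raw byte, so bad input still advances.
 * Hypothetical helper: not validated against overlong forms or surrogates. */
size_t utf8_decode(const unsigned char *s, unsigned *cp)
{
  size_t len, i;
  unsigned c = *s;

  if (c < 0x80) len = 1;
  else if ((c & 0xe0) == 0xc0) { len = 2; c &= 0x1f; }
  else if ((c & 0xf0) == 0xe0) { len = 3; c &= 0x0f; }
  else if ((c & 0xf8) == 0xf0) { len = 4; c &= 0x07; }
  else { *cp = *s; return 1; }

  /* Each continuation byte must look like 10xxxxxx. A NUL terminator fails
   * this check, so we never read past the end of a C string. */
  for (i = 1; i < len; i++) {
    if ((s[i] & 0xc0) != 0x80) { *cp = *s; return 1; }
    c = (c << 6) | (s[i] & 0x3f);
  }
  *cp = c;

  return len;
}

/* Count the units a translator would see in str: individual bytes when
 * ascii is nonzero (the hypothetical -a behavior), code points otherwise. */
size_t count_units(const char *str, int ascii)
{
  const unsigned char *s = (const unsigned char *)str;
  size_t n = 0;
  unsigned cp;

  while (*s) {
    s += ascii ? 1 : utf8_decode(s, &cp);
    n++;
  }

  return n;
}
```

With this sketch, the two-byte UTF-8 encoding of À counts as one unit in the default mode and as two separate bytes when ascii is set, which is the difference a byte-only mode would expose.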

> I added in a -t[runcate] option and a corresponding test case.
> 
> I also cleaned up some of the code (foobar[0] to *foobar, removing sizeof(char), etc)

Applied, and I did a little more cleanup while I was there.
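
For anyone following along, here is a minimal sketch of what a truncate option changes, assuming it matches GNU tr's -t: by default GNU tr pads SET2 with its last character until it is as long as SET1, while -t instead ignores the part of SET1 that has no counterpart in SET2. This is not the code from the patch; build_map() and translate() are hypothetical names, the sets are assumed to be already expanded (no ranges or classes), and it works on bytes only.

```c
#include <string.h>

/* Fill map so that map[c] is the byte that c translates to.
 * set1 and set2 are plain byte lists, already expanded.
 * truncate != 0 mimics -t: bytes of set1 beyond the length of set2 are
 * left untranslated. Otherwise set2 is padded with its last byte.
 * An empty set2 leaves the identity map in this sketch. */
void build_map(unsigned char map[256], const char *set1, const char *set2,
  int truncate)
{
  size_t len1 = strlen(set1), len2 = strlen(set2), i;

  for (i = 0; i < 256; i++) map[i] = i;
  if (!len2) return;
  if (truncate && len1 > len2) len1 = len2;
  for (i = 0; i < len1; i++)
    map[(unsigned char)set1[i]] = set2[i < len2 ? i : len2 - 1];
}

/* Translate a NUL terminated string in place using map. */
void translate(const unsigned char map[256], char *s)
{
  for (; *s; s++) *s = map[(unsigned char)*s];
}
```

Translating "abcde" with SET1 "abcde" and SET2 "xy" gives "xyyyy" without truncation and "xycde" with it, which is the same result GNU tr produces for `tr abcde xy` and `tr -t abcde xy`.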

Rob

