[Toybox] [PATCH] Fix ls sorting by name.

Rob Landley rob at landley.net
Mon Aug 17 11:21:13 PDT 2015


Copying this thread back to the list since I stupidly forgot to cc: it
last time. (Oops.)

On 08/16/2015 01:34 PM, enh wrote:
> On Sun, Aug 16, 2015 at 10:47 AM, Rob Landley <rob at landley.net> wrote:
>> On 08/15/2015 05:46 PM, enh wrote:
>>> Fix ls sorting by name.
>>>
>>> POSIX says ls sorts according to the collating sequence in the current
>>> locale. In the en-US.UTF-8 locale, coreutils ls sorts "Config.in" and
>>> "configure" before "LICENSE"; without this patch toybox ls sorts
>>> capitals first.
>>
>> Is this an important one?
> 
> i don't think so. certainly toolbox ls just uses strcmp, so it won't
> be seen as a regression when we switch over.

Oh good.

It's possible at some point we'll need to do this, but it's an unbounded
amount of work. (It's important that presentation layer stuff gets this
right, but in command line tools I value consistent behavior and smaller
attack surface more highly.)

>> I really don't want to open the can of worms of locale support if
>> there's an option not to. I would much prefer to document this as a
>> deviation from posix.
>>
>> Sorry, I've mentioned this before on the list and such:
>>
>> http://lists.landley.net/pipermail/toybox-landley.net/2012-November/002411.html
>>
>> But I guess I wasn't explicit enough. When I said in design.html that
>> "Locale support isn't currently a goal" but that utf8 support is, I
>> meant that preserving utf8 for the presentation layer (x11 and similar)
>> is important, but "sort" breaking your build environment because you
>> forgot to specify 'C' is annoying. (The first time it happened to me was
>> when I upgraded ubuntu and suddenly sorts were case sensitive when they
>> never had been before.)
> 
> this was the motivating case (no pun intended). i suspect those of us
> who expect capitals to sort first because they did in the 70s and 80s
> are slowly dying out. right now i feel like we're in the transition
> period where both sides are unpleasantly surprised some of the time.

Indeed, I'd prefer the capitals shuffle in myself. But that's what ascii
sort does, and I don't know of another way to make it _consistent_ with
easily explained rules. (Modulo "forcing a specific locale for
everybody", which is just failure to implement locale support by another
name...)
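
To make the trade-off concrete, here is a minimal Python sketch (not toybox
code) using the three file names from the patch description. Plain code-point
ordering stands in for what strcmp() gives on ASCII names. The str.casefold
key is only an assumed, rough stand-in for locale-aware collation; real
en_US.UTF-8 collation follows more elaborate rules than "ignore case".

```python
# File names taken from the patch description quoted above.
names = ["configure", "LICENSE", "Config.in"]


def ascii_order(items):
    """Sort by raw code point, matching strcmp() for plain ASCII names."""
    return sorted(items)


def folded_order(items):
    """Sort ignoring case: an approximation of locale collation, not the real thing."""
    return sorted(items, key=str.casefold)


print(ascii_order(names))   # capitals sort ahead of lowercase
print(folded_order(names))  # "Config.in" and "configure" land before "LICENSE"
```

The first ordering depends only on byte values, so it is the same on every
machine. The second is the kind of result that changes with the environment
once a real locale is involved.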

> (as an aside, i don't think the build system argument is a good one.
> any decent build system should be sanitizing the environment to be as
> hermetic as possible, because there are just so many relevant
> environment variables and you can't fight them all.

Indeed:
http://landley.net/hg/aboriginal/file/1781/sources/variables.sh

> or know which ls
> you're actually running unless you include a prebuilt and call it
> explicitly,

Indeed:
https://speakerdeck.com/landley/developing-for-non-x86-targets-using-qemu?slide=98
(and the next three slides)

(And the rationale behind shipping the system-image tarballs and the
cross-compiler.sh as opposed to simple-cross-compiler.sh...)

> and that doesn't work in cases like running the toybox
> tests against coreutils.

Which is why I need a http://landley.net/aboriginal/control-images/ that
runs the toybox test suite in a known controlled environment under qemu.
I can make an image for each supported libc (uclibc, musl, and
eventually bionic: glibc can go hang although initial development is
done on an x86-64 glibc box so...) and then run each image against each
system-image-$TARGET for the dozen or so supported hardware platforms.

Alas, making that work is one of those "I disappear for 3 weeks" things,
which I haven't had time for recently.

> the Android build system has this problem
> with GREP_OPTIONS. rather than fight the current bad workaround i'm
> just going to worry about doing something more sensible in the new
> build system.)

Let me know when the new build system can be exposed to outside air, I
wanna poke at it. (I intermittently poke at the current AOSP one but
it's one of those things where just finding the edges is a long journey.)

I did http://landley.net/aboriginal/about.html
and my follow-up projects for that (after
http://landley.net/aboriginal/about.html#migrate) are
http://landley.net/aboriginal/about.html#hairball and
http://landley.net/aboriginal/about.html#selfhost and making AOSP build
natively under aboriginal would satisfy BOTH of the last two.

Alas, that's probably a couple years of full-time work _after_ the
toybox 1.0 release. (And getting the toolchain switched from gcc->llvm
and binutils->lld.llvm.org and uclibc++ to libcxx.llvm.org and details
like compiler-rt.llvm.org...)

Anyway, I agree that builds _should_ sanitize themselves against
environments they were never tested in. But having done the work to
create a properly isolated build environment in Aboriginal Linux via
_years_ of whack-a-mole (SEVEN YEARS of removing perl from the linux
kernel build, the Fedora upgrade that stopped installing libc.a by
default so you can build dynamic but _not_ static binaries using the
host toolchain, upstream packages switching to xz format tarballs,
ubuntu 10.04 turning "gcc" into a perl script, years of people reporting
bugs that only reproduce on specific sles and gentoo versions plus
whatever "pclinuxos" was but I got it running under kvm long enough to
reproduce the problem...)

AOSP only supporting a specific ubuntu version on a single architecture
as its designated build environment is something I feel your pain on.
(When I have a "not going there" moment it's generally the scars talking.)
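
The environment sanitizing discussed above can be sketched roughly as
follows. This is an illustration only, not how Aboriginal Linux or the
Android build does it: the PASS_THROUGH allowlist and the helper names
sanitized_env and run_hermetic are hypothetical. The idea is to pass through
a short list of variables, force LC_ALL=C, and drop everything else
(GREP_OPTIONS included) before running a tool.

```python
import os
import subprocess

# Variables allowed through from the caller. This allowlist is an assumption
# made for illustration; a real build would choose its own.
PASS_THROUGH = ("PATH", "HOME")


def sanitized_env(parent=None):
    """Build a minimal environment: allowlisted variables plus LC_ALL=C."""
    parent = os.environ if parent is None else parent
    env = {key: parent[key] for key in PASS_THROUGH if key in parent}
    env["LC_ALL"] = "C"  # pin collation and messages regardless of the caller
    return env


def run_hermetic(cmd, parent=None):
    """Run cmd with only the sanitized environment and return its stdout."""
    result = subprocess.run(
        cmd,
        env=sanitized_env(parent),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

For example, run_hermetic(["ls"]) would list the current directory with
LC_ALL=C no matter what the calling shell exported. Note this only pins the
environment; it does nothing about which ls binary PATH happens to find.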

Testing for the unknown is hard. Regression testing against the unknown
is worse.

Rob
