[Toybox] sort help text.

Rob Landley rob at landley.net
Fri Mar 15 23:17:12 PDT 2019


The last paragraph of the sort help text explains -k except that we implement the
silly gnu "-k2.3,4.5n" extension (some build script somewhere probably used it),
but don't document it. (This help text is _so_ old it double spaces after
periods, which went away due to html never doing that and everybody retconning
their own memories to insist that this is how it's always been even though circa
1992 teachers marked you off for _not_ double spacing after a period.)

Anyway, the problem with documenting it is that the gnu behavior (which we
implemented) is stupid. If you don't specify -t then the leading separator
(arbitrary runs of whitespace) is included in the character count, but if you do
specify -t the first character (which counts from 1 just like fields do) is the
next character _after_ the separator.
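
To make the difference concrete, here is a minimal sketch of the offset rule as described above. The key_start() helper and its arguments are hypothetical, invented for illustration; this is neither the toybox nor the gnu code, and it only models "-kFIELD.CHAR" on a single line.

```c
#include <ctype.h>
#include <stddef.h>

// Hypothetical helper (not toybox or gnu code): byte offset in line where
// the key "-kFIELD.CHARPOS" starts. sep is the -t character, or 0 when no
// -t was given. Both field and charpos count from 1.
size_t key_start(const char *line, int field, int charpos, char sep)
{
  size_t pos = 0;
  int i;

  // Skip the first field-1 fields.
  for (i = 1; i < field && line[pos]; i++) {
    if (sep) {
      // With -t: a field ends at the separator, and the next field
      // starts on the character after it.
      while (line[pos] && line[pos] != sep) pos++;
      if (line[pos]) pos++;
    } else {
      // Without -t: leading whitespace belongs to the field it precedes,
      // so stop ON the whitespace that follows this field.
      while (isspace((unsigned char)line[pos])) pos++;
      while (line[pos] && !isspace((unsigned char)line[pos])) pos++;
    }
  }

  // Advance charpos-1 characters, without running off the end of the line.
  for (i = 1; i < charpos && line[pos]; i++) pos++;

  return pos;
}
```

Under these assumptions, -k2.3 on the line "a bcd" lands on 'c' (offset 3) without -t, because the space counts as character 1 of the field, but on 'd' (offset 4) with -t ' '.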

And don't get me started on:

$ echo -e 'a b\na\tb' | sort -k2,2
a b
a	b
$ echo -e 'a\tb\na b' | sort -k2,2
a b
a	b
$ echo -e 'a\tc\na       b' | sort -k2,2
a       b
a	c
$ echo -e 'a\tb\na       c' | sort -k2,2
a	b
a       c

Which toybox sort isn't matching:

$ echo -e 'a b\na\tb' | ./sort -k2,2
a	b
a b
$ echo -e 'a\tb\na b' | ./sort -k2,2
a	b
a b
$ echo -e 'a\tc\na       b' | ./sort -k2,2
a	c
a       b
$ echo -e 'a\tb\na       c' | ./sort -k2,2
a	b
a       c

(Lemme guess: they _do_ strip the leading whitespace from key sorts even when
they _say_ they don't, and then they do a fallback whole-string sort as tie
breaker. So I need to change when I'm advancing past the leading space...)
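
A minimal sketch of that guess, assuming the key is always -k2,2 with no -t. The function names are hypothetical and this is not the gnu implementation: compare the second field with its leading whitespace skipped, then fall back to comparing the whole line.

```c
#include <ctype.h>
#include <string.h>

// Hypothetical helper: start of the second whitespace-separated field,
// with the whitespace in front of it skipped.
const char *field2(const char *s)
{
  while (isspace((unsigned char)*s)) s++;
  while (*s && !isspace((unsigned char)*s)) s++;
  while (isspace((unsigned char)*s)) s++;

  return s;
}

// Hypothetical helper: length of the field starting at s.
size_t field_len(const char *s)
{
  size_t len = 0;

  while (s[len] && !isspace((unsigned char)s[len])) len++;

  return len;
}

// Guessed comparison for "sort -k2,2": stripped key first, then the whole
// line as tie breaker. Returns <0, 0, or >0 like strcmp().
int compare_k2(const char *a, const char *b)
{
  const char *ka = field2(a), *kb = field2(b);
  size_t la = field_len(ka), lb = field_len(kb);
  int rc = memcmp(ka, kb, la < lb ? la : lb);

  // Equal prefix: the shorter key sorts first.
  if (!rc && la != lb) rc = la < lb ? -1 : 1;
  // Equal keys: fall back to a byte-order comparison of the whole line.
  if (!rc) rc = strcmp(a, b);

  return rc;
}
```

With this comparison 'a       b' sorts before 'a\tc' and 'a\tb' sorts before 'a       c', the same order as the last two gnu runs above. The tie breaker here is plain strcmp() byte order, so for lines with equal keys (the first two runs) the order may differ from a sort that collates according to the locale.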

This is why my todo list doesn't get shorter. I noticed this because I was
checking existing xstrdup() callers...

I am _HIGHLY_TEMPTED_ to make toybox -k2.3 start at the third character of the
key _always_ skipping the separator, since THAT'S WHAT THE OTHER ONE IS ACTUALLY
COMPARING under non-micromanaged circumstances. But that's not how those clowns
implemented .x. I can add the above tests to tests/sort.test but I kinda dowanna?

@@ -121,7 +122,8 @@ static char *get_key_data(char *str, struct sort_key *key, i
   if (TT.t && str[start]==*TT.t) start++;

   // Strip leading and trailing whitespace if necessary
-  if (flags&FLAG_b) while (isspace(str[start])) start++;
+  if ((flags&FLAG_b) || (!TT.t && !key->range[3]))
+    while (isspace(str[start])) start++;
   if (flags&FLAG_bb) while (end>start && isspace(str[end-1])) end--;

   // Handle offsets on start and end

That's just embarrassing. It's the _compatible_ behavior, but is it the _right_
behavior?

Sigh. Nobody else has noticed this for years and years...

Rob
