[Toybox] tsort

Wed Oct 11 01:46:07 PDT 2023

On 10/10/23 17:46, Ray Gardner wrote:
> After seeing your comments about tsort in your blog, I thought about trying to
> implement Knuth's algorithm,

Is that the recommended algorithm here?

  $ grep -i knuth busybox/coreutils/tsort.c
  $ grep -ir knuth busybox
  busybox/archival/libarchive/bz/blocksort.c: * Knuth's increments seem to work
  better

Busybox mentions knuth once in the bzip2 compression code...

I had trouble finding a reasonable explanation of what the tsort command should
DO, let alone how it should be implemented. All the documentation was written by
people who knew how this command worked for 20 years before they wrote anything
about it, and who all have such deeply embedded assumptions that "it resolves
dependencies" is not mentioned in posix. The one and only occurence of "depend"
in the posix tsort page is the sentence "The output ordering is not
lexicographic, but depends on the pairs of items given as input" which LITERALLY
just means the output ordering depends on what the input was. Isn't that what
sorting MEANS? HOW DOES THIS HELP?

Meanwhile the gnu man page just says:

NAME
       tsort - perform topological sort

DESCRIPTION
       Write  totally  ordered  list  consistent  with the partial ordering in
       FILE.

Which if you don't know already what a "topological sort" is contains zero
information.

Wikipedia[citation needed] says "a topological sort of a directed graph is a
linear ordering of its vertices such that for every directed edge uv from vertex
u to vertex v, u comes before v in the ordering", which does not HELP here. I
don't HAVE A GRAPH. I have an array! Turning the array INTO a graph/tree would
be half the work!

Meanwhile the wikipedia page on the tsort _command_ is extra-useless:

  https://en.wikipedia.org/wiki/Tsort

The intro part says:

  The tsort program is a command line utility on Unix and Unix-like platforms,
  that performs a topological sort on its input. As of 2017, it is part of the
  POSIX.1 standard.[1]

Which literally tells me "tsort performs a topological sort" and NOTHING ELSE.
The 'history" and "syntax" sections are equally content-free, and then in
behavior it says "The output is a total ordering that corresponds to the given
partial ordering" and then starts talking about listing graph vertices again.

Hitting the ejector seat on the mathematicians: in plumbing-speak tsort is a
dependency resolver taking before/after pairs and giving you a list where each
"before" occurs earlier than the corresponding "after", with everything
mentioned exactly once. So it can tell you what order to run a pile of init
scripts in, what order to insmod modules in, what order to apt install packages
in... very useful.

But NOBODY SAID THAT. I had to FIGURE IT OUT. The first roadmap.html (commit
8f90d3aa019f from 2012) said "Some commands are for a compiler toolchain (ar c99
cflow ctags cxref gencat iconv lex m4 make nm strings strip tsort yacc), which
is outside of toybox's mandate and should be supplied externally." and nobody
ever corrected that assessment. Busybox didn't have this command until 2022.

Elliott wasn't familiar with it either when he pointed me at
https://austingroupbugs.net/view.php?id=1745 in September 5th because I'd
recently mentioned tsort, and our resulting email thread was:

On Tue, Sep 5, 2023 at 4:30 PM Rob Landley <rob at landley.net> wrote:
> >
> > On 9/5/23 14:36, enh wrote:
>> > > since you were talking about tsort recently... (but, yeah, afaik this
>> > > is only useful for ancient linkers. i don't think i've seen it used
>> > > since the 1990s?)
> >
> > Then why did busybox add it last year?
> >
> >   https://git.busybox.net/busybox/commit/?id=4642cf5b388b
> >
> > That's the question I'm head scratching about...
> >
> > Rob
>
> because it's small, easy (modulo the unspecified bits), and in posix?
>
> did it not come up on the mailing list?

And THAT was the intelligent question I _should_ have asked when pointed me at
http://lists.busybox.net/pipermail/busybox/2022-February/089483.html saying "I
was recently after tsort to do some init ordering on a small system. Hope this
patch is acceptable." And I went "it does what now?" and _that_ started me down
this rabbit hole.

But it wasn't until I read the busybox mailing list post where somebody said
they were using this for init dependency ordering that I had any idea what the
command was FOR.

And then digging to see which algorithm is best for implementing this dependency
ordering takes me back into math-world where everybody's drawing graphs to
explain stuff, and what I've GOT is an array of string pairs (which is neither a
graph nor a tree). And trying to get much more detail about anybody's
implementation means either reading GPLv2 busybox code or GPLv3 coreutils code,
neither of which I really want to spend long staring into before writing
something under a public domain equivalent license. Yes the wikipedia tsort page
mentions knuth (and whoever A.B. Kahn is) under "Further Reading" at the bottom
(not as links I could actually click on and read but as books behind paywalls).

(I might still be just a little bit cheesed about this. Normal I vent the above
sort of thing into the blog instead of here, but you asked. :)

But sure, if you've got an easily explicable better algorithm I'm all ears.

> but after I saw how far you'd got with your version
> I decided not to for now. I haven't analyzed your version carefully.

What I did seems to work, I understand how, and I expect to be able to fix it if
it breaks.

Also, what scalability do we _need_?

  $ find /lib/modules/$(uname -r) -name '*.ko' | wc -l
  3559

The use cases I'm aware of for this thing aren't _that_ big. Even N^2
scalability wouldn't be that bad for those, which I put extra effort in to try
to _avoid_, the whole dance of the otop/otop2 end indexes wasn't necessary if I
just looped out to len each time had I been willing to let it be N^2. But the
exclusion list only NEEDS to look at the previous two passes, and the reason it
needs to be _two_ passes is explained in the comment:

        // duplicate killer checks through previous TWO passes, because
        // "a b c a" removes "c a" pair after checking "a b" pair, so removing
        // "a b" next pass tries to output "a" again.

> It seems to
> be O(N^2) due to all the work to close up the list with each item removed,

Reading in the data in which is O(N) modulo the realloc() maybe getting
pathological, qsort() is kinda N(log n) in the common case, we ititerate over
the array doing a binary search on each entry which is N(log n) but how much
each pass consumes is kinda potlock, and then we do more or less the opposite of
an insertion sort pulling each entry we find back out which may be where you get
your N^2 but I don't have a better idea, although memmove() that _should_ be
pretty optimized on modern hardware it's almost certainly L2 cache local in any
real world use case you're going to throw at it (we're just copying the
contiguous POINTERS, not the string data randomly scattered in the heap that may
be cache cold and not even in an active DRAM bank). The duplicate killing
iteration part should only be N^2 for pathological inputs (just like qsort), and
when it IS one of the other N's basically goes away because you're consuming
LOTS of the list in one pass.

There are various pathological orderings but each makes some of the other
processing into NOPs for that pass. If you pull out exactly one entry per
iteration you're doing extra binary searches but not doing a lot of memmove or
dup checking. If you pull out EVERY entry on the first pass you do a lot of
memmove (and I had an optimization in mind where it iterates down from the end
instead of the beginning, so in that case each memmove is tiny, but I don't
think I bothered)... but then the array gets way smaller in one pass. The
expenses trade off.

I did try to think about this. I just did so in the context of processing an
array (which is the data representation I had), not a graph (which is a data
representation I didn't particularly see an advantage in trying to convert it to).

> on
> average? Not sure, but it measures roughly O(N^2).

How big an input have you fed it? Being N^2 over 10 elements is fine by me.

As for proper stress testing, the biggest input source I can think of would
probably be extracting the dependencies from debian's package repository:

  $ aptitude search . | wc -l
  74699
  $ aptitude show zypper | sed -n 's/Depends://p'
  libaugeas0 (>= 0.6.0), libc6 (>= 2.14), libgcc1 (>= 1:3.0),
  libreadline7 (>= 6.0), libstdc++6 (>= 5.2), libxml2 (>= 2.7.4), libzypp1702,
  zypper-common (>= 1.14.11-1), python

And feeding all the package names in as pairs. Modulo there's some processing to
strip out the version info and there are meta-packages and so on? I think using
aptitude on the distro's visible repository for the current install should
filter out architecturally unavailable packages, but it might still need
nontrivial processing. And the result is almost certainly GOING to have cycles
in it down in the base packages debootstrap installs. (Or else those packages
lie about their dependencies, or got munged together into a meta-package.)

Still, there's a dataset of 75k strings with dependency info, could be anywhere
up to a million A->B pairs. Probably worth a dig to see how tsort handles it.

> I looked at the code and also saw in your blog about the problem of creating a
> pointer to an element "below" the keep[] array.

It's not HARD, I just consider the compiler's behavior to be obviously wrong
here. My code never performs an illegal dereference. I don't see why the
compiler should care what pointer values I'm passing around when they're not
being dereferenced: in C, tracking that is MY job. Their warning generator cares
about things it shouldn't, and there's no override the way you can put
parentheses around an assignment to tell if() or for(;;) that yes, you really
meant to do that.

What I've generally done to rip the code out of a context where the warning
generator THINKS it understands what's going on is to typecast to long so it can
do the exact same math without attaching strange C++ religious taboos to it.
(Your virtual functions will stop working, you'll select the wrong template
expansion, and the exception handler's stack unwinding may misbehave! Unclean!
Unclean! I haven't got those. You can stop now.)

Sigh, we were just talking about what is and isn't an atomic access in devmem.
(Note that if you're doing unaligned access, it's not atomic. No I haven't added
a check for that yet.) If the C compiler is doing significantly different memory
accesses than you told it to, STUFF WILL BREAK. That's what C is for.

> Instead of doing the "usual LP64
> trick", had you considered using another array?

Yes, I considered declaring "char *sadness[3], **keep = sadness+1" but the
reason I _didn't_ is the compiler is WRONG.

> Something like this:
> 
> --- "..\\1005\\toys\\posix\\tsort.c" 2023-10-05 23:28:19.000000000 -0600
> +++ "toys\\posix\\tsort.c" 2023-10-06 10:14:01.448795400 -0600
> @@ -43,23 +43,13 @@
>  static void do_tsort(int fd, char *name)
>  {
>    off_t plen = 0;
> -  char *ss, **pair, *keep[2];
> +  char *ss, **pair, *keep[2], *keepfake[2];
>    long count,    // remaining unprocessed pairs
>         len,      // total strings in pair list
>         out,      // most recently added output (counts down from len)
>         otop,     // out at start of this loop over pair[]
>         otop2,    // out at start of previous loop over pair[]
>         ii, jj, kk;
> -  unsigned long sigh;
> -
> -  // bsearch()'s first argument is the element to search for,
> -  // and sbse() compares the second element of each pair, so to find
> -  // which FIRST element matches a second element, we need to feed bsearch
> -  // keep-1 but the compiler clutches its pearls about this even when
> -  // typecast to (void *), so do the usual LP64 trick to MAKE IT SHUT UP.
> -  // (The search function adds 1 to each argument so we never access
> -  // memory outside the pair.)
> -  sigh = ((unsigned long)keep)-sizeof(*keep);
>  
>    // Count input entries in data block read from fd
>    if (!(ss = readfd(fd, 0, &plen))) return;
> @@ -92,8 +82,12 @@
>      for (ii = 0; ii<count; ii++) {
>        // Find pair that depends on itself or no other pair depends on first str
>        memcpy(keep, pair+2*ii, sizeof(keep));
> -      // The compiler won't shut up about keep-1 no matter how we typecast it.
> -      if (keep[0]!=keep[1] && bsearch((void *)sigh, pair, count, sizeof(keep),
> +      // bsearch()'s first argument is the element to search for,
> +      // and sbse() compares the second element of each pair, so to find
> +      // which FIRST element matches a second element, we need to feed bsearch
> +      // an array with a second element == keep[0].
> +      keepfake[1] = keep[0];
> +      if (keep[0]!=keep[1] && bsearch(keepfake, pair, count, sizeof(keep),
>            (void *)sbse)) continue;
>  
>        // Remove from pair list
> 
> I think it's clearer,

My workaround changes 2 lines (and probably a 1 line comment like "sadness[0] is
never accessed, but passing keep-1 from keep[2] spuriously warns"), I was just
mad at the compiler not letting me (void *) or similar my way around the
warning. There should be a way to tell it "I believe I know what I'm doing here,
stop trying to drive". There is not, that I could find. I don't want to make
that lack LESS obvious in case such a way comes up in future.

If Elliott really really wants the blah[3], keep=blah+1 version I can do that.
But the compiler would still be wrong, and I don't volunteer danegeld. (The
compiler keeps threatening me with a good time. "Warning: yes, I want that to
happen.")

> I believe it's a little shorter in code size, and the run
> time difference is negligible.
> 
> I can post here about how I tested, if you're interested.

How you tested what? That it's shorter? Last I checked make baseline/bloatcheck
still work...

(The one I added to busybox in 2006, commits f8a808426745 and d9872aa0d702, was
an external project written in python. The toybox one from commit 7a4551ff5eec
is a simple bash wrapper around nm --size-sort with diff and sort in there...
Wow, it's been a while since I wrote that file isn't it? Ok, couple quick tweaks...)

Rob