[Toybox] tsort

enh enh at google.com
Wed Oct 11 09:07:49 PDT 2023


On Wed, Oct 11, 2023 at 1:42 AM Rob Landley <rob at landley.net> wrote:
>
> On 10/10/23 17:46, Ray Gardner wrote:
> > After seeing your comments about tsort in your blog, I thought about trying to
> > implement Knuth's algorithm,
>
> Is that the recommended algorithm here?
>
>   $ grep -i knuth busybox/coreutils/tsort.c
>   $ grep -ir knuth busybox
>   busybox/archival/libarchive/bz/blocksort.c: * Knuth's increments seem to work
>   better
>
> Busybox mentions knuth once in the bzip2 compression code...
>
> I had trouble finding a reasonable explanation of what the tsort command should
> DO, let alone how it should be implemented. All the documentation was written by
> people who knew how this command worked for 20 years before they wrote anything
> about it, and who all have such deeply embedded assumptions that "it resolves
> dependencies" is not mentioned in posix. The one and only occurence of "depend"
> in the posix tsort page is the sentence "The output ordering is not
> lexicographic, but depends on the pairs of items given as input" which LITERALLY
> just means the output ordering depends on what the input was. Isn't that what
> sorting MEANS? HOW DOES THIS HELP?
>
> Meanwhile the gnu man page just says:
>
> NAME
>        tsort - perform topological sort
>
> DESCRIPTION
>        Write  totally  ordered  list  consistent  with the partial ordering in
>        FILE.
>
> Which, if you don't already know what a "topological sort" is, contains zero
> information.
>
> Wikipedia[citation needed] says "a topological sort of a directed graph is a
> linear ordering of its vertices such that for every directed edge uv from vertex
> u to vertex v, u comes before v in the ordering", which does not HELP here. I
> don't HAVE A GRAPH. I have an array! Turning the array INTO a graph/tree would
> be half the work!
>
> Meanwhile the wikipedia page on the tsort _command_ is extra-useless:
>
>   https://en.wikipedia.org/wiki/Tsort
>
> The intro part says:
>
>   The tsort program is a command line utility on Unix and Unix-like platforms,
>   that performs a topological sort on its input. As of 2017, it is part of the
>   POSIX.1 standard.[1]
>
> Which literally tells me "tsort performs a topological sort" and NOTHING ELSE.
> The 'history" and "syntax" sections are equally content-free, and then in
> behavior it says "The output is a total ordering that corresponds to the given
> partial ordering" and then starts talking about listing graph vertices again.
>
> Hitting the ejector seat on the mathematicians: in plumbing-speak tsort is a
> dependency resolver taking before/after pairs and giving you a list where each
> "before" occurs earlier than the corresponding "after", with everything
> mentioned exactly once. So it can tell you what order to run a pile of init
> scripts in, what order to insmod modules in, what order to apt install packages
> in... very useful.
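>
> (Concretely, a tiny made-up example of the input format and what should come
> out the other end, nothing implementation-specific:
>
>   $ printf 'a b\nb c\n' | tsort
>   a
>   b
>   c
>
> Each input pair says "left before right", and the output lists everything
> exactly once in an order that honors every pair.)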
>
> But NOBODY SAID THAT. I had to FIGURE IT OUT. The first roadmap.html (commit
> 8f90d3aa019f from 2012) said "Some commands are for a compiler toolchain (ar c99
> cflow ctags cxref gencat iconv lex m4 make nm strings strip tsort yacc), which
> is outside of toybox's mandate and should be supplied externally." and nobody
> ever corrected that assessment. Busybox didn't have this command until 2022.
>
> Elliott wasn't familiar with it either when he pointed me at
> https://austingroupbugs.net/view.php?id=1745 on September 5th because I'd
> recently mentioned tsort, and our resulting email thread was:
>
> On Tue, Sep 5, 2023 at 4:30 PM Rob Landley <rob at landley.net> wrote:
> > >
> > > On 9/5/23 14:36, enh wrote:
> >> > > since you were talking about tsort recently... (but, yeah, afaik this
> >> > > is only useful for ancient linkers. i don't think i've seen it used
> >> > > since the 1990s?)
> > >
> > > Then why did busybox add it last year?
> > >
> > >   https://git.busybox.net/busybox/commit/?id=4642cf5b388b
> > >
> > > That's the question I'm head scratching about...
> > >
> > > Rob
> >
> > because it's small, easy (modulo the unspecified bits), and in posix?
> >
> > did it not come up on the mailing list?
>
> And THAT was the intelligent question I _should_ have asked when I got pointed at
> http://lists.busybox.net/pipermail/busybox/2022-February/089483.html saying "I
> was recently after tsort to do some init ordering on a small system. Hope this
> patch is acceptable." And I went "it does what now?" and _that_ started me down
> this rabbit hole.
>
> But it wasn't until I read the busybox mailing list post where somebody said
> they were using this for init dependency ordering that I had any idea what the
> command was FOR.
>
> And then digging to see which algorithm is best for implementing this dependency
> ordering takes me back into math-world where everybody's drawing graphs to
> explain stuff, and what I've GOT is an array of string pairs (which is neither a
> graph nor a tree). And trying to get much more detail about anybody's
> implementation means either reading GPLv2 busybox code or GPLv3 coreutils code,
> neither of which I really want to spend long staring into before writing
> something under a public domain equivalent license. Yes the wikipedia tsort page
> mentions knuth (and whoever A.B. Kahn is) under "Further Reading" at the bottom
> (not as links I could actually click on and read but as books behind paywalls).
>
> (I might still be just a little bit cheesed about this. Normally I vent the above
> sort of thing into the blog instead of here, but you asked. :)
>
> But sure, if you've got an easily explicable better algorithm I'm all ears.
>
> > but after I saw how far you'd got with your version
> > I decided not to for now. I haven't analyzed your version carefully.
>
> What I did seems to work, I understand how, and I expect to be able to fix it if
> it breaks.
>
> Also, what scalability do we _need_?
>
>   $ find /lib/modules/$(uname -r) -name '*.ko' | wc -l
>   3559
>
> The use cases I'm aware of for this thing aren't _that_ big. Even N^2
> scalability wouldn't be that bad for those, although I put extra effort in to
> try to _avoid_ it: the whole dance of the otop/otop2 end indexes wasn't
> necessary, I could have just looped out to len each time had I been willing to
> let it be N^2. But the exclusion list only NEEDS to look at the previous two
> passes, and the reason it needs to be _two_ passes is explained in the comment:
>
>         // duplicate killer checks through previous TWO passes, because
>         // "a b c a" removes "c a" pair after checking "a b" pair, so removing
>         // "a b" next pass tries to output "a" again.
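>
> (The comment's four-token input means the pairs (a,b) and (c,a): "a" before
> "b", "c" before "a", with "a" showing up in both pairs, which is exactly the
> case the duplicate killer has to catch. What should come out is each name
> once:
>
>   $ echo "a b c a" | tsort
>   c
>   a
>   b
>
> "c" first because nothing has to come before it, then "a", then "b".)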
>
> > It seems to
> > be O(N^2) due to all the work to close up the list with each item removed,
>
> Reading in the data is O(N) modulo the realloc() maybe getting pathological,
> qsort() is kinda O(N log N) in the common case, we iterate over the array
> doing a binary search on each entry which is O(N log N) but how much each
> pass consumes is kinda potluck, and then we do more or less the opposite of
> an insertion sort pulling each entry we find back out, which may be where you
> get your N^2 but I don't have a better idea. Although memmove() _should_ be
> pretty optimized on modern hardware, and it's almost certainly L2 cache local
> in any real world use case you're going to throw at it (we're just copying the
> contiguous POINTERS, not the string data randomly scattered in the heap that
> may be cache cold and not even in an active DRAM bank). The duplicate killing
> iteration part should only be N^2 for pathological inputs (just like qsort),
> and when it IS, one of the other N's basically goes away because you're
> consuming LOTS of the list in one pass.
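>
> (A standalone toy, not the actual tsort.c, just showing the shape of that
> memmove: closing the hole left by removing pair ii only moves the contiguous
> pointers, never the strings they point at:
>
>   #include <stdio.h>
>   #include <string.h>
>
>   int main(void)
>   {
>     char *pair[][2] = {{"a","b"},{"b","c"},{"c","d"},{"d","e"}};
>     long count = 4, ii = 1;  // pull out pair 1, i.e. "b c"
>
>     // slide everything after it down by one pair of pointers
>     memmove(pair+ii, pair+ii+1, (count-ii-1)*sizeof(*pair));
>     count--;
>
>     for (ii = 0; ii<count; ii++) printf("%s %s\n", pair[ii][0], pair[ii][1]);
>     return 0;
>   }
>
> which prints "a b", "c d", "d e": the pointers slid down, the string data
> never moved.)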
>
> There are various pathological orderings but each makes some of the other
> processing into NOPs for that pass. If you pull out exactly one entry per
> iteration you're doing extra binary searches but not doing a lot of memmove or
> dup checking. If you pull out EVERY entry on the first pass you do a lot of
> memmove (and I had an optimization in mind where it iterates down from the end
> instead of the beginning, so in that case each memmove is tiny, but I don't
> think I bothered)... but then the array gets way smaller in one pass. The
> expenses trade off.
>
> I did try to think about this. I just did so in the context of processing an
> array (which is the data representation I had), not a graph (which is a data
> representation I didn't particularly see an advantage in trying to convert it to).
>
> > on
> > average? Not sure, but it measures roughly O(N^2).
>
> How big an input have you fed it? Being N^2 over 10 elements is fine by me.
>
> As for proper stress testing, the biggest input source I can think of would
> probably be extracting the dependencies from debian's package repository:
>
>   $ aptitude search . | wc -l
>   74699
>   $ aptitude show zypper | sed -n 's/Depends://p'
>   libaugeas0 (>= 0.6.0), libc6 (>= 2.14), libgcc1 (>= 1:3.0),
>   libreadline7 (>= 6.0), libstdc++6 (>= 5.2), libxml2 (>= 2.7.4), libzypp1702,
>   zypper-common (>= 1.14.11-1), python
>
>
> And feeding all the package names in as pairs. Modulo there's some processing to
> strip out the version info and there are meta-packages and so on? I think using
> aptitude on the distro's visible repository for the current install should
> filter out architecturally unavailable packages, but it might still need
> nontrivial processing. And the result is almost certainly GOING to have cycles
> in it down in the base packages debootstrap installs. (Or else those packages
> lie about their dependencies, or got munged together into a meta-package.)
>
> Still, there's a dataset of 75k strings with dependency info, could be anywhere
> up to a million A->B pairs. Probably worth a dig to see how tsort handles it.
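>
> (Rough untested sketch of that extraction, swapping in apt-cache because its
> one-dependency-per-line output is easier to scrape than aptitude's wrapped
> Depends: field; the version/virtual-package munging almost certainly needs
> more work, and it treats "either this OR that" alternatives as if all of
> them were required:
>
>   $ apt-cache pkgnames | while read pkg; do
>       apt-cache depends "$pkg" | sed -n 's/^ *|*Depends: *//p' |
>         sed 's/[<>]//g' | awk -v p="$pkg" '{print $1, p}'
>     done > pairs.txt
>   $ wc -l < pairs.txt
>   $ tsort pairs.txt | wc -l
>
> Each line of pairs.txt is "dependency first, package second", which is the
> pair format tsort eats, and any cycles down in the base packages should show
> up as complaints on stderr.)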
>
> > I looked at the code and also saw in your blog about the problem of creating a
> > pointer to an element "below" the keep[] array.
>
> It's not HARD, I just consider the compiler's behavior to be obviously wrong
> here. My code never performs an illegal dereference. I don't see why the
> compiler should care what pointer values I'm passing around when they're not
> being dereferenced: in C, tracking that is MY job. Their warning generator cares
> about things it shouldn't, and there's no override the way you can put
> parentheses around an assignment to tell if() or for(;;) that yes, you really
> meant to do that.
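>
> (A standalone toy of the trick in question, with a stand-in comparison
> function rather than the real one from tsort.c: the comparator only ever
> reads element [1] of whatever it's handed, so passing bsearch() keep-1 makes
> that [1] land on keep[0], and nothing before keep[] is ever dereferenced.
> Depending on compiler and flags this may still trip the same warning, which
> is rather the point:
>
>   #include <stdio.h>
>   #include <stdlib.h>
>   #include <string.h>
>
>   // compare the SECOND string of each {before, after} pair
>   static int cmp2(const void *a, const void *b)
>   {
>     return strcmp(((char **)a)[1], ((char **)b)[1]);
>   }
>
>   int main(void)
>   {
>     // sorted by second string: "b", "c", "d"
>     char *pair[][2] = {{"a","b"},{"b","c"},{"c","d"}};
>     char *keep[2] = {"c", "d"};  // is any pair's SECOND string keep[0]?
>
>     // (keep-1)[1] is keep[0]; (keep-1)[0] is never read by cmp2()
>     puts(bsearch((char **)keep-1, pair, 3, sizeof(*pair), cmp2)
>       ? "found" : "not found");
>     return 0;
>   }
>
> which prints "found" because the "b c" pair's second string matches, and the
> keep-1 value only ever gets passed around; nothing outside keep[] is read
> through it.)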

(if you're going to keep getting worked up about stuff like this, you
probably need to join WG14 and argue about it with them. otherwise
only the compiler writers have a voice, and they always vote for
whatever lets them implement dodgy optimizations.)

> What I've generally done to rip the code out of a context where the warning
> generator THINKS it understands what's going on is to typecast to long so it can
> do the exact same math without attaching strange C++ religious taboos to it.
> (Your virtual functions will stop working, you'll select the wrong template
> expansion, and the exception handler's stack unwinding may misbehave! Unclean!
> Unclean! I haven't got those. You can stop now.)
>
> Sigh, we were just talking about what is and isn't an atomic access in devmem.
> (Note that if you're doing unaligned access, it's not atomic. No I haven't added
> a check for that yet.) If the C compiler is doing significantly different memory
> accesses than you told it to, STUFF WILL BREAK. That's what C is for.
>
> > Instead of doing the "usual LP64
> > trick", had you considered using another array?
>
> Yes, I considered declaring "char *sadness[3], **keep = sadness+1" but the
> reason I _didn't_ is the compiler is WRONG.
>
> > Something like this:
> >
> > --- "..\\1005\\toys\\posix\\tsort.c" 2023-10-05 23:28:19.000000000 -0600
> > +++ "toys\\posix\\tsort.c" 2023-10-06 10:14:01.448795400 -0600
> > @@ -43,23 +43,13 @@
> >  static void do_tsort(int fd, char *name)
> >  {
> >    off_t plen = 0;
> > -  char *ss, **pair, *keep[2];
> > +  char *ss, **pair, *keep[2], *keepfake[2];
> >    long count,    // remaining unprocessed pairs
> >         len,      // total strings in pair list
> >         out,      // most recently added output (counts down from len)
> >         otop,     // out at start of this loop over pair[]
> >         otop2,    // out at start of previous loop over pair[]
> >         ii, jj, kk;
> > -  unsigned long sigh;
> > -
> > -  // bsearch()'s first argument is the element to search for,
> > -  // and sbse() compares the second element of each pair, so to find
> > -  // which FIRST element matches a second element, we need to feed bsearch
> > -  // keep-1 but the compiler clutches its pearls about this even when
> > -  // typecast to (void *), so do the usual LP64 trick to MAKE IT SHUT UP.
> > -  // (The search function adds 1 to each argument so we never access
> > -  // memory outside the pair.)
> > -  sigh = ((unsigned long)keep)-sizeof(*keep);
> >
> >    // Count input entries in data block read from fd
> >    if (!(ss = readfd(fd, 0, &plen))) return;
> > @@ -92,8 +82,12 @@
> >      for (ii = 0; ii<count; ii++) {
> >        // Find pair that depends on itself or no other pair depends on first str
> >        memcpy(keep, pair+2*ii, sizeof(keep));
> > -      // The compiler won't shut up about keep-1 no matter how we typecast it.
> > -      if (keep[0]!=keep[1] && bsearch((void *)sigh, pair, count, sizeof(keep),
> > +      // bsearch()'s first argument is the element to search for,
> > +      // and sbse() compares the second element of each pair, so to find
> > +      // which FIRST element matches a second element, we need to feed bsearch
> > +      // an array with a second element == keep[0].
> > +      keepfake[1] = keep[0];
> > +      if (keep[0]!=keep[1] && bsearch(keepfake, pair, count, sizeof(keep),
> >            (void *)sbse)) continue;
> >
> >        // Remove from pair list
> >
> > I think it's clearer,
>
> My workaround changes 2 lines (and probably a 1 line comment like "sadness[0] is
> never accessed, but passing keep-1 from keep[2] spuriously warns"), I was just
> mad at the compiler not letting me (void *) or similar my way around the
> warning. There should be a way to tell it "I believe I know what I'm doing here,
> stop trying to drive". There is not, that I could find. I don't want to make
> that lack LESS obvious in case such a way comes up in future.
>
> If Elliott really really wants the blah[3], keep=blah+1 version I can do that.
> But the compiler would still be wrong, and I don't volunteer danegeld. (The
> compiler keeps threatening me with a good time. "Warning: yes, I want that to
> happen.")

given that (as you quoted above) i haven't seen anyone use tsort since
i last had to suffer AIX's linker ("the worst linker ever written") in
the 1990s, i haven't enabled tsort and have no plans to do so.

> > I believe it's a little shorter in code size, and the run
> > time difference is negligible.
> >
> > I can post here about how I tested, if you're interested.
>
> How you tested what? That it's shorter? Last I checked make baseline/bloatcheck
> still work...
>
> (The one I added to busybox in 2006, commits f8a808426745 and d9872aa0d702, was
> an external project written in python. The toybox one from commit 7a4551ff5eec
> is a simple bash wrapper around nm --size-sort with diff and sort in there...
> Wow, it's been a while since I wrote that file, isn't it? Ok, couple quick tweaks...)
>
> Rob
> _______________________________________________
> Toybox mailing list
> Toybox at lists.landley.net
> http://lists.landley.net/listinfo.cgi/toybox-landley.net

