[Toybox] tsort regrets (really: hashing)
Rob Landley
rob at landley.net
Tue Nov 7 11:24:42 PST 2023
On 11/6/23 09:39, enh wrote:
>> P.S. Yes I checked if posix/libc just had one I could use, like with qsort(),
>> but man 3 hash says "This page documents interfaces provided in glibc up until
>> version 2.1. Since version 2.2, glibc no longer provides these interfaces.
>
> musl has the POSIX <search.h>, so i assume glibc still does too?
That API has no walk() function to return all entries in the table,
hsearch(ENTER) returns null if the table is "full" (no resize so I'd have to do
my own gc without access to a walk() and only the non-posix _r extensions let
you have old and new tables in play at once so you pretty much have to keep a
linked linked list _and_ the table), can't pass a cleanup function to
destroy_r() to free the entries, for the tar.c use case I'd have to xmprintf()
the dev/ino pairs to produce the string "key" needed to store them in the table
when some xor and shifts seem like the obvious hash function...
It's a nice option to have, but didn't fit the use cases I'd compared it with.
But I should probably read Rich's implementation of it because he has a history
of doing "the sane thing"... his string hash is while (*p) h = 31*h + *p++;
which looks fine. His stride is for (i = hash, j=1;; i += j++) which checks
hash&size, (hash+1)&size, (hash+1+2)&size, (hash+1+2+3)&size thus retaining
reasonable L1 cache locality without chaining into someone else's overflow (very
nice). Yeah, this looks like it makes sense. Still no delete or walk. (Easy way
to delete uses whiteouts, I tend to (void *)1 which _really_ seems to annoy the
C++ compiler developers...)
Huh, Rich _does_ resize() under the covers in __hsearch_r(). Pity I can't rely
on that happening in an arbitrary libc implementation...
*shrug* I _can_ do this but was trying to hold the optimization off until we'd
accumulated test loads inconvenienced by poor performance, so we can confirm the
changes actually helped real world use. (Optimization in search of a load is a
variant of infrastructure in search of a user...)
Lemme finish redoing crypt() first...
Rob
More information about the Toybox
mailing list