[Toybox] diff.c

Rob Landley rob at landley.net
Fri Aug 26 02:49:19 PDT 2022


On 8/25/22 09:43, enh wrote:
>     *shrug* I've been tracking these things down for years without this tool, it's
>     not a blocker. But the callstack would save time bisecting the code with printfs
>     to figure out where it went off the rails...
> 
> yeah, although it's not nearly as cool without the stacks, just knowing you have
> a problem is step 1.

It's a hang without ASAN. Not exactly subtle. :)

The immediate problem is that my dump_hunks() is getting lines out of sync and
falling off the end of one of the arrays. The more INTERESTING problem is that
debian's diff says the failing hunk is:

--- dif1.c	2022-08-26 00:54:19.827964685 -0500
+++ dif2.c	2022-08-26 00:54:41.231964278 -0500
@@ -8,30 +8,15 @@
 		return strcmp(ln1->linedata, ln2->linedata) == MATCH;
 }

-BOOL match(LINE *oldp, LINE *newp)
+bool match(LINE *oldp, LINE *newp)
 {
 	int i;
 	for ( i=0; i < minmatch; i++, oldp = oldp->next, newp = newp->next )
 		if ( !eq(oldp, newp) )
-			return FALSE;
-	return TRUE;
+			return false;
+	return true;
 }

-#if 00
-void putqln(LINE *pln, DFILE *file)
-{
-	if ( ! pln->lneof ) {
-		if ( in_context_sw )
-			if ( file == oldf )
-				printf("<<<  ");
-			else
-				printf("| ");
-		printf("%s\n", pln->linedata);
-	}
-	freeln(pln);
-}
-#endif
-
 void putqln(LINE *pln, DFILE *file)
 {
 	if ( ! pln->lneof ) {

Which if you'll notice repeats the last three lines: they're removed right after
the #if and also occur after the #endif as the last three lines. And my
simple/greedy algorithm is trying to call the first three _matches_ and then
have the rest of the file be one big subtraction, which means it's not nicely
bracketed with matching intro/exit lines. (The find_hunk() logic ensures such a
bracketing, but the dump_hunk() logic's simplistic decision on how to display it
does not.)
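
A minimal sketch of the bracketing idea, not the toybox code: before deciding
what to print, peel off the lines the two sides share at the start and at the
end of the changed region, so whatever is left in the middle is surrounded by
matching lines. The helper names common_prefix() and common_suffix() are
hypothetical and made up for this illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: number of leading lines the two arrays share. */
size_t common_prefix(char **oldv, size_t oldn, char **newv, size_t newn)
{
  size_t i = 0;

  while (i < oldn && i < newn && !strcmp(oldv[i], newv[i])) i++;

  return i;
}

/* Hypothetical helper: number of trailing lines the two arrays share,
 * never reaching back into the first `skip` lines, which were already
 * claimed as leading context. Assumes skip <= oldn and skip <= newn,
 * which holds when skip comes from common_prefix(). */
size_t common_suffix(char **oldv, size_t oldn, char **newv, size_t newn,
  size_t skip)
{
  size_t i = 0;

  while (i < oldn - skip && i < newn - skip
         && !strcmp(oldv[oldn - 1 - i], newv[newn - 1 - i])) i++;

  return i;
}
```

Fed just the region starting at the "#if 00" line, the shared prefix is empty
and the shared suffix is the three putqln() lines, so the removal ends up
followed by three matching lines instead of preceded by them.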

Also, debian is saying -8,30 +8,15 and mine's saying -8,29 +8,14 which I'm still
trying to track down....
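
For checking a header by hand, here is a small sketch of how the two lengths
in "@@ -start,len +start,len @@" fall out of the hunk body: a context line
counts on both sides, a '-' line only on the old side, a '+' line only on the
new side. The struct and function names are hypothetical, and treating an
empty line as context is an assumption to cope with mail clients stripping
the leading space.

```c
#include <stddef.h>

struct hunk_len { int oldlen, newlen; };

/* Hypothetical helper: derive the header lengths from the body lines of
 * one unified diff hunk (everything after the @@ line). */
struct hunk_len count_hunk(char **body, size_t n)
{
  struct hunk_len len = {0, 0};
  size_t i;

  for (i = 0; i < n; i++) {
    switch (body[i][0]) {
    case '-': len.oldlen++; break;
    case '+': len.newlen++; break;
    case '\\': break; /* "\ No newline at end of file" marker */
    default: /* context line, or an empty line that lost its space */
      len.oldlen++;
      len.newlen++;
    }
  }

  return len;
}
```

Counting the hunk quoted above this way gives 12 context lines, 18 removed and
3 added, so 30 and 15, which agrees with the debian header. Being one short on
both sides is at least consistent with a single context line going missing.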

On the whole, good test case. :)

>     I made it as far as 'both of those have value fa, meaning "Heap left redzone"'
>     and stopped because I have other things to do. This goes on the todo heap with
>     valgrind and making better use of gdb and so on.
> 
> yeah, like i say --- i've been a heavy asan/hwasan user for years but i don't
> think i've _once_ used the shadow map. as far as i'm concerned it's just "how it
> works", so none of my business. (though when i talked to them about the error
> wording [for which there are now llvm patches up], they said they should fix the
> addresses in the dump to be the actual heap addresses, not the shadow addresses.)
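
For reference, translating between the two kinds of address is a shift and an
add. This is only a sketch, assuming the default ASan mapping on x86-64 Linux
(8 application bytes per shadow byte, offset 0x7fff8000); other targets use
different offsets, and the function names are hypothetical.

```c
#include <stdint.h>

/* Assumption: default ASan shadow mapping on x86-64 Linux. */
#define SHADOW_SCALE  3
#define SHADOW_OFFSET 0x7fff8000ULL

/* Shadow byte address describing the 8-byte granule containing addr. */
uint64_t app_to_shadow(uint64_t addr)
{
  return (addr >> SHADOW_SCALE) + SHADOW_OFFSET;
}

/* First application address covered by a given shadow byte address. */
uint64_t shadow_to_app(uint64_t shadow)
{
  return (shadow - SHADOW_OFFSET) << SHADOW_SCALE;
}
```

Going from a shadow address in the dump back to the heap address is just the
second function, so a shadow byte holding fa at 0x0c047fff8002 describes the
eight bytes starting at 0x602000000010.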

People have done things like "electric fence" for decades, usually with a
horrible performance penalty. After QEMU and xen/kvm got popular, intel and arm
got into a race to improve their mmu capabilities and people started trying to
apply that to the memory access pattern validation problems (multiple lwn.net
articles about that a decade or so back) with the dream of making it cheap
enough to leave on at deployment, and it's nice to see that stuff finally bear
fruit. But it's not exactly new. :)

I first wrote my own heap walker to periodically validate its integrity back
under OS/2. (The codebase I inherited had _five_ alloc/free contexts in play all
at once and every once in a while the OS/2 equivalent of MAP_ANONYMOUS would
get passed to SOM_free() and it would quietly swallow it and continue for 5
seconds or so and then an unrelated thread would explode. Yes that was in a
heavily threaded environment. The OS/2 desktop ("workplace shell") instantiated
new objects by loading shared libraries into a giant shared process space. (And
I think firing up a new thread to run its constructor function? It's been a
while.) I worked on their new package management system, "Feature install", which
was a subclass of the "folder" object in the workplace shell which was built on
top of IBM's System Object Model (one of the first implementations of the Common
Object Request Broker Architecture which was just horrific) which had metaclass
instance objects acting as factories (java did NOT have proper metaclasses, at
least not for many years), but ultimately it all got its memory from the heap
maintained by the C library which got memory from the OS. Of COURSE I
independently invented page poisoning without knowing what it was called. I did
the same for "linked lists" as a teenager. Heck, in college I reinvented
bytecode and was all excited about it until I got introduced to java a few years
later.)

This is one of the reasons computer history interests me. The new people
reinventing the wheel for the 50th time mistake the ruts from heavily trodden
ground for geology. You want to find the REALLY fun ideas, ask why Grace Hopper
did what she did when inventing shared libraries. (She talked about it in the
HOPL keynote talk, which is in a book in the UT library I photocopied a bunch of
pages out of, but not available online that I know of? In theory the talk is on
video, in PRACTICE I went to the library that claimed to have it and they didn't
want to dig the old VHS tapes out of the back room because they were too fragile
or something... Ah, looks like it might be available online?
https://dl.acm.org/doi/10.1145/800025.1198341 )

Anyway, back to poking at diff...

Rob

