[Toybox] diff.c

enh enh at google.com
Fri Aug 26 16:11:33 PDT 2022


On Fri, Aug 26, 2022 at 3:04 PM Ray Gardner <raygard at gmail.com> wrote:

> On Wed, Aug 24, 2022 at 2:26 AM Rob Landley <rob at landley.net> wrote:
> >
> > On 8/22/22 10:39, enh via Toybox wrote:
> > > On Sun, Aug 21, 2022 at 9:12 AM Ray Gardner <raygard at gmail.com
> > > <mailto:raygard at gmail.com>> wrote:
> > >
> > >     I guess I'll skip writing up an explanation of these algos then; I
> > >     thought you were looking for a less "mathematical" explanation
> ("why
> > >     can nobody explain what the old stuff is doing without pretending
> it's
> > >     math instead of an algorithm?" "Doug McIlroy's old diff paper from
> > >     1976 is still written in math-ese. It SEEMS to be describing very
> > >     simple concepts but it's trying to explain them as if this is
> > >     calculus, which it is not." etc.) But that's moot if you're going
> with
> > >     what appears to be the old "look ahead a little for matching lines
> to
> > >     re-sync on" heuristic diff method.
> > >
> > > (fwiw, i think such a document would be a useful thing to leave lying
> around on
> > > the internet ... someone will want it sooner or later :-) )
> >
> > I'm also interested. I've just been a bit distracted because my air
> conditioner
> > broke over the weekend (soonest somebody can come fix it is thursday.
> It's
> > August in Texas...) and because my diff code isn't quite ready to go
> back and
> > run this new test through yet, and I wanted to include that in my reply.
>
> Thanks for the encouragement, guys. Check it out. I've posted it at
> https://www.raygard.net/2022/08/26/diff-LCS-Hunt-Szymanski-Kuo-Cross/
> along with the code. C code is
> https://github.com/raygard/lcs_diff_demo/blob/main/src/c/diff.c ;
> there is also Python code in that repo, including a version of
> Hunt-McIlroy that may satisfy Rob's complaint "Alas, I've read that
> paper through twice and have not managed to turn their explanation
> into pseudo-code, and that's WITH an implementation in front of me",
> if you find that Python is acceptable as executable pseudocode, as
> I've heard it said. The code is written very closely to the paper's
> appendix. But, if you read the post linked above, you'll see that it's
> probably better to use Kuo/Cross (or my mod of that) instead.
>
> I did find a bug in the diff.c I posted earlier and I think this
> version gets everything right. I expanded it a bit to include
> Hunt/Szymanski but that's just for demo, it's not as good as the
> Kuo/Cross implementations also included. (Compile it with HS, KC, or
> KCMOD defined to get the different algos.)
>
> I do this for my own enjoyment but also hoping someone will benefit
> from it. I wrote the earlier C code I posted to the list in the hope
> that it would help Rob. He's decided to go his own way, so that was
> probably many wasted hours. But if you want to get, e.g. a unified
> diff with -U 0 (no context lines), I have it. I think that may be
> harder with Rob's approach. I will also bet that my diff.c will be
> faster on large-ish files and produce better diffs but that remains to
> be tested. And I've also back-tested it by patch-ing the files with
> the diff output, including context lines of 0, 1, 2, 3, 4.
>
> So, nobody reads my blog ("glob") at least not yet. I spent many many
> hours developing the world's fastest qsort() implementations, but I
> have no way to get anyone who can get them used to look at it.
>

have you tried? the llvm-libc folks are effectively starting from scratch (
https://github.com/llvm/llvm-project/blob/main/libc/src/stdlib/qsort.cpp),
and i'd imagine the BSDs would be receptive to improvements, even if it's
super annoying that there are three BSDs that don't really share code in
any organized fashion? (we've never found qsort() important enough to do
anything but track BSD; i forget which one!)


> Now I've written what I hope is a good explanation of a couple of LCS
> algos. I sure hope you'll read it and give me feedback. If you don't,
> I may never speak to you again :-).
>
> Ray
>
> Ray
> _______________________________________________
> Toybox mailing list
> Toybox at lists.landley.net
> http://lists.landley.net/listinfo.cgi/toybox-landley.net
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.landley.net/pipermail/toybox-landley.net/attachments/20220826/0ed7fa00/attachment.htm>


More information about the Toybox mailing list