[Toybox] diff algorithms

enh enh at google.com
Fri Aug 13 12:19:09 PDT 2021


ack. i'll send it to you directly, because i was on too many mailing lists
in the 1990s to feel comfortable sending an 11MiB attachment to a list :-)

(the instructions to generate these files yourself by running the
https://github.com/gavinhoward/bc tests would be far shorter, but it would
take too long for me to reverse-engineer how i got into that mess in the
first place, so...)

On Fri, Aug 13, 2021 at 3:12 AM Rob Landley <rob at landley.net> wrote:

> On 8/12/21 4:27 PM, enh via Toybox wrote:
> > you know how you (rob) have repeatedly expressed your desire to have a
> > different diff implementation, and i've always either ignored you or
> > claimed that the existing one is good enough?
> >
> > well ... i finally hit a case where i can tell the difference. it turns
> > out that if you have 3 million lines in the files you're diffing, GNU
> > diff can get through that in less than 10s, busybox takes just under an
> > hour (!), and toybox takes just over an hour.
> >
> > i'm assuming you already knew of cases like this, but i'll keep my two
> > 125MiB files somewhere just in case. they compress pretty well, being
> > _very_ repetitive ASCII, but the zip file is still 11MiB so i won't post
> > it without being asked.
>
> I'd love to get a copy of those just for personal development testing if
> I can.
> (Running problematic real world data through the thing is always
> preferable.)
>
> I'll add a TODO for figuring out how to reasonably have the test suite
> address the issue without checking in an 11 megabyte test file. :)
>
> Thanks,
>
> Rob
>