[Toybox] md5sum cleanup
Ivo van Poorten
ivopvp at gmail.com
Sat Jun 7 04:11:10 PDT 2014
On Sun, 01 Jun 2014 12:00:08 -0500 Rob Landley <rob at landley.net> wrote:
> On 05/15/14 11:50, Ivo van Poorten wrote:
> > On Wed, 14 May 2014 21:56:13 -0700 Daniel Verkamp <daniel at drv.nu>
> > wrote:
> >> Here's a quick cleanup of md5sum. Executive summary: smaller and
> >> faster.
> >
> > Nice! It got me excited to see if I could get it faster.
>
> Bigger and faster.
>
> Sorry for the delay evaluating this, but A) it rolls up the previous
> patch I already applied so I have to separate it out to evaluate it,
> B) #define COMMON isn't going in.
>
> I _really_ dislike having a macro like that repeat bits of code. I see
> why you did it, but ew. I'm also confused why the do { XXX } while
> (0); wrapper is needed when the only users aren't in if/else blocks?
> The for has their own curly brackets (for code that's all on the same
> line...)
Yeah, the do{}while(0) wrapper is a leftover from previous
experiments. I always want to be safe rather than sorry, but at this
point it can be removed.
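For reference, a minimal sketch of the case the wrapper guards against. SWAP_BAD, SWAP_SAFE and sort_pair are made-up names for illustration, not anything from the patch:

```c
/* Hypothetical macros for illustration only; not from the md5sum patch. */
#define SWAP_BAD(x, y)  { int t_ = (x); (x) = (y); (y) = t_; }
#define SWAP_SAFE(x, y) do { int t_ = (x); (x) = (y); (y) = t_; } while (0)

/* Puts the pair in ascending order; returns 1 if it swapped, 0 otherwise. */
static int sort_pair(int *lo, int *hi)
{
  if (*lo > *hi)
    SWAP_SAFE(*lo, *hi);  /* one statement, so the else below still binds */
  else
    return 0;             /* with SWAP_BAD the "};" before else would not compile */
  return 1;
}
```

When the macro is only ever used inside a braced for body, as in the md5sum code, the plain form behaves the same and the wrapper is dead weight.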
I could replace the COMMON macro with a static inline function or just
duplicate the code, although I do not really like the latter.
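Roughly what the static inline variant could look like. This is a sketch only: md5_common and rotl32 are hypothetical names, and the body is the generic MD5 step arithmetic, not necessarily what COMMON contains in the patch:

```c
#include <stdint.h>

/* Rotate left; n must be in 1..31, which holds for all MD5 shift amounts. */
static inline uint32_t rotl32(uint32_t x, unsigned n)
{
  return (x << n) | (x >> (32 - n));
}

/* Hypothetical stand-in for the shared tail of every MD5 step:
 * f is the round function's result, k the sine constant, m the message word. */
static inline uint32_t md5_common(uint32_t a, uint32_t b, uint32_t f,
                                  uint32_t k, uint32_t m, unsigned s)
{
  return b + rotl32(a + f + k + m, s);
}
```

A compiler that inlines this should produce much the same code as the macro, but the arguments are type-checked and evaluated exactly once.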
> I see what you're doing here, and now that it's been pointed out how
> much slower it is than other implementations I agree speeding it up is
> good. (That's why I applied the first patch, even though that table
> repeating each stanza 4 times is just painful. But adding math to the
> dereference to repeat table sections added enough cycles to the inner
> loop to slow it back down to about where it was before...)
> > Times in seconds on my machine (IA32 Sempron 1.66GHz):
>
> The sempron is AMD, and it's 64 bit. Saying "IA32" for a sempron is
> wrong in two distinct ways.
My Sempron is a Socket-A Sempron 2400+ from late 2004, running at
1.66GHz, and it is in fact 32-bit. Linux wrongly detects it as an Athlon
MP 1500+, but Socket-A is 32-bit only. Basically they renamed their
budget line (Duron) to Sempron just before switching to 64-bit.
> > cur-hg 8.6
> > daniel 6.7
> > ivo 4.9
> > md5sum 2.8
> > openssl 2.7
>
> The "md5sum" there is your host's existing command line version?
Yes, from GNU coreutils.
> > The tables can be downsized to 64 bytes, but it'll make it a bit
> > slower, especially on architectures where non-aligned reads are
> > slower.
>
> Um, example?
I guess it's not really relevant anymore. I believe SPARCs in the
'90s had problems with single-byte reads, but I'm not sure. Better to
change the table to 64 bytes until somebody complains.
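To make the size difference concrete, here is a sketch that stores the same 64 small values once as 32-bit words and once as bytes. It assumes the table in question holds one entry per MD5 step; the per-step message-word index is used as the example, and idx_words and idx_bytes are illustrative names, not the ones in the patch:

```c
#include <stdint.h>

/* Message-word index for each of the 64 MD5 steps (RFC 1321 order). */
#define MD5_INDEX_VALUES \
  0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, \
  1, 6, 11, 0, 5, 10, 15, 4, 9, 14, 3, 8, 13, 2, 7, 12, \
  5, 8, 11, 14, 1, 4, 7, 10, 13, 0, 3, 6, 9, 12, 15, 2, \
  0, 7, 14, 5, 12, 3, 10, 1, 8, 15, 6, 13, 4, 11, 2, 9

/* Word-sized entries: 256 bytes, every read is a naturally aligned load. */
static const uint32_t idx_words[64] = { MD5_INDEX_VALUES };

/* Byte-sized entries: 64 bytes, every read is a single-byte load. */
static const uint8_t idx_bytes[64] = { MD5_INDEX_VALUES };
```

Both tables give the same lookups; only the footprint and the width of each load differ.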
> Could you get me a patch on top of what's currently there?
Sure. I'll look into it.
Perhaps we could introduce a global config flag like the one the FFmpeg
project has (CONFIG_SMALL)? Not tons of tiny tweaks all over the place,
but a single choice between size and speed.
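A sketch of how such a flag might look for the MD5 shift table. Only the CONFIG_SMALL name comes from FFmpeg; how toybox would define it is an assumption, and shift_small, shift_fast and MD5_SHIFT are hypothetical names:

```c
#include <stdint.h>

#ifndef CONFIG_SMALL
#define CONFIG_SMALL 0  /* assumed convention: 1 favors size, 0 favors speed */
#endif

/* Compact form: one 4-entry stanza per round, 16 bytes. */
static const uint8_t shift_small[16] = {
  7, 12, 17, 22,  5, 9, 14, 20,  4, 11, 16, 23,  6, 10, 15, 21
};

/* Expanded form: each stanza repeated four times, 64 bytes. */
static const uint8_t shift_fast[64] = {
  7, 12, 17, 22, 7, 12, 17, 22, 7, 12, 17, 22, 7, 12, 17, 22,
  5, 9, 14, 20, 5, 9, 14, 20, 5, 9, 14, 20, 5, 9, 14, 20,
  4, 11, 16, 23, 4, 11, 16, 23, 4, 11, 16, 23, 4, 11, 16, 23,
  6, 10, 15, 21, 6, 10, 15, 21, 6, 10, 15, 21, 6, 10, 15, 21
};

/* Small variant pays for extra index math on every lookup. */
static inline unsigned shift_small_at(unsigned i)
{
  return shift_small[((i >> 4) << 2) | (i & 3)];
}

/* Fast variant is a plain indexed load. */
static inline unsigned shift_fast_at(unsigned i)
{
  return shift_fast[i];
}

/* Callers use MD5_SHIFT(i) and never see which variant was chosen. */
#if CONFIG_SMALL
#define MD5_SHIFT(i) shift_small_at(i)
#else
#define MD5_SHIFT(i) shift_fast_at(i)
#endif
```

Both tables are kept here only so the two variants can be compared side by side; a real build would compile in just one of them.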
I think the fastest implementation of md5sum without inline assembly is
to completely unroll the loop and inline the tables as constants. It'll
be a lot bigger though.
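To illustrate, here are the first four steps of round 1 written both ways. The constants and shifts are the standard ones from RFC 1321; the function and macro names are made up for this sketch, and a full implementation would carry on like this for all 64 steps:

```c
#include <stdint.h>

/* Rotate left; n must be in 1..31. */
static inline uint32_t rotl32(uint32_t x, unsigned n)
{
  return (x << n) | (x >> (32 - n));
}

/* Round 1 function and one step with the constants passed in directly. */
#define F(b, c, d) (((b) & (c)) | (~(b) & (d)))
#define STEP(a, b, c, d, m, s, k) \
  ((a) = (b) + rotl32((a) + F(b, c, d) + (m) + (k), (s)))

/* Table-driven loop: compact, but loads k[i] and s[i] and shuffles
 * the four state words on every iteration. */
static void first4_loop(uint32_t st[4], const uint32_t m[4])
{
  static const uint32_t k[4] = {
    0xd76aa478, 0xe8c7b756, 0x242070db, 0xc1bdceee
  };
  static const uint8_t s[4] = { 7, 12, 17, 22 };
  uint32_t a = st[0], b = st[1], c = st[2], d = st[3], t;
  int i;

  for (i = 0; i < 4; i++) {
    t = b + rotl32(a + F(b, c, d) + m[i] + k[i], s[i]);
    a = d; d = c; c = b; b = t;
  }
  st[0] = a; st[1] = b; st[2] = c; st[3] = d;
}

/* Unrolled: constants are immediates, and the state shuffle is done by
 * rotating the argument order instead of moving values around. */
static void first4_unrolled(uint32_t st[4], const uint32_t m[4])
{
  uint32_t a = st[0], b = st[1], c = st[2], d = st[3];

  STEP(a, b, c, d, m[0],  7, 0xd76aa478);
  STEP(d, a, b, c, m[1], 12, 0xe8c7b756);
  STEP(c, d, a, b, m[2], 17, 0x242070db);
  STEP(b, c, d, a, m[3], 22, 0xc1bdceee);
  st[0] = a; st[1] = b; st[2] = c; st[3] = d;
}
```

Both functions compute the same state; the unrolled one trades sixteen times as many source lines per round for no table loads and no loop overhead.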
Regards,
Ivo