[Toybox] awk seen in the wild

Rob Landley rob at landley.net
Mon Jul 11 12:45:58 PDT 2016


On 07/11/2016 11:46 AM, enh wrote:
> On Sun, Jul 10, 2016 at 10:28 AM, Rob Landley <rob at landley.net> wrote:
> another plug for supporting a libcrypto dependency: the *sum utilities
> are orders of magnitude faster with libcrypto, the SSL support in
> things like netcat/wget would be something i could actually use on
> Android (there's no way i'll be able to ship an alternative SSL
> implementation) and you'd get arbitrary precision integers too.

What I want to do is take the approach Isaac Dunham suggested, of using
"openssl s_client -quiet -connect" as an alternative to netcat. So
toybox wget should call out to that to get https support, and that would
be provided by something external.
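
A rough sketch of what "call out to openssl" could look like, in Python for brevity rather than toybox's C. The helper names (s_client_argv, http_request, fetch_https) are made up for illustration, and this assumes an openssl binary is in the $PATH. It is not what toybox wget does today.

```python
import subprocess


def s_client_argv(host, port=443):
    """Build the command line for an openssl s_client tunnel to host:port."""
    return ["openssl", "s_client", "-quiet", "-connect", f"{host}:{port}"]


def http_request(host, path="/"):
    """Bare HTTP/1.0 GET; 'Connection: close' so the server ends the session."""
    return (f"GET {path} HTTP/1.0\r\n"
            f"Host: {host}\r\n"
            "Connection: close\r\n\r\n").encode("ascii")


def fetch_https(host, path="/", port=443, timeout=30):
    """Pipe the request through the external openssl binary.

    Returns the raw response bytes (headers and body together); the TLS
    handshake and encryption happen entirely inside the child process.
    """
    proc = subprocess.run(s_client_argv(host, port),
                          input=http_request(host, path),
                          stdout=subprocess.PIPE,
                          stderr=subprocess.DEVNULL,
                          timeout=timeout, check=False)
    return proc.stdout
```

The caller only ever sees a plain byte stream on stdin/stdout, the same as with netcat, so nothing TLS-specific has to live in the caller.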

Lipi Lee did a first pass at this already, which I didn't immediately
apply because for some reason the patch he sent me didn't apply to the
wget he sent me (I don't think I'd modified it). When I tested the wget
it was corrupting the files it downloaded (outputting numbers in the
middle of the data), and that sent me down the road of rewriting the
thing...

I'm balancing some competing design goals here: 'self-contained' vs
'people use this and need speed out of some tightly optimized
algorithms'. The way busybox dealt with this was by having multiple
implementations (CONFIG_MD5_SMALL has 4 settings), which I very much
don't want to do...

The problem with having an internal and an external implementation is the
internal one gets much less testing that way. I suspect the right answer
is to just lump it and have the actual unrolled fast version in toybox,
because the simple one isn't good enough for the userbase. That said, a
lot of these external libraries have assembly optimized versions for
various platforms, and I KNOW I'm not going there...

Hmmm. Which lib is "libcrypto", by the way?

> i don't think you're likely to go this route, but i do like to keep
> bringing it up so the idea of being API-compatible enough that it's
> possible to use toybox with either your backend or *ssl is in the back
> of your mind...
> 
> (no one's complained about the slow *sum commands yet, but if you're
> interested i'm happy to send a patch.)

People have sent patches to speed up md5sum and sha1sum and it boils
down to lots of loop unrolling that makes the algorithm harder to
understand. It was back around here:
http://lists.landley.net/pipermail/toybox-landley.net/2014-May/006638.html

I applied the first few, but the code got very large and very unreadable
and I kept hoping there was a way the compiler's darn optimizer could do
that for me. I should go back and look at those patches again, but it's
competing with 60 other todo items...
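
For reference, this is roughly the shape of the "simple" table-driven MD5 that the unrolling patches expand into 64 separate steps. It's a minimal sketch in Python following RFC 1321, not the toybox C code; the names (S, K, rol, md5_simple) are just for illustration.

```python
import math
import struct

# Per-step rotate amounts and sine-derived constants from RFC 1321.
S = [7, 12, 17, 22] * 4 + [5, 9, 14, 20] * 4 + \
    [4, 11, 16, 23] * 4 + [6, 10, 15, 21] * 4
K = [int(abs(math.sin(i + 1)) * 2**32) & 0xffffffff for i in range(64)]
MASK = 0xffffffff


def rol(x, n):
    """Rotate a 32-bit value left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK


def md5_simple(msg):
    """Return the 16-byte MD5 digest of msg using one 64-iteration loop."""
    a0, b0, c0, d0 = 0x67452301, 0xefcdab89, 0x98badcfe, 0x10325476
    # Pad: 0x80, zeros up to 56 mod 64, then the bit length as 64-bit LE.
    padded = (msg + b"\x80" + b"\x00" * ((55 - len(msg)) % 64)
              + struct.pack("<Q", (len(msg) * 8) & (2**64 - 1)))
    for off in range(0, len(padded), 64):
        m = struct.unpack("<16I", padded[off:off + 64])
        a, b, c, d = a0, b0, c0, d0
        for i in range(64):
            # The round function and message index change every 16 steps;
            # an unrolled version hardcodes all of this per step instead.
            if i < 16:
                f, g = (b & c) | (~b & d), i
            elif i < 32:
                f, g = (d & b) | (~d & c), (5 * i + 1) % 16
            elif i < 48:
                f, g = b ^ c ^ d, (3 * i + 5) % 16
            else:
                f, g = c ^ (b | ~d), (7 * i) % 16
            f = (f + a + K[i] + m[g]) & MASK
            a, d, c = d, c, b
            b = (b + rol(f, S[i])) & MASK
        a0, b0 = (a0 + a) & MASK, (b0 + b) & MASK
        c0, d0 = (c0 + c) & MASK, (d0 + d) & MASK
    return struct.pack("<4I", a0, b0, c0, d0)
```

The whole algorithm fits in one loop you can read top to bottom. The unrolled form replaces the if/elif chain and the table lookups with 64 hand-written lines, which is where the size and readability cost comes from.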

According to http://valerieaurora.org/hash.html both md5sum and sha1sum
are semi-obsolete. I need to do sha256 and sha3 and so on, and adding
those is a higher priority todo item for me than speeding up the old
ones. But despite being obsolete as _crypto_, rsync moved from md4sum
to md5sum a couple years back, and git is based on sha1, and neither is
moving off those because you wrap the transport in https and sign the
commits if you care about security...

My deflate implementation is also half the speed it should be, although
the first pass was focusing on correctness rather than any kind of
optimization. (In theory zlib started life as a public domain
implementation, but that version seems to have fallen off the net and
doesn't have a lot of modern optimizations anyway.)

One thing I've been meaning to do with deflate/inflate is add the SMP
mode (where it outputs zero byte packets at each dictionary reset so you
can scan ahead for them and do blocks in parallel).
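
A minimal sketch of that idea using Python's zlib module, on the assumption that zlib's Z_FULL_FLUSH (which emits an empty stored block, bytes 00 00 ff ff, and resets the dictionary) is a fair stand-in for the mode described above. The function names are hypothetical. Here the segment boundaries are recorded at compression time rather than found by scanning, because those four bytes can also turn up by coincidence inside compressed data, so a real scanner would have to validate each candidate.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

MARKER = b"\x00\x00\xff\xff"  # empty stored block written by a flush


def deflate_segments(data, chunk_size=64 * 1024):
    """Compress data as one raw deflate stream, cut at dictionary resets."""
    co = zlib.compressobj(6, zlib.DEFLATED, -15)  # -15: raw deflate, no header
    segments = []
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        last = start + chunk_size >= len(data)
        # Z_FULL_FLUSH ends the block on a byte boundary and forgets history,
        # so the next segment never refers back into this one.
        mode = zlib.Z_FINISH if last else zlib.Z_FULL_FLUSH
        segments.append(co.compress(chunk) + co.flush(mode))
    if not segments:  # empty input still needs a terminating block
        segments.append(co.flush(zlib.Z_FINISH))
    return segments


def inflate_segment(segment):
    """Inflate one segment with a fresh decompressor (no shared state)."""
    return zlib.decompressobj(-15).decompress(segment)


def inflate_parallel(segments, workers=4):
    """Inflate independent segments on a thread pool, joined in order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return b"".join(pool.map(inflate_segment, segments))
```

Joining the segments back together still gives an ordinary deflate stream that a normal single-threaded inflate can read, which is the attraction: the parallel path is optional for the consumer.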

So many todo items...

Rob

