[Toybox] xmemcmp()

Wed Jan 4 10:53:55 PST 2023

On 1/3/23 20:00, enh wrote:
> On Mon, Jan 2, 2023 at 11:20 AM Rob Landley <rob at landley.net
> <mailto:rob at landley.net>> wrote:

I'm in text mode, thunderbird. Text mode. You should not be doing in-band regex
matching to expand html tags in TEXT MODE.

I had a long email thread with a journalist last month about how the apache
foundation is a proper open source umbrella organization, but the mozilla
organization is the residue of a money-centered silicon valley startup (netscape
dot-com) that has nonprofit status the same way megachurches do.

Not a unique obeservation, of course:
https://mastodon.ar.al/@aral/109551499671511513

>           (*expect)->prev->data = "do\0C";
>           dlist_add(expect, (*s == 'c') ? "esac" : "do\0A");
> 
>         end = 0;
>         if (!strcmp(s, "if")) end = "then";
>         else if (!strcmp(s, "while") || !strcmp(s, "until")) end = "do\0B";
>         else if (!strcmp(s, "{")) end = "}";
>         else if (!strcmp(s, "(")) end = ")";
>         else if (!strcmp(s, "[[")) end = "]]";
> 
>             dlist_add(expect, end);
> 
>     And so on. Any string constant starting with "do" will have a fourth byte, and
>     any other string constant is going to diverge within the bounds of the actual
>     string constant so we're _going_ to hit inequality before falling off the end.
> 
> isn't your function _strncmp()_ rather than memcmp() though?

No, I care about the A/B/C in position 4 matching too. That's how I distinguish
the three "do" cases. A strncmp() will stop at the NUL and go "yup, matched". I
want it to check the fourth byte IF the first three bytes matched, which is
memcmp(). Return the sign of the difference for the first nonmatching byte pair
within range, zero if the whole range matched.

> even though one _could_ write a byte-by-byte memcmp(), the standard does not
> require that, and i'm aware of no non-C implementation that works that way.

A non-C implementation of a C library function?

> (musl may have misled you here? strictly BSD also has a memcmp.c that's
> byte-by-byte, but all the real architectures have assembler versions they use
> instead.)

I made one that works the way I expect and switched all the calls to it. (If I
need to repeatedly memcmp() something I expect to remain equal for longer than a
few cache lines, I question my algorithm. And a byte-by-byte tight loop in L1
cache with multiple execution units and branch predicted pipeline reordering
isn't gonna be _that_ slow.)

> my confusion with the xmemcmp() name is (a) that this isn't really a memcmp()
> because you want to guarantee that you don't read past a mismatch [because it
> might be a NUL because you're actually dealing with strings], and (b) that
> everywhere (?) else in toybox the leading `x` means "or exit". which this
> doesn't do.

Sigh: naming things, cache invalidation, and off by one errors.

I agree that xmemcmp() is not the ideal name. The x prefix means "exits", and
this doesn't.

memscmp() maybe? (memstrcmp?)

>     Leaving aside "I already fixed it" and instances like
>     https://github.com/landley/toybox/commit/472599b99bec
>     <https://github.com/landley/toybox/commit/472599b99bec> where I may grumble
>     but I
>     do actually check in even completely useless changes to mollify the sanitizer,
>     in this case the chance of an actual bug is nonzero. (Merely astronomically
>     small.)
> 
>     Bionic does seem to do the "optimized" version where it's doing word sized
>     reads. Well, MAYBE it isn't here because
>     libc/arch-arm/generic/bionic/memcmp.S has:
> 
>             /* make sure we have at least 8+4 bytes, this simplify things below
>              * and avoid some overhead for small blocks
>              */
>             cmp        r2, #(8+4)
>             bmi        10f
> 
> yeah, all our many memcmps will do the largest reads they can. so arm64 will
> start with 16 bytes/load but get down to 1 if it needs to. but that 4 would mean
> it wouldn't do less than a 4-byte load. (you'd have gotten the behavior you were
> hoping for if your constant had been 3 though!)

There are many circumstances under which I would not have hit this, true.

"Optimization broke the obvious semantics" is not exactly a _new_ story, nor is
my reaction to it. (30 years later can still be "premature".)

[Rant cut and pasted to the blog.]

>     > Not sure if turning off
>     > strict_memcmp will shut up the error you encountered,
> 
>     But that's not their test.
> 
> indeed. not least because we use hwasan :-)

And even if you didn't yet, you could start in future...

>     > but it does for the case
>     > above. You can also compile the options into the source (see
>     > https://github.com/google/sanitizers/wiki/AddressSanitizerFlags
>     <https://github.com/google/sanitizers/wiki/AddressSanitizerFlags>
>     >
>     > const char *__asan_default_options() {
>     >   return "intercept_memcmp=1:strict_memcmp=0";
>     > }
> 
>     I posted here christmas eve about how I had to do that to shut off the memory
>     leak detector, because gcc was ignoring the environment variable.
> 
>     > ASAN may be overly paranoid with strict_memcmp, but I think it's looking (with
>     > strict) to see if any of the two buffers fall in unallocated (or even just
>     > uninitialized? how would it know?) memory.
> 
>     It cares that we go off the end of a string constant. At a guess it's putting a
>     guard byte between each one and marking that byte poisoned in the funky shadow
>     map thing?
> 
> aiui, yes, asan goes out of its way to make the "highly improbable" easily
> reproducible. (because, yes, when the _visible_ part of your ecosystem is 3bn
> users [and you can't even see the rest of the iceberg], unlikely shit happens
> daily. best to catch it before it ships!)

Agreed. In this case, it's not number-of-users it's number-of-build
permutations, but eh... that just makes it easier to hide?

A problem that "now I've seen it, fixing it later would only cost ME 5 minutes
if it did eventually happen" doesn't mean some poor developer in vietnam hitting
this because they added a printf() to a DIFFERENT COMMAND wouldn't lose a week,
and I could always get hit by a bus, and there's "long tail" support far enough
in the future that domain expertise gets a bit wobbly, and...

If it can happen, expect it. Murphy was an optimist, just not evenly distributed...
>     Note that it triggered on gcc with glibc, and I haven't looked at glibc's memcmp
>     source because gnu and gplv3 and ew. But also because it doesn't matter because
>     I dunno what implementation they'll switch to in future, and in THEORY doing
>     readahead is always a possibility (Bionic's neon version claims to be abusing
>     the FPU to do 32 _byte_ comparisons), and in THEORY you could fall off the edge
>     of a segment doing that.
> 
> for arm64, the SVE memcmp() will load as many bytes as your vector size :-)

Which is not optimizing for the common case, but ok...

> interestingly, there seem to be new instructions for non-faulting loads,
> specifically so you can do this kind of
> thing: https://developer.arm.com/documentation/ddi0596/2021-12/SVE-Instructions/LDNF1B--Contiguous-load-non-fault-unsigned-bytes-to-vector--immediate-index-- <https://developer.arm.com/documentation/ddi0596/2021-12/SVE-Instructions/LDNF1B--Contiguous-load-non-fault-unsigned-bytes-to-vector--immediate-index-->

Further increasing complexity to mitigate the fallout from a previous
unnecessary optimization is not my preferred approach, I tend to rip OUT stuff
with sharp edges and little to no benefit. But to each their own...

Rob