[Toybox] xmemcmp()

Thu Jan 5 17:10:17 PST 2023

On Wed, Jan 4, 2023 at 10:42 AM Rob Landley <rob at landley.net> wrote:

> On 1/3/23 20:00, enh wrote:
> > On Mon, Jan 2, 2023 at 11:20 AM Rob Landley <rob at landley.net
> > <mailto:rob at landley.net>> wrote:
>
> I'm in text mode, thunderbird. Text mode. You should not be doing in-band
> regex
> matching to expand html tags in TEXT MODE.
>
> I had a long email thread with a journalist last month about how the apache
> foundation is a proper open source umbrella organization, but the mozilla
> organization is the residue of a money-centered silicon valley startup
> (netscape
> dot-com) that has nonprofit status the same way megachurches do.
>
> Not a unique obeservation, of course:
> https://mastodon.ar.al/@aral/109551499671511513
>
> >           (*expect)->prev->data = "do\0C";
> >           dlist_add(expect, (*s == 'c') ? "esac" : "do\0A");
> >
> >         end = 0;
> >         if (!strcmp(s, "if")) end = "then";
> >         else if (!strcmp(s, "while") || !strcmp(s, "until")) end =
> "do\0B";
> >         else if (!strcmp(s, "{")) end = "}";
> >         else if (!strcmp(s, "(")) end = ")";
> >         else if (!strcmp(s, "[[")) end = "]]";
> >
> >             dlist_add(expect, end);
> >
> >     And so on. Any string constant starting with "do" will have a fourth
> byte, and
> >     any other string constant is going to diverge within the bounds of
> the actual
> >     string constant so we're _going_ to hit inequality before falling
> off the end.
> >
> > isn't your function _strncmp()_ rather than memcmp() though?
>
> No, I care about the A/B/C in position 4 matching too. That's how I
> distinguish
> the three "do" cases. A strncmp() will stop at the NUL and go "yup,
> matched". I
> want it to check the fourth byte IF the first three bytes matched, which is
> memcmp(). Return the sign of the difference for the first nonmatching byte
> pair
> within range, zero if the whole range matched.
>
> > even though one _could_ write a byte-by-byte memcmp(), the standard does
> not
> > require that, and i'm aware of no non-C implementation that works that
> way.
>
> A non-C implementation of a C library function?
>

well, "assembler" if you must. my distinction being "regardless of
architecture" (so not specifically arm64 or whatever).

> > (musl may have misled you here? strictly BSD also has a memcmp.c that's
> > byte-by-byte, but all the real architectures have assembler versions
> they use
> > instead.)
>
> I made one that works the way I expect and switched all the calls to it.
> (If I
> need to repeatedly memcmp() something I expect to remain equal for longer
> than a
> few cache lines, I question my algorithm. And a byte-by-byte tight loop in
> L1
> cache with multiple execution units and branch predicted pipeline
> reordering
> isn't gonna be _that_ slow.)
>
> > my confusion with the xmemcmp() name is (a) that this isn't really a
> memcmp()
> > because you want to guarantee that you don't read past a mismatch
> [because it
> > might be a NUL because you're actually dealing with strings], and (b)
> that
> > everywhere (?) else in toybox the leading `x` means "or exit". which this
> > doesn't do.
>
> Sigh: naming things, cache invalidation, and off by one errors.
>
> I agree that xmemcmp() is not the ideal name. The x prefix means "exits",
> and
> this doesn't.
>
> memscmp() maybe? (memstrcmp?)
>

safememcmp()?

> >     Leaving aside "I already fixed it" and instances like
> >     https://github.com/landley/toybox/commit/472599b99bec
> >     <https://github.com/landley/toybox/commit/472599b99bec> where I may
> grumble
> >     but I
> >     do actually check in even completely useless changes to mollify the
> sanitizer,
> >     in this case the chance of an actual bug is nonzero. (Merely
> astronomically
> >     small.)
> >
> >     Bionic does seem to do the "optimized" version where it's doing word
> sized
> >     reads. Well, MAYBE it isn't here because
> >     libc/arch-arm/generic/bionic/memcmp.S has:
> >
> >             /* make sure we have at least 8+4 bytes, this simplify
> things below
> >              * and avoid some overhead for small blocks
> >              */
> >             cmp        r2, #(8+4)
> >             bmi        10f
> >
> > yeah, all our many memcmps will do the largest reads they can. so arm64
> will
> > start with 16 bytes/load but get down to 1 if it needs to. but that 4
> would mean
> > it wouldn't do less than a 4-byte load. (you'd have gotten the behavior
> you were
> > hoping for if your constant had been 3 though!)
>
> There are many circumstances under which I would not have hit this, true.
>
> "Optimization broke the obvious semantics" is not exactly a _new_ story,
> nor is
> my reaction to it. (30 years later can still be "premature".)
>
> [Rant cut and pasted to the blog.]
>
> >     > Not sure if turning off
> >     > strict_memcmp will shut up the error you encountered,
> >
> >     But that's not their test.
> >
> > indeed. not least because we use hwasan :-)
>
> And even if you didn't yet, you could start in future...
>
> >     > but it does for the case
> >     > above. You can also compile the options into the source (see
> >     > https://github.com/google/sanitizers/wiki/AddressSanitizerFlags
> >     <https://github.com/google/sanitizers/wiki/AddressSanitizerFlags>
> >     >
> >     > const char *__asan_default_options() {
> >     >   return "intercept_memcmp=1:strict_memcmp=0";
> >     > }
> >
> >     I posted here christmas eve about how I had to do that to shut off
> the memory
> >     leak detector, because gcc was ignoring the environment variable.
> >
> >     > ASAN may be overly paranoid with strict_memcmp, but I think it's
> looking (with
> >     > strict) to see if any of the two buffers fall in unallocated (or
> even just
> >     > uninitialized? how would it know?) memory.
> >
> >     It cares that we go off the end of a string constant. At a guess
> it's putting a
> >     guard byte between each one and marking that byte poisoned in the
> funky shadow
> >     map thing?
> >
> > aiui, yes, asan goes out of its way to make the "highly improbable"
> easily
> > reproducible. (because, yes, when the _visible_ part of your ecosystem
> is 3bn
> > users [and you can't even see the rest of the iceberg], unlikely shit
> happens
> > daily. best to catch it before it ships!)
>
> Agreed. In this case, it's not number-of-users it's number-of-build
> permutations, but eh... that just makes it easier to hide?
>
> A problem that "now I've seen it, fixing it later would only cost ME 5
> minutes
> if it did eventually happen" doesn't mean some poor developer in vietnam
> hitting
> this because they added a printf() to a DIFFERENT COMMAND wouldn't lose a
> week,
> and I could always get hit by a bus, and there's "long tail" support far
> enough
> in the future that domain expertise gets a bit wobbly, and...
>
> If it can happen, expect it. Murphy was an optimist, just not evenly
> distributed...
> >     Note that it triggered on gcc with glibc, and I haven't looked at
> glibc's memcmp
> >     source because gnu and gplv3 and ew. But also because it doesn't
> matter because
> >     I dunno what implementation they'll switch to in future, and in
> THEORY doing
> >     readahead is always a possibility (Bionic's neon version claims to
> be abusing
> >     the FPU to do 32 _byte_ comparisons), and in THEORY you could fall
> off the edge
> >     of a segment doing that.
> >
> > for arm64, the SVE memcmp() will load as many bytes as your vector size
> :-)
>
> Which is not optimizing for the common case, but ok...
>

as a libc maintainer, "don't get me started". the number of times i've had
optimized memory/string routines that are improvements for the very large
cases that mostly only happen in microbenchmarks while regressing the more
common short copies/compares... (though given the arm64 SVE context, i
should say that i think "arm ltd" themselves might be the sole exception
that's never wasted my time with such a thing.)

> > interestingly, there seem to be new instructions for non-faulting loads,
> > specifically so you can do this kind of
> > thing:
> https://developer.arm.com/documentation/ddi0596/2021-12/SVE-Instructions/LDNF1B--Contiguous-load-non-fault-unsigned-bytes-to-vector--immediate-index--
> <
> https://developer.arm.com/documentation/ddi0596/2021-12/SVE-Instructions/LDNF1B--Contiguous-load-non-fault-unsigned-bytes-to-vector--immediate-index--
> >
>
> Further increasing complexity to mitigate the fallout from a previous
> unnecessary optimization is not my preferred approach, I tend to rip OUT
> stuff
> with sharp edges and little to no benefit. But to each their own...
>

this kind of thing is what lets you do things like adding fake cat eats to
your head live when you're recording stupid videos to clog the intertubes
with. oddly to you and me, that's an in-demand use case for "real people"...

> Rob
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.landley.net/pipermail/toybox-landley.net/attachments/20230105/aa7fb687/attachment.htm>