<div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Jan 4, 2023 at 10:42 AM Rob Landley <<a href="mailto:rob@landley.net">rob@landley.net</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On 1/3/23 20:00, enh wrote:<br>
> On Mon, Jan 2, 2023 at 11:20 AM Rob Landley <<a href="mailto:rob@landley.net" target="_blank">rob@landley.net</a><br>
> <mailto:<a href="mailto:rob@landley.net" target="_blank">rob@landley.net</a>>> wrote:<br>
<br>
I'm in text mode, thunderbird. Text mode. You should not be doing in-band regex<br>
matching to expand html tags in TEXT MODE.<br>
<br>
I had a long email thread with a journalist last month about how the apache<br>
foundation is a proper open source umbrella organization, but the mozilla<br>
organization is the residue of a money-centered silicon valley startup (netscape<br>
dot-com) that has nonprofit status the same way megachurches do.<br>
<br>
Not a unique observation, of course:<br>
<a href="https://mastodon.ar.al/@aral/109551499671511513" rel="noreferrer" target="_blank">https://mastodon.ar.al/@aral/109551499671511513</a><br>
<br>
> (*expect)->prev->data = "do\0C";<br>
> dlist_add(expect, (*s == 'c') ? "esac" : "do\0A");<br>
> <br>
> end = 0;<br>
> if (!strcmp(s, "if")) end = "then";<br>
> else if (!strcmp(s, "while") || !strcmp(s, "until")) end = "do\0B";<br>
> else if (!strcmp(s, "{")) end = "}";<br>
> else if (!strcmp(s, "(")) end = ")";<br>
> else if (!strcmp(s, "[[")) end = "]]";<br>
> <br>
> dlist_add(expect, end);<br>
> <br>
> And so on. Any string constant starting with "do" will have a fourth byte, and<br>
> any other string constant is going to diverge within the bounds of the actual<br>
> string constant so we're _going_ to hit inequality before falling off the end.<br>
> <br>
> isn't your function _strncmp()_ rather than memcmp() though?<br>
<br>
No, I care about the A/B/C in position 4 matching too. That's how I distinguish<br>
the three "do" cases. A strncmp() will stop at the NUL and go "yup, matched". I<br>
want it to check the fourth byte IF the first three bytes matched, which is<br>
memcmp(). Return the sign of the difference for the first nonmatching byte pair<br>
within range, zero if the whole range matched.<br>
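<br>
A minimal sketch of those semantics, assuming a hypothetical name<br>
bytewise_memcmp() (not the function actually checked in to toybox). It compares<br>
one byte at a time and returns at the first mismatch, so it never reads past<br>
the byte that differs:<br>

```c
#include <stddef.h>

/* Hypothetical helper for illustration only: compare up to n bytes one at a
 * time, stopping at the first mismatching pair so nothing beyond that byte
 * is ever read. Returns -1, 0, or 1 (the sign of the first difference). */
int bytewise_memcmp(const void *a, const void *b, size_t n)
{
  const unsigned char *x = a, *y = b;

  for (; n; n--, x++, y++)
    if (*x != *y) return (*x < *y) ? -1 : 1;

  return 0;
}
```

With this, bytewise_memcmp("do\0A", "do\0B", 4) is nonzero because the fourth<br>
bytes differ, where strncmp() on the same arguments stops at the NUL and<br>
returns 0. Comparing "if" against "do\0A" returns on the first byte, so the<br>
shorter constant is never read past its end.<br>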
<br>
> even though one _could_ write a byte-by-byte memcmp(), the standard does not<br>
> require that, and i'm aware of no non-C implementation that works that way.<br>
<br>
A non-C implementation of a C library function?<br></blockquote><div><br></div><div>well, "assembler" if you must. my distinction being "regardless of architecture" (so not specifically arm64 or whatever).</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
> (musl may have misled you here? strictly BSD also has a memcmp.c that's<br>
> byte-by-byte, but all the real architectures have assembler versions they use<br>
> instead.)<br>
<br>
I made one that works the way I expect and switched all the calls to it. (If I<br>
need to repeatedly memcmp() something I expect to remain equal for longer than a<br>
few cache lines, I question my algorithm. And a byte-by-byte tight loop in L1<br>
cache with multiple execution units and branch predicted pipeline reordering<br>
isn't gonna be _that_ slow.)<br>
<br>
> my confusion with the xmemcmp() name is (a) that this isn't really a memcmp()<br>
> because you want to guarantee that you don't read past a mismatch [because it<br>
> might be a NUL because you're actually dealing with strings], and (b) that<br>
> everywhere (?) else in toybox the leading `x` means "or exit". which this<br>
> doesn't do.<br>
<br>
Sigh: naming things, cache invalidation, and off by one errors.<br>
<br>
I agree that xmemcmp() is not the ideal name. The x prefix means "exits", and<br>
this doesn't.<br>
<br>
memscmp() maybe? (memstrcmp?)<br></blockquote><div><br></div><div>safememcmp()?</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
> Leaving aside "I already fixed it" and instances like<br>
> <a href="https://github.com/landley/toybox/commit/472599b99bec" rel="noreferrer" target="_blank">https://github.com/landley/toybox/commit/472599b99bec</a><br>
> <<a href="https://github.com/landley/toybox/commit/472599b99bec" rel="noreferrer" target="_blank">https://github.com/landley/toybox/commit/472599b99bec</a>> where I may grumble<br>
> but I<br>
> do actually check in even completely useless changes to mollify the sanitizer,<br>
> in this case the chance of an actual bug is nonzero. (Merely astronomically<br>
> small.)<br>
> <br>
> Bionic does seem to do the "optimized" version where it's doing word sized<br>
> reads. Well, MAYBE it isn't here because<br>
> libc/arch-arm/generic/bionic/memcmp.S has:<br>
> <br>
> /* make sure we have at least 8+4 bytes, this simplify things below<br>
> * and avoid some overhead for small blocks<br>
> */<br>
> cmp r2, #(8+4)<br>
> bmi 10f<br>
> <br>
> yeah, all our many memcmps will do the largest reads they can. so arm64 will<br>
> start with 16 bytes/load but get down to 1 if it needs to. but that 4 would mean<br>
> it wouldn't do less than a 4-byte load. (you'd have gotten the behavior you were<br>
> hoping for if your constant had been 3 though!)<br>
<br>
There are many circumstances under which I would not have hit this, true.<br>
<br>
"Optimization broke the obvious semantics" is not exactly a _new_ story, nor is<br>
my reaction to it. (30 years later can still be "premature".)<br>
<br>
[Rant cut and pasted to the blog.]<br>
<br>
> > Not sure if turning off<br>
> > strict_memcmp will shut up the error you encountered,<br>
> <br>
> But that's not their test.<br>
> <br>
> indeed. not least because we use hwasan :-)<br>
<br>
And even if you didn't yet, you could start in future...<br>
<br>
> > but it does for the case<br>
> > above. You can also compile the options into the source (see<br>
> > <a href="https://github.com/google/sanitizers/wiki/AddressSanitizerFlags" rel="noreferrer" target="_blank">https://github.com/google/sanitizers/wiki/AddressSanitizerFlags</a><br>
> <<a href="https://github.com/google/sanitizers/wiki/AddressSanitizerFlags" rel="noreferrer" target="_blank">https://github.com/google/sanitizers/wiki/AddressSanitizerFlags</a>><br>
> ><br>
> > const char *__asan_default_options() {<br>
> > return "intercept_memcmp=1:strict_memcmp=0";<br>
> > }<br>
> <br>
> I posted here christmas eve about how I had to do that to shut off the memory<br>
> leak detector, because gcc was ignoring the environment variable.<br>
> <br>
> > ASAN may be overly paranoid with strict_memcmp, but I think it's looking (with<br>
> > strict) to see if any of the two buffers fall in unallocated (or even just<br>
> > uninitialized? how would it know?) memory.<br>
> <br>
> It cares that we go off the end of a string constant. At a guess it's putting a<br>
> guard byte between each one and marking that byte poisoned in the funky shadow<br>
> map thing?<br>
> <br>
> aiui, yes, asan goes out of its way to make the "highly improbable" easily<br>
> reproducible. (because, yes, when the _visible_ part of your ecosystem is 3bn<br>
> users [and you can't even see the rest of the iceberg], unlikely shit happens<br>
> daily. best to catch it before it ships!)<br>
<br>
Agreed. In this case, it's not number-of-users it's number-of-build<br>
permutations, but eh... that just makes it easier to hide?<br>
<br>
A problem that "now I've seen it, fixing it later would only cost ME 5 minutes<br>
if it did eventually happen" doesn't mean some poor developer in vietnam hitting<br>
this because they added a printf() to a DIFFERENT COMMAND wouldn't lose a week,<br>
and I could always get hit by a bus, and there's "long tail" support far enough<br>
in the future that domain expertise gets a bit wobbly, and...<br>
<br>
If it can happen, expect it. Murphy was an optimist, just not evenly distributed...<br>
<br>
> Note that it triggered on gcc with glibc, and I haven't looked at glibc's memcmp<br>
> source because gnu and gplv3 and ew. But also because it doesn't matter because<br>
> I dunno what implementation they'll switch to in future, and in THEORY doing<br>
> readahead is always a possibility (Bionic's neon version claims to be abusing<br>
> the FPU to do 32 _byte_ comparisons), and in THEORY you could fall off the edge<br>
> of a segment doing that.<br>
> <br>
> for arm64, the SVE memcmp() will load as many bytes as your vector size :-)<br>
<br>
Which is not optimizing for the common case, but ok...<br></blockquote><div><br></div><div>as a libc maintainer, "don't get me started". the number of times i've had optimized memory/string routines that are improvements for the very large cases that mostly only happen in microbenchmarks while regressing the more common short copies/compares... (though given the arm64 SVE context, i should say that i think "arm ltd" themselves might be the sole exception that's never wasted my time with such a thing.)</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
> interestingly, there seem to be new instructions for non-faulting loads,<br>
> specifically so you can do this kind of<br>
> thing: <a href="https://developer.arm.com/documentation/ddi0596/2021-12/SVE-Instructions/LDNF1B--Contiguous-load-non-fault-unsigned-bytes-to-vector--immediate-index--" rel="noreferrer" target="_blank">https://developer.arm.com/documentation/ddi0596/2021-12/SVE-Instructions/LDNF1B--Contiguous-load-non-fault-unsigned-bytes-to-vector--immediate-index--</a> <<a href="https://developer.arm.com/documentation/ddi0596/2021-12/SVE-Instructions/LDNF1B--Contiguous-load-non-fault-unsigned-bytes-to-vector--immediate-index--" rel="noreferrer" target="_blank">https://developer.arm.com/documentation/ddi0596/2021-12/SVE-Instructions/LDNF1B--Contiguous-load-non-fault-unsigned-bytes-to-vector--immediate-index--</a>><br>
<br>
Further increasing complexity to mitigate the fallout from a previous<br>
unnecessary optimization is not my preferred approach, I tend to rip OUT stuff<br>
with sharp edges and little to no benefit. But to each their own...<br></blockquote><div><br></div><div>this kind of thing is what lets you do things like adding fake cat ears to your head live when you're recording stupid videos to clog the intertubes with. oddly to you and me, that's an in-demand use case for "real people"...</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
Rob<br>
</blockquote></div></div>