[Toybox] xmemcmp()

enh enh at google.com
Tue Jan 3 18:00:27 PST 2023


On Mon, Jan 2, 2023 at 11:20 AM Rob Landley <rob at landley.net> wrote:

> On 1/1/23 18:06, Ray Gardner wrote:
> > Some questions and comments on the memcmp() issue from the 2022-12-23
> notes.html
> > entry.
> >
> > I suspect that memcmp("a", "abc", 4); is a simplified version of the
> actual case
> > that caused an issue with ASAN.
>
> Yes, but the simplified version triggered gcc/glibc's ASAN for me.
>
> > What was the actual case like?
>
> toys/*/sh.c line 3080-ish:
>
>       // "for" on its own line is an error.
>       if (arg->c == 1 && ex && !memcmp(ex, "do\0A", 4)) {
>         s = "newline";
>         goto flush;
>       }
>
> There are multiple "do" cases that need to match the keyword to trigger
> parser
> state advancement, but they have an extra byte appended after the NUL to
> distinguish while/do for/do select/do when the parser uses the entry on
> the top
> of the stack to determine parsing context. We expect a "do" statement to
> end
> this, but "for ((a;b;c;)); do" is very different from "while true; do".
>
> The ex = *expect ? (*expect)->prev->data : 0; it's comparing against is the
> string constant that actually got pushed onto the expect stack this time
> around:
>
>       (*expect)->prev->data = "do\0C";
>       dlist_add(expect, (*s == 'c') ? "esac" : "do\0A");
>
>     end = 0;
>     if (!strcmp(s, "if")) end = "then";
>     else if (!strcmp(s, "while") || !strcmp(s, "until")) end = "do\0B";
>     else if (!strcmp(s, "{")) end = "}";
>     else if (!strcmp(s, "(")) end = ")";
>     else if (!strcmp(s, "[[")) end = "]]";
>
>         dlist_add(expect, end);
>
> And so on. Any string constant starting with "do" will have a fourth byte,
> and
> any other string constant is going to diverge within the bounds of the
> actual
> string constant so we're _going_ to hit inequality before falling off the
> end.
>

isn't your function _strncmp()_ rather than memcmp() though?

even though one _could_ write a byte-by-byte memcmp(), the standard does
not require that, and i'm aware of no non-C implementation that works that
way. (musl may have misled you here? strictly BSD also has a memcmp.c
that's byte-by-byte, but all the real architectures have assembler versions
they use instead.)


> > Was part of
> > either buffer in unallocated memory? If so, could that memory be
> accessed in
> > some other scenario?
>
> If compile time string constants become unallocated there's something
> wrong with
> the ELF loader or we did something really FUNKY with mmap()...
>
> > xmemcmp() doesn't do the same thing as memcmp(). memcmp() compares char
> as
> > unsigned, and xmemcmp uses plain char *, so if plain char is signed by
> > default, xmemcmp will return the opposite "sense" from memcmp for e.g.
> > xmemcmp("0xF0", "a", 2).
>
> https://landley.net/toybox/design.html:#:~:text=Signedness%20of%20char
>
> https://github.com/landley/toybox/blob/master/configure#L16


my confusion with the xmemcmp() name is (a) that this isn't really a
memcmp() because you want to guarantee that you don't read past a mismatch
[because it might be a NUL because you're actually dealing with strings],
and (b) that everywhere (?) else in toybox the leading `x` means "or exit".
which this doesn't do.


>
> > Easy fix is to use
> >
> >   while (len--) if ((ii = *(unsigned char *)one++ - *(unsigned char
> *)two++)) break;
> >
> > It appears memcmp() is only used for equality testing now, but if you
> leave
> > xmemcmp() unchanged it could cause trouble later if it's used for
> > less/equal/greater comparison.
> >
> > As for ASAN, you can disable the catching of memcmp.
>
> Android has a test suite they need stuff to pass to sleep at night. I
> don't have
> to agree with it, I can add compliance stuff to make them happy.
>
> Every checker throws the occasional false positive. ASAN is far far far
> less
> annoying than any static checker I've ever tried to use. Compared to the
> years I
> spent arguing with gcc's "is never actually used uninitialized" warnings,
> this
> is a rounding error.
>
> > I know nothing of toybox or
> > linux or ASAN, so I may be off base here,
>
> The design page linked above is a bit verbose, but I dunno how to make it
> shorter and still say enough stuff. I need to do a refresh pass on
> https://landley.net/code.html (which was never actually _finished_, but
> the
> project's a moving target).
>
> > No ASAN memcmp testing is done if intercept_memcmp is 0.
>
> Leaving aside "I already fixed it" and instances like
> https://github.com/landley/toybox/commit/472599b99bec where I may grumble
> but I
> do actually check in even completely useless changes to mollify the
> sanitizer,
> in this case the chance of an actual bug is nonzero. (Merely
> astronomically small.)
>
> Bionic does seem to do the "optimized" version where it's doing word sized
> reads. Well, MAYBE it isn't here because
> libc/arch-arm/generic/bionic/memcmp.S has:
>
>         /* make sure we have at least 8+4 bytes, this simplify things below
>          * and avoid some overhead for small blocks
>          */
>         cmp        r2, #(8+4)
>         bmi        10f
>

yeah, all our many memcmps will do the largest reads they can. so arm64
will start with 16 bytes/load but get down to 1 if it needs to. but that 4
would mean it wouldn't do less than a 4-byte load. (you'd have gotten the
behavior you were hoping for if your constant had been 3 though!)


> But I up for tracing the flow of assembly that uses hexadecimal constants
> for
> its jumps. (Checking in generated code is technical debt even _before_ you
> hand-modify it.)
>
> But assuming it's doing word sized reads even when presented with unaligned
> input, it _is_ theoretically possible that somebody could use the FDPIC
> loader
> for security reasons on a system with MMU (it's ASLR on steroids, the
> security
> nuts who pull that sort of thing tend not to talk about it), and if our
> shorter
> constant is the last entry in the entire rodata segment (I dunno how
> llvm/lld
> orders rodata entries) and the last page is exactly full, and the string
> we're
> comparing against is "{" (3 bytes), you could theoretically have the word
> sized
> read go up to two bytes off the end and fault.
>
> Android shipping billions of devices with half a gig of DRAM and no ECC
> memory
> is statistically going to cause WAY more problems than that, but sure. I
> can use
> a version that _explicitly_ does byte at a time comparisons.
>
> > Not sure if turning off
> > strict_memcmp will shut up the error you encountered,
>
> But that's not their test.
>

indeed. not least because we use hwasan :-)


> > but it does for the case
> > above. You can also compile the options into the source (see
> > https://github.com/google/sanitizers/wiki/AddressSanitizerFlags
> >
> > const char *__asan_default_options() {
> >   return "intercept_memcmp=1:strict_memcmp=0";
> > }
>
> I posted here christmas eve about how I had to do that to shut off the
> memory
> leak detector, because gcc was ignoring the environment variable.
>
> > ASAN may be overly paranoid with strict_memcmp, but I think it's looking
> (with
> > strict) to see if any of the two buffers fall in unallocated (or even
> just
> > uninitialized? how would it know?) memory.
>
> It cares that we go off the end of a string constant. At a guess it's
> putting a
> guard byte between each one and marking that byte poisoned in the funky
> shadow
> map thing?
>

aiui, yes, asan goes out of its way to make the "highly improbable" easily
reproducible. (because, yes, when the _visible_ part of your ecosystem is
3bn users [and you can't even see the rest of the iceberg], unlikely shit
happens daily. best to catch it before it ships!)


> Note that it triggered on gcc with glibc, and I haven't looked at glibc's
> memcmp
> source because gnu and gplv3 and ew. But also because it doesn't matter
> because
> I dunno what implementation they'll switch to in future, and in THEORY
> doing
> readahead is always a possibility (Bionic's neon version claims to be
> abusing
> the FPU to do 32 _byte_ comparisons), and in THEORY you could fall off the
> edge
> of a segment doing that.
>

for arm64, the SVE memcmp() will load as many bytes as your vector size :-)
interestingly, there seem to be new instructions for non-faulting loads,
specifically so you can do this kind of thing:
https://developer.arm.com/documentation/ddi0596/2021-12/SVE-Instructions/LDNF1B--Contiguous-load-non-fault-unsigned-bytes-to-vector--immediate-index--


> Rob
> _______________________________________________
> Toybox mailing list
> Toybox at lists.landley.net
> http://lists.landley.net/listinfo.cgi/toybox-landley.net
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.landley.net/pipermail/toybox-landley.net/attachments/20230103/90914a86/attachment.htm>


More information about the Toybox mailing list