[Toybox] xmemcmp()

Rob Landley rob at landley.net
Mon Jan 2 11:32:16 PST 2023


On 1/1/23 18:06, Ray Gardner wrote:
> Some questions and comments on the memcmp() issue from the 2022-12-23 notes.html
> entry.
> 
> I suspect that memcmp("a", "abc", 4); is a simplified version of the actual case
> that caused an issue with ASAN.

Yes, but the simplified version triggered gcc/glibc's ASAN for me.

> What was the actual case like?

toys/*/sh.c line 3080-ish:

      // "for" on its own line is an error.
      if (arg->c == 1 && ex && !memcmp(ex, "do\0A", 4)) {
        s = "newline";
        goto flush;
      }

There are multiple "do" cases that need to match the keyword to trigger parser
state advancement, but they have an extra byte appended after the NUL to
distinguish while/do for/do select/do when the parser uses the entry on the top
of the stack to determine parsing context. We expect a "do" statement to end
this, but "for ((a;b;c;)); do" is very different from "while true; do".

The ex = *expect ? (*expect)->prev->data : 0; it's comparing against is the
string constant that actually got pushed onto the expect stack this time around:

      (*expect)->prev->data = "do\0C";
      dlist_add(expect, (*s == 'c') ? "esac" : "do\0A");

    end = 0;
    if (!strcmp(s, "if")) end = "then";
    else if (!strcmp(s, "while") || !strcmp(s, "until")) end = "do\0B";
    else if (!strcmp(s, "{")) end = "}";
    else if (!strcmp(s, "(")) end = ")";
    else if (!strcmp(s, "[[")) end = "]]";

        dlist_add(expect, end);

And so on. Any string constant starting with "do" will have a fourth byte, and
any other string constant is going to diverge within the bounds of the actual
string constant so we're _going_ to hit inequality before falling off the end.

> Was part of
> either buffer in unallocated memory? If so, could that memory be accessed in
> some other scenario?

If compile time string constants become unallocated there's something wrong with
the ELF loader or we did something really FUNKY with mmap()...

> xmemcmp() doesn't do the same thing as memcmp(). memcmp() compares char as
> unsigned, and xmemcmp uses plain char *, so if plain char is signed by
> default, xmemcmp will return the opposite "sense" from memcmp for e.g.
> xmemcmp("0xF0", "a", 2).

https://landley.net/toybox/design.html:#:~:text=Signedness%20of%20char

https://github.com/landley/toybox/blob/master/configure#L16

> Easy fix is to use
> 
>   while (len--) if ((ii = *(unsigned char *)one++ - *(unsigned char *)two++)) break;
> 
> It appears memcmp() is only used for equality testing now, but if you leave
> xmemcmp() unchanged it could cause trouble later if it's used for
> less/equal/greater comparison.
> 
> As for ASAN, you can disable the catching of memcmp.

Android has a test suite they need stuff to pass to sleep at night. I don't have
to agree with it, I can add compliance stuff to make them happy.

Every checker throws the occasional false positive. ASAN is far far far less
annoying than any static checker I've ever tried to use. Compared to the years I
spent arguing with gcc's "is never actually used uninitialized" warnings, this
is a rounding error.

> I know nothing of toybox or
> linux or ASAN, so I may be off base here,

The design page linked above is a bit verbose, but I dunno how to make it
shorter and still say enough stuff. I need to do a refresh pass on
https://landley.net/code.html (which was never actually _finished_, but the
project's a moving target).

> No ASAN memcmp testing is done if intercept_memcmp is 0.

Leaving aside "I already fixed it" and instances like
https://github.com/landley/toybox/commit/472599b99bec where I may grumble but I
do actually check in even completely useless changes to mollify the sanitizer,
in this case the chance of an actual bug is nonzero. (Merely astronomically small.)

Bionic does seem to do the "optimized" version where it's doing word sized
reads. Well, MAYBE it isn't here because libc/arch-arm/generic/bionic/memcmp.S has:

        /* make sure we have at least 8+4 bytes, this simplify things below
         * and avoid some overhead for small blocks
         */
        cmp        r2, #(8+4)
        bmi        10f

But I up for tracing the flow of assembly that uses hexadecimal constants for
its jumps. (Checking in generated code is technical debt even _before_ you
hand-modify it.)

But assuming it's doing word sized reads even when presented with unaligned
input, it _is_ theoretically possible that somebody could use the FDPIC loader
for security reasons on a system with MMU (it's ASLR on steroids, the security
nuts who pull that sort of thing tend not to talk about it), and if our shorter
constant is the last entry in the entire rodata segment (I dunno how llvm/lld
orders rodata entries) and the last page is exactly full, and the string we're
comparing against is "{" (3 bytes), you could theoretically have the word sized
read go up to two bytes off the end and fault.

Android shipping billions of devices with half a gig of DRAM and no ECC memory
is statistically going to cause WAY more problems than that, but sure. I can use
a version that _explicitly_ does byte at a time comparisons.

> Not sure if turning off
> strict_memcmp will shut up the error you encountered,

But that's not their test.

> but it does for the case
> above. You can also compile the options into the source (see
> https://github.com/google/sanitizers/wiki/AddressSanitizerFlags
>
> const char *__asan_default_options() {
>   return "intercept_memcmp=1:strict_memcmp=0";
> }

I posted here christmas eve about how I had to do that to shut off the memory
leak detector, because gcc was ignoring the environment variable.

> ASAN may be overly paranoid with strict_memcmp, but I think it's looking (with
> strict) to see if any of the two buffers fall in unallocated (or even just
> uninitialized? how would it know?) memory.

It cares that we go off the end of a string constant. At a guess it's putting a
guard byte between each one and marking that byte poisoned in the funky shadow
map thing?

Note that it triggered on gcc with glibc, and I haven't looked at glibc's memcmp
source because gnu and gplv3 and ew. But also because it doesn't matter because
I dunno what implementation they'll switch to in future, and in THEORY doing
readahead is always a possibility (Bionic's neon version claims to be abusing
the FPU to do 32 _byte_ comparisons), and in THEORY you could fall off the edge
of a segment doing that.

Rob


More information about the Toybox mailing list