[Toybox] [PATCH 2/2] Detect all errors with xreadall/xwrite.

Sun Aug 15 04:57:54 PDT 2021

On 8/14/21 7:11 AM, Samanta Navarro wrote:
> On Mon, Aug 09, 2021 at 04:35:37AM -0500, Rob Landley wrote:
> You can and should carefully review all toys to make sure that they do
> not misbehave,

Thanks, I never would have thought of that.

(I'm glad you're looking at this. Fresh eyes find fresh stuff. Always have. But
I'm a little underclocked for dealing with this right now...)

> but this is an easy adjustment which helps many toys.
> The proof of concept becomes a proof of concept that protected library
> functions also protect against mistakes in toys.

I had CONFIG_DEBUG that would insert runtime checks for things that could only
happen because of programming errors... and Android switched it on and ships
with it enabled.

Sort of ambivalent about it now.

> It is too difficult for me to figure out the direction in which the
> toybox design moves and where it comes from and which kind of errors are
> accepted or even embraced (like data types, signed overflows).

I disagree with you on a design decision, therefore I am inscrutable?

> So here are my mere bug reports without patches. I still hope that they
> are helpful to you:
> 
> 1) Toybox compiled with -fsanitize=undefined runs into signed overflows:
> 
> $ toybox tar
> lib/args.c:364:22: runtime error: left shift of 1 by 31 places cannot be represented in type 'int'
> lib/args.c:364:53: runtime error: shift exponent 32 is too large for 32-bit type 'int'
> tar: Needs -txc (see "tar --help")

Ah, commit ef67aedfe62b missed a chunk. Thanks. (Commit 368ee96fb18d.)

> 2) sparse calculation in tar.c triggers signed overflows
> 
> The TT.hdr.ssize calculation can overflow if offset and size are
> very large.

The gnu one has an explicit check and error message that the bitfield value
won't fit in the size of the data type it's trying to convert it into. That
seems like the right thing to do here? (They retrofit an ascii field to be a
bitfield when the high bit is set, and the result could store more bits than
necessary.)

By the way, I just checked "busybox tar tvf poc.tar" on your test file and it
exited with 0. They've had that command for 22 years now and this apparently
never came up.

> 3) is_tar_header in lib/lib.c can consider invalid headers as valid

That's a smoketest. It's checking at most 2 fields out of 16. There are plenty
of inputs you could feed it that would pass tar checksum but not parse as a tar
header (such as memset(buf, 0, 500); buf[148] = ' ';) but most seem unlikely to
happen coincidentally? (A zero checksum requires all the bytes outside of the 8
checksum bytes to be NUL. A nonzero checksum requires matching octal digits in
approximately the right place.)

If anything outside the checksum field is nonzero, the sscanf must eventually
get octal digits that match the checksum, and running off the end to misplace
the checksum digits would include those digits IN the checksum (it only excludes
the 8 bytes at the checksum location from the checksum calculation). Passing the
checksum would require multiple consecutive octal digits ('0' is octal 60 and
anything else is going to be higher, sscanf only skips LEADING whitespace).

The pathological case of "not valid but passes" is probably memset(buf, 0, 500);
memset(buf+148, ' ', 8); which... is sort of valid-ish? Anything more and it
MUST read octal digits.

I'm not seeing how this causes a problem, but of course I'm still sick. Yeah,
it's a cheap smoketest, but I thought that part was obvious? (Factored out so
the "file" command could use it.)

> Using sscanf can parse past the checksum field if it is filled with
> spaces. The code cannot overrun the tar header boundary at least
> because "ustar" is a protective barrier.>
> 4) TT.hdr.name handling for 'x' headers leaks memory
> 
> Assigning TT.hdr.name with allocated memory multiple times makes it
> impossible to release the memory afterwards. The same is true for 'K'
> and 'L' etc.

You mean if there are multiple 'x' entries in a row for the same file it'll leak
memory? Yeah, I haven't fully hardened this against invalid inputs yet. (Buncha
remaining todo items, but commit e1582232c388 was where I stopped keeping them
in the tree because I got email about them. My next big todo item for tar is
adding selinux support I think?)

*shrug* It frees the memory when the command exits. I try not to leak memory for
potentially infinite inputs but I admit invalid input is an edge case. I'll add
it to the todo heap. (More likely you'd want to _error_ if you have multiple
conflicting headers for long names and such? Hmmm...)

> 5) Almost an issue: Out of boundary access on huge 'x' headers

I.E. the same bounds checking issue as all the other tar fields?

I've already added a todo item to impose size limits on all 3 users of alloread.
Until a few years ago at least 2 of the 3 would have been limited to PATH_MAX by
the OS anyway, so none of them is going to be USEFULLY multiple megabytes. (In
general the older a legacy file is the less likely it is to be ginormous because
Moore's Law.)

> 64 bit systems have no problem to allocate and read more than 2 GB
> for a header.

They should never be 2 gigabytes. I'd really rather not try to support either
header type having payloads greater than 2 gigabytes. The ones based on
sendfile() can have unlimited size, the ones where you read into a memory buffer
really shouldn't.

That said, I already have a todo item to add selinux support to tar, and that
_can_ presumably be a few megabytes. (Still NOT 2 gigabytes though.)

> But %n in sscanf can turn negative if the line is long
> enough.

That's a libc error? (The argument type is _defined_ as int. Somebody wants to
errata posix about it, feel free. I still suspect trying to malloc and read
enough data into memory to trigger the OOM killer is a worse failure mode.)

> The subsequent xstrdup(p+n) can access memory in front of p.

Another reason to limit the input buffer size then.

> This is prevented due to data types in use currently.

Is it?

$ cat woot.c
#include <stdio.h>

int main(int argc, char *argv[])
{
  unsigned uu;
  int ii;

  fscanf(stdin, "%u%n", &uu, &ii);
  printf("uu=%u ii=%d\n", uu, ii);
}
$ gcc woot.c
$ ./a.out
000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
uu=0 ii=105

> It should be
> taken care of to be protected in the future. It is also the reason why
> I wanted to prepare the library code before going any further.
> 
> I have stopped at this point and offered the library patches to see if
> there is any interest in them.

"That's not how I want to fix it."

"It is too difficult for me to figure out the direction in which the toybox
design moves"

Are you familiar with the old saying "don't ask questions post errors"? See also
Rule 5 of https://www.brainpickings.org/2012/09/28/neil-gaiman-8-rules-of-writing/

I honestly did not invent this way of working.

> I have not resumed any further review
> activity because it seems to me that the choice of how to fix issues
> is currently too uncertain.

Thanks for bringing up todo items. I'm always happy to get failing tests and
throw 'em on the todo heap. I don't guarantee your suggested fix is the one I'll
go with.

> Sincerely,
> Samanta

Rob