[Toybox] [PATCH 2/2] Detect all errors with xreadall/xwrite.

Sat Aug 14 05:11:42 PDT 2021

On Mon, Aug 09, 2021 at 04:35:37AM -0500, Rob Landley wrote:
> > If xreadall is called with SIZE_MAX as length, then no error is
> > detected.
> 
> Sounds like xreadall should be using int len.

How does a signed type help? If at all, it makes things worse if a
supplied unsigned type is expanded as signed. It makes it even easier
to trigger the issue.

> > The problem is that readall return value is not explicitly tested for
> > -1. If len is SIZE_MAX and readall returns -1, then the return value is
> > promoted to size_t and both values are considered to be equal.
> 
> Define "invalid"?

My definition of invalid is: No data is read but the function returns
without error indication. A caller cannot detect the error at all.

> $ tar tvf poc.tar
> tar: Archive value -1 is out of off_t range 0..9223372036854775807
> tar: This does not look like a tar archive
> tar: Skipping to next header
> tar: Exiting with failure status due to previous errors
> 
> That's with gnu tar on the host. Note that it doesn't actually _try_ to read,
> this check is in the tar header field decode logic, so fiddling with lib/ in
> toybox isn't the right place to fix that for toybox tar either.
> 
> But again, none of that is the lib/ code.

> You're just arguing about where the bad things kick in, not eliminating them.

I did not report tar issues first because tar is my proof of concept
that toys in toybox can be protected from out of boundary memory
accesses by fixing a wrapper method in lib/ which does not do what it
is supposed to do: Exit if read operation fails.

You can and should carefully review all toys to make sure that they do
not misbehave, but this is an easy adjustment which helps many toys.
The proof of concept becomes a proof of concept that protected library
functions also protect against mistakes in toys.

And I see significant difference between an error code return and an out
of boundary access. I hope that we have a common ground on this one.

It is too difficult for me to figure out the direction in which the
toybox design moves and where it comes from and which kind of errors are
accepted or even embraced (like data types, signed overflows).

So here are my mere bug reports without patches. I still hope that they
are helpful to you:

1) Toybox compiled with -fsanitize=undefined runs into signed overflows:

$ toybox tar
lib/args.c:364:22: runtime error: left shift of 1 by 31 places cannot be represented in type 'int'
lib/args.c:364:53: runtime error: shift exponent 32 is too large for 32-bit type 'int'
tar: Needs -txc (see "tar --help")

2) sparse calculation in tar.c triggers signed overflows

The TT.hdr.ssize calculation can overflow if offset and size are
very large.

3) is_tar_header in lib/lib.c can consider invalid headers as valid

Using sscanf can parse past the checksum field if it is filled with
spaces. The code cannot overrun the tar header boundary at least
because "ustar" is a protective barrier.

4) TT.hdr.name handling for 'x' headers leaks memory

Assigning TT.hdr.name with allocated memory multiple times makes it
impossible to release the memory afterwards. The same is true for 'K'
and 'L' etc.

5) Almost an issue: Out of boundary access on huge 'x' headers

64 bit systems have no problem to allocate and read more than 2 GB
for a header. But %n in sscanf can turn negative if the line is long
enough. The subsequent xstrdup(p+n) can access memory in front of p.
This is prevented due to data types in use currently. It should be
taken care of to be protected in the future. It is also the reason why
I wanted to prepare the library code before going any further.

I have stopped at this point and offered the library patches to see if
there is any interest in them. I have not resumed any further review
activity because it seems to me that the choice of how to fix issues
is currently too uncertain.

Sincerely,
Samanta