[Toybox] warning: char *s is never used uninitialized.

Thu Jan 9 20:46:11 PST 2020

On 1/9/20 12:10 PM, enh wrote:
> On Mon, Dec 23, 2019 at 9:32 PM Rob Landley <rob at landley.net> wrote:
>>
>> Right now toys/pending/sh.c has two legitimate "unused variable" warnings
>> (because commented out unfinished code block, it's very much a work in
>> progress), but this:
>>
>> toys/pending/sh.c: In function 'run_command.constprop':
>> toys/pending/sh.c:596:5: warning: 's' may be used uninitialized in this function
>> [-Wmaybe-uninitialized]
>>      syntax_err("bad %s", s);
>>      ^~~~~~~~~~~~~~~~~~~~~~~
>> toys/pending/sh.c:442:9: note: 's' was declared here
>>    char *s, *ss, *sss, *cv = 0;
>>
>> Ummm... how? The code there boils down to:
>>
>>   char *s;
>>
>>   if (envlen>=arg->c) return 0;
>>   for (j = envlen; j<arg->c; j++) {
>>     s = arg->v[j];
>>     stuff();
>>   }
>>   if (j != arg->c) syntax_err("bad %s", s);
>>
>> If we didn't go into the loop at least once, we don't call syntax_err(). If it's
>>> = the limit we returned early, if it's less we went into the loop at least once.
>>
>> So no matter what value of envlen and arg->c are passed in, HOW can this be used
>> uninitialized? (Anybody else spot it?)
> 
> this is quite common that the compiler won't fully understand this
> kind of "duplicated condition" stuff.

This is why I've added so many "int s = s;" over the years. (A trick I picked up
from linux-kernel.) Sigh. I'm reluctant to do more of that because they're hard
to find after the fact. Maybe I could do some sort of "int SQUISH(s);" macro to
highlight when I'm doing that?

> it's a trade off of how much
> time you want the compiler to spend analysing your code. the
> clang-tidy static analyser might well be able to spot it, but that's
> far too slow for you to use day to day. (Java has a very simple
> algorithm for this, explicitly stated in the JLS, so ironically Java
> is more C-like in its predictability here than C, which leaves so much
> to "quality of implementation".)
> 
> speaking of which, we're about to turn on
> https://reviews.llvm.org/rL349442 for Android to at least remove one
> area of undefined-ness.

Java also memsets each new object as it's allocated, which is why I avoided ever
creating objects in performance critical code. (Doing a memset on the locals in
every function call sounds like it's gonna suck, but it's your performance
taking the hit...)

> toybox isn't even on the list of code we're
> actually worried about (and it's actually information leaks that are
> the security folks' concern, not flaky failures caused by
> uninitialized memory), but the recent ls(1) issues for example would
> have been consistent failures that wouldn't have made it through
> presubmit with this feature turned on, rather than "depends what
> happened to be in that uninitialized memory".

Um, _which_ recent ls issues? The one lasting from November 8 to November 12,
the segfault when wcrtomb() unexpectedly returned -1... do you mean 047bcb8e7d37
back in 2015 where yes the loop was reading uninitalized memory in a properly
allocated array, but it never used the result (the test was in the consumer, not
in the producer) so literally could never cause any problem except for the
sanitizer freaking out?

*shrug* I'm all for finding stuff, let me know if it results in patches I need
to apply...

> see also
> https://outflux.net/slides/2018/lss/danger.pdf for a kernel
> perspective.

read read...

memcpy() does have a length argument. You feed it a length. It does exactly what
you told it to do. Does he mean strcpy()? There's a strncpy(). Which nobody uses
because some idiot decided it should memset all the unused memory at the end,
which you only ever want when writing encryption software that shouldn't emit
side-channel info via power consumption and such.

VLAs are terrible, yes. (I use the old c89 "array at end of struct, do your own
math" trick.)

I hardly ever use switches, although that's partly because I benchmarked years
ago and found if/else staircases were smaller. (Probably changed since but habit...)

fall through: No, I REFUSE to ever have a PARSED COMMENT. I'm aware that the
kernel guys made "# CONFIG_BLAH is not set" meaningful and it was DUMB and they
should be ASHAMED OF THEMSELVES and I am NOT GOING THERE.

On sh4 integer overflow can generate an interrupt, depending on what you set the
status register to. This isn't a compiler thing, this is a "do I want this to
fault or not at runtime" thing. (You can catch the signal and do cleanup if you
really want to, just like division by zero...) The compiler forcing me to
typecast pointers to longs in order to do pointer arithmetic between things it
thinks are in different "objects" is compiler devs thinking they know better
than I do what my code should do and objecting to how I use the language. Grrr.

(I am not impressed by their inline function and/or macro to do the test.)

What implementations of strlcpy read source beyond destination size? They
_shouldn't_...

You don't "take and update remaining destination size", you s = stpncpy(s) and
then do pointer arithmetic to see how much space you have left.

That -fsanitize=cfi thing will break toybox in like 30 places. (I plan to redo
how the dlist stuff works, but haven't yet because it's an intrusive change and
my tree is hardly ever clean. Then it would only break it in like 15 places.)

Rob