[Toybox] [PATCH] timeout: fix exit status for sneaky subprocesses.

Fri Sep 13 20:00:24 PDT 2019

On 9/13/19 5:36 PM, enh via Toybox wrote:
> There's nothing to stop a subprocess from catching our SIGTERM on timeout
> and exiting, causing us to incorrectly report that it didn't time out.
> Android's ART has a utility that does exactly this.
> 
> Explicitly catch this case, and add corresponding tests.

Don't need the first test, incrementing TT.signaled twice shouldn't hurt
anything. And actual we can just set toys.exitval to 128 + the signal value we
sent and then only reset it in the exit path if it's zero or the preserve sig
thing happens?

This looks like it will always return 124 even when it did SIGKILL because you
then overwrite the result?

There's a race condition where the command can exit for non-signal reasons
before the signal is delivered (especially on SMP).

Also,

    else if (WTERMSIG(status)==SIGKILL) toys.exitval = 137;
    else toys.exitval = FLAG(preserve_status) ? 128+WTERMSIG(status) : 124;

SIGKILL is 9, 128+9 is 137, why are we special casing this again?

Also, lib/xwrap.c has xwaitpid() which looks like it's already doing most of
this, except:

  while (-1 == waitpid(pid, &status, 0) && errno == EINTR);
  return WIFEXITED(status) ? WEXITSTATUS(status) : WTERMSIG(status)+127;

That's using +127 and we're using +128 here? Who are the existing users of
xwaitpid... xpclose() and tar.c, neither of which have yet had a chance to care
what this value is (other than >127) so the lib function is probably wrong.

I have a cold, none of this is making sense right now. Right, tests:

  $ timeout -s 10 1 cat
  $ echo $?
  124
  $ timeout -s 10 --preserve-status 1 cat
  $ echo $?
  138
  $ timeout -s 10 -k 2 --preserve-status 1 dd
  0+0 records in
  0+0 records out
  0 bytes copied, 0.99923 s, 0.0 kB/s
  Killed
  $ echo $?
  137
  $ timeout -s 10 -k 2 1 dd
  0+0 records in
  0+0 records out
  0 bytes copied, 0.999119 s, 0.0 kB/s
  Killed
  $ echo $?
  137
  $ (sleep 2; killall -SIGBUS dd) & timeout -s 10 1 dd
  [1] 17701
  0+0 records in
  0+0 records out
  0 bytes copied, 0.999222 s, 0.0 kB/s
  $ echo $?
  124
  [1]+  Done                    ( sleep 2; killall -SIGBUS dd )
  $ (sleep 2; killall -SIGBUS dd) & timeout -s 10 --preserve-status 1 dd
  [1] 17780
  0+0 records in
  0+0 records out
  0 bytes copied, 0.998975 s, 0.0 kB/s
  $ echo $?
  135
  [1]+  Done                    ( sleep 2; killall -SIGBUS dd )
  $ (sleep 2) | timeout -s 10 1 dd
  0+0 records in
  0+0 records out
  0 bytes copied, 0.999297 s, 0.0 kB/s
  0+0 records in
  0+0 records out
  0 bytes copied, 1.999 s, 0.0 kB/s
  $ echo $?
  124
  $ (sleep 2) | timeout -s 10 --preserve-status 1 dd
  0+0 records in
  0+0 records out
  0 bytes copied, 0.999241 s, 0.0 kB/s
  0+0 records in
  0+0 records out
  0 bytes copied, 1.99905 s, 0.0 kB/s
  $ echo $?
  0

SIGBUS is 7, 128+7 = 135 so that's right. It looks like --preserve-status
unconditionally overrides our thing, so we return the signal we SENT in the
absence of --preserve-status.

So, if they catch the signal but then die for other reasons, the exit code
indicates the signal that was sent unless we --preserve-status...

Rob