[Toybox] [PATCH] timeout: fix exit status for sneaky subprocesses.
enh
enh at google.com
Sun Sep 15 16:54:18 PDT 2019
On Fri, Sep 13, 2019 at 7:59 PM Rob Landley <rob at landley.net> wrote:
>
> On 9/13/19 5:36 PM, enh via Toybox wrote:
> > There's nothing to stop a subprocess from catching our SIGTERM on timeout
> > and exiting, causing us to incorrectly report that it didn't time out.
> > Android's ART has a utility that does exactly this.
> >
> > Explicitly catch this case, and add corresponding tests.
>
> Don't need the first test, incrementing TT.signaled twice shouldn't hurt
> anything. And actual we can just set toys.exitval to 128 + the signal value we
> sent and then only reset it in the exit path if it's zero or the preserve sig
> thing happens?
>
> This looks like it will always return 124 even when it did SIGKILL because you
> then overwrite the result?
yeah, they even mention the SIGKILL special case to the special case
to the special case in the man page:
If the command times out, and --preserve-status is not set, then exit
with status 124. Otherwise, exit with the status of COMMAND. If no
signal is specified, send the TERM signal upon timeout. The TERM sig‐
nal kills any process that does not block or catch that signal. It may
be necessary to use the KILL (9) signal, since this signal cannot be
caught, in which case the exit status is 128+9 rather than 124.
> There's a race condition where the command can exit for non-signal reasons
> before the signal is delivered (especially on SMP).
if you're that close to the timeout, i think calling it a timeout is
arguably _more_ fair than letting you get away with it :-)
> Also,
>
> else if (WTERMSIG(status)==SIGKILL) toys.exitval = 137;
> else toys.exitval = FLAG(preserve_status) ? 128+WTERMSIG(status) : 124;
>
> SIGKILL is 9, 128+9 is 137, why are we special casing this again?
>
> Also, lib/xwrap.c has xwaitpid() which looks like it's already doing most of
> this, except:
>
> while (-1 == waitpid(pid, &status, 0) && errno == EINTR);
> return WIFEXITED(status) ? WEXITSTATUS(status) : WTERMSIG(status)+127;
>
> That's using +127 and we're using +128 here? Who are the existing users of
> xwaitpid... xpclose() and tar.c, neither of which have yet had a chance to care
> what this value is (other than >127) so the lib function is probably wrong.
$ grep -r WTERMSIG
lib/xwrap.c: return WIFEXITED(status) ? WEXITSTATUS(status) :
WTERMSIG(status)+127;
toys/posix/xargs.c: status = WIFEXITED(status) ?
WEXITSTATUS(status) : WTERMSIG(status)+127;
toys/posix/time.c: toys.exitval = WIFEXITED(stat) ?
WEXITSTATUS(stat) : WTERMSIG(stat);
toys/other/watch.c: status = WIFEXITED(status) ? WEXITSTATUS(status)
: WTERMSIG(status)+127;
toys/other/timeout.c: else if (WTERMSIG(status)==SIGKILL) toys.exitval = 137;
toys/other/timeout.c: else toys.exitval = FLAG(preserve_status) ?
128+WTERMSIG(status) : 124;
toys/pending/fsck.c: if (WTERMSIG(status) != SIGINT)
toys/pending/fsck.c: perror_msg("child Term. by sig:
%d\n",(WTERMSIG(status)));
toys/pending/tcpsvd.c: xprintf("%s: end %d signaled
%d\n",toys.which->name, pid_n, WTERMSIG(status));
huh. other than timeout, it looks like we don't have tests for any of
these, so (given TEST_HOST=1 passes) my money's on timeout atm.
i've added "write missing tests for exit statuses and fix the bugs" to
my to-do list...
> I have a cold, none of this is making sense right now. Right, tests:
>
> $ timeout -s 10 1 cat
> $ echo $?
> 124
> $ timeout -s 10 --preserve-status 1 cat
> $ echo $?
> 138
> $ timeout -s 10 -k 2 --preserve-status 1 dd
> 0+0 records in
> 0+0 records out
> 0 bytes copied, 0.99923 s, 0.0 kB/s
> Killed
> $ echo $?
> 137
> $ timeout -s 10 -k 2 1 dd
> 0+0 records in
> 0+0 records out
> 0 bytes copied, 0.999119 s, 0.0 kB/s
> Killed
> $ echo $?
> 137
> $ (sleep 2; killall -SIGBUS dd) & timeout -s 10 1 dd
> [1] 17701
> 0+0 records in
> 0+0 records out
> 0 bytes copied, 0.999222 s, 0.0 kB/s
> $ echo $?
> 124
> [1]+ Done ( sleep 2; killall -SIGBUS dd )
> $ (sleep 2; killall -SIGBUS dd) & timeout -s 10 --preserve-status 1 dd
> [1] 17780
> 0+0 records in
> 0+0 records out
> 0 bytes copied, 0.998975 s, 0.0 kB/s
> $ echo $?
> 135
> [1]+ Done ( sleep 2; killall -SIGBUS dd )
> $ (sleep 2) | timeout -s 10 1 dd
> 0+0 records in
> 0+0 records out
> 0 bytes copied, 0.999297 s, 0.0 kB/s
> 0+0 records in
> 0+0 records out
> 0 bytes copied, 1.999 s, 0.0 kB/s
> $ echo $?
> 124
> $ (sleep 2) | timeout -s 10 --preserve-status 1 dd
> 0+0 records in
> 0+0 records out
> 0 bytes copied, 0.999241 s, 0.0 kB/s
> 0+0 records in
> 0+0 records out
> 0 bytes copied, 1.99905 s, 0.0 kB/s
> $ echo $?
> 0
>
> SIGBUS is 7, 128+7 = 135 so that's right. It looks like --preserve-status
> unconditionally overrides our thing, so we return the signal we SENT in the
> absence of --preserve-status.
>
> So, if they catch the signal but then die for other reasons, the exit code
> indicates the signal that was sent unless we --preserve-status...
yeah, i got that wrong until i added the tests.
> Rob
More information about the Toybox
mailing list