[Toybox] [PATCH] timeout: fix exit status for sneaky subprocesses.
enh
enh at google.com
Mon Sep 16 09:25:48 PDT 2019
On Mon, Sep 16, 2019 at 1:34 AM Rob Landley <rob at landley.net> wrote:
>
> On 9/15/19 6:54 PM, enh wrote:
> > On Fri, Sep 13, 2019 at 7:59 PM Rob Landley <rob at landley.net> wrote:
>
> I should just randomly cut a release with whatever I've got in the tree. This
> one ain't gonna quiesce, is it?
>
> Ugh, where did I leave off on this... I found a small simple change to the code,
> which I made and was happy with... and half the tests fail now.
>
> Yeah, sounds about right. I decided to wait for my cold to go away before
> circling back to it, which it mostly has now except for the part about it being
> 3am...
>
> >> This looks like it will always return 124 even when it did SIGKILL because you
> >> then overwrite the result?
> >
> > yeah, they even mention the SIGKILL special case to the special case
> > to the special case in the man page:
> >
> > If the command times out, and --preserve-status is not set, then exit
> > with status 124. Otherwise, exit with the status of COMMAND. If no
> > signal is specified, send the TERM signal upon timeout. The TERM sig‐
> > nal kills any process that does not block or catch that signal. It may
> > be necessary to use the KILL (9) signal, since this signal cannot be
> > caught, in which case the exit status is 128+9 rather than 124.
>
> Right, if we sent the kill signal exit with 128+KILL. Except it doesn't do it
> automatically, you have to specify -k.
>
> If a command catches term and then exits with a normal exit code, we still exit
> with status "we sent kill".
>
> >> There's a race condition where the command can exit for non-signal reasons
> >> before the signal is delivered (especially on SMP).
> >
> > if you're that close to the timeout, i think calling it a timeout is
> > arguably _more_ fair than letting you get away with it :-)
>
> I want deterministic behavior in nondeterministic situations. I'm aware I can't
> get it, but I'm still annoyed about it.
>
> However, the determinism here is "did timeout send a signal or not". If it did,
> we override its result with the signal we sent unless --preserve says not to. If
> we didn't, we say what it exited with. That's determinish. (It's at least
> atomic, because we're the ones calling wait4 and we know what we did last syscall.)
>
> > huh. other than timeout, it looks like we don't have tests for any of
> > these, so (given TEST_HOST=1 passes) my money's on timeout atm.
> >
> > i've added "write missing tests for exit statuses and fix the bugs" to
> > my to-do list...
>
> I'm serious I could spend literally months on the test suite. But nobody wants
> to fund that.
>
> >> I have a cold, none of this is making sense right now. Right, tests:
> >>
> >> $ timeout -s 10 1 cat
> >> $ echo $?
> >> 124
> >> $ timeout -s 10 --preserve-status 1 cat
> >> $ echo $?
> >> 138
> >> $ timeout -s 10 -k 2 --preserve-status 1 dd
> >> 0+0 records in
> >> 0+0 records out
> >> 0 bytes copied, 0.99923 s, 0.0 kB/s
> >> Killed
> >> $ echo $?
> >> 137
> >> $ timeout -s 10 -k 2 1 dd
> >> 0+0 records in
> >> 0+0 records out
> >> 0 bytes copied, 0.999119 s, 0.0 kB/s
> >> Killed
> >> $ echo $?
> >> 137
> >> $ (sleep 2; killall -SIGBUS dd) & timeout -s 10 1 dd
> >> [1] 17701
> >> 0+0 records in
> >> 0+0 records out
> >> 0 bytes copied, 0.999222 s, 0.0 kB/s
> >> $ echo $?
> >> 124
> >> [1]+ Done ( sleep 2; killall -SIGBUS dd )
> >> $ (sleep 2; killall -SIGBUS dd) & timeout -s 10 --preserve-status 1 dd
> >> [1] 17780
> >> 0+0 records in
> >> 0+0 records out
> >> 0 bytes copied, 0.998975 s, 0.0 kB/s
> >> $ echo $?
> >> 135
> >> [1]+ Done ( sleep 2; killall -SIGBUS dd )
> >> $ (sleep 2) | timeout -s 10 1 dd
> >> 0+0 records in
> >> 0+0 records out
> >> 0 bytes copied, 0.999297 s, 0.0 kB/s
> >> 0+0 records in
> >> 0+0 records out
> >> 0 bytes copied, 1.999 s, 0.0 kB/s
> >> $ echo $?
> >> 124
> >> $ (sleep 2) | timeout -s 10 --preserve-status 1 dd
> >> 0+0 records in
> >> 0+0 records out
> >> 0 bytes copied, 0.999241 s, 0.0 kB/s
> >> 0+0 records in
> >> 0+0 records out
> >> 0 bytes copied, 1.99905 s, 0.0 kB/s
> >> $ echo $?
> >> 0
>
> I should make sure the test suite has versions of all that, by the way...
>
> >> SIGBUS is 7, 128+7 = 135 so that's right. It looks like --preserve-status
> >> unconditionally overrides our thing, so we return the signal we SENT in the
> >> absence of --preserve-status.
> >>
> >> So, if they catch the signal but then die for other reasons, the exit code
> >> indicates the signal that was sent unless we --preserve-status...
> >
> > yeah, i got that wrong until i added the tests.
>
> The tests fail to pass TEST_HOST on devuan becauase loop.sh calls /bin/sh which
> is the Defective Annoying SHell which says "trap: SIGTERM: bad trap" because
> nobody should ever use dash for anything under any circumstances. Changing it to
> /bin/bash makes it work, but I dunno if that screws up your test environment.
weird. on my Debian boxes, /bin/sh -> /bin/bash. but my Ubuntu box
points to dash instead. not sure whether that's a Debian difference or
something Google did.
anyway, yeah, /bin/bash would break Android which only has /bin/sh.
i've attached a patch that uses the form that's portable to all three
shells (bash, dash, mksh).
> Working on toysh, it's got a ways to go yet...
given that you aim to be bash-compatible it'll make sense to have a
/bin/bash then, but mksh is sufficiently far away (and with no
intention of getting closer) that i've always assumed it would cause
more harm than good. but given that a lot of scripts probably just
have /bin/bash for no real reason, maybe that's not true. i'll ask
around for second/third opinions...
> >> Rob
>
> Rob
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-timeout.test-use-dash-compatible-form-of-trap.patch
Type: text/x-patch
Size: 820 bytes
Desc: not available
URL: <http://lists.landley.net/pipermail/toybox-landley.net/attachments/20190916/d5a8334b/attachment-0003.bin>
More information about the Toybox
mailing list