[Toybox] [PATCH] Introducing toys/pending/watchdog.c

Mon Aug 24 00:08:09 PDT 2020

On 8/22/20 3:05 PM, enh via Toybox wrote:
> On Sat, Aug 22, 2020 at 11:22 AM Chris Sarra via Toybox
> <toybox at lists.landley.net <mailto:toybox at lists.landley.net>> wrote:
> 
>     This patch introduces a simple watchdog implementation for toybox. We
>     send the appropriate ioctls to set the relevant timeouts, and intercept
>     signals to safely shut down if required.
>     ---
>      toys/pending/watchdog.c | 128 ++++++++++++++++++++++++++++++++++++++++
>      1 file changed, 128 insertions(+)
>      create mode 100644 toys/pending/watchdog.c
> 
>     diff --git a/toys/pending/watchdog.c b/toys/pending/watchdog.c
>     new file mode 100644
>     index 00000000..0813fe69
>     --- /dev/null
>     +++ b/toys/pending/watchdog.c
>     @@ -0,0 +1,128 @@
>     +/* watchdog - start a watchdog timer with configurable kick frequencies
>     +
>     + Author: Chris Sarra, chrissarra at google.com <mailto:chrissarra at google.com>
> 
> not sure why we're both working on a Saturday, but...

Only reason I didn't get to it yesterday is I took a day off like you suggested. :)

>     + Date: 25 July 2019
>     + Ref: kernel.org/doc/Documentation/watchdog/watchdog-api.txt
>     <http://kernel.org/doc/Documentation/watchdog/watchdog-api.txt>
> 
> use the " * " indent from the other source files?

I already cleaned this up before spotting this email. :)

> also mention that there's precedent in a busybox "watchdog", which this is
> command-line compatible with?

Not a clue. Last system I dealt with using watchdog stuff had a 5 line function
in the main loop of the app poking the watchdog so if the app went down the
system rebooted. I've always been mildly confused by standalone watchdog
binaries: what do they prove exactly?

>     +USE_WATCHDOG(NEWTOY(watchdog, "Ft#T#", TOYFLAG_NEEDROOT|TOYFLAG_BIN))
> 
> huh. i didn't realize that USE_ worked _inside_ the comment! probably best to
> move it below the comment as is the convention though. 

It's a grep pulling NEWTOY()/OLDTOY() lines with no leading whitespace out of
toys/*/*.c and running them through sort to make generated/newtoys.h.

>     +config WATCHDOG
>     +  bool "watchdog"
>     +  default y
>     +  help
>     +    usage: watchdog [-F] [-t SW_TIMER_S] [-T HW_TIMER_S] DEV
>     +
>     +    Start the watchdog timer at DEV with optional timeout parameters.
> 
> (blank line here.)
>  
>     +    -F run in the foreground (do not daemonize)
>     +    -t software timer (in seconds)
>     +    -T hardware timer (in seconds)
> 
> 
> say what the defaults are?

Ooh, good suggestion.

> the busybox help implies that it's possible to use more precision than seconds?
> afaik from the kernel source and kernel docs, though, that's a lie?

You can for the sleep, not for this watchdog ioctl. Might be a newer ioctl with
more precision though... Not spotting an obvious candidate in the current kernel
source though. (What's a PRETIMEOUT?)

> i think it would help to explain that -T is what you set the hardware watchdog
> to, and -t is how often you kick it. i didn't get that from either your --help
> or the busybox --help.
> 
> so something like:
> 
> -T N  Set hardware watchdog to N seconds (default 60).
> -t N   Kick watchdog every N seconds (default 30).

Is kick the official term for this?

> ? (note that it should be a tab between the "-T N" part and the "Set ..." part.)
>  
> 
>     +*/
>     +#define FOR_watchdog
>     +#include "toys.h"
>     +#include "linux/watchdog.h"
>     +
>     +// Make sure no DEBUG variable is set; change this if you need debug prints!
>     +#undef WATCHDOG_DEBUG
> 
> 
> delete this or turn it into a -v flag, depending on whether it's still
> useful now the toy has been written, or was just useful during development?
> (rtcwake, for example, has this kind of thing in -v, but sleep doesn't.)

rtcwake -v is in the man page. :)

>     +    }
>     +  }
>     +
>     +  // Intercept terminating signals so we can call our shutdown routine first.
>     +  if ( intercept_signals(safe_shutdown) ) {
>     +    perror_exit("failed to intercept desired signals: %d", rc);
>     +  }
>     +#if defined(WATCHDOG_DEBUG)
>     +    printf("Successfully intercepted signals.\n");
>     +#endif
> 
> 
> sigatexit()

six of one...

>     +  TT.fd = open(watchdog_dev, O_WRONLY);
>     +  if ( TT.fd == -1 ) {
>     +    perror_exit("failed to open '%s'", watchdog_dev);
>     +  }
> 
> 
>  xopen()
>  
> 
>     +#if defined(WDIOC_SETTIMEOUT)
> 
> 
> as far as i can tell, this has existed since 2.6 kernels, so you don't need the #if?

I was wondering if maybe it was something like the way uclibc used to be able to
config out support for stuff, but in that case the header would go away...?

> 
>     +  // SETTIMEOUT takes time value in seconds: s = ms / (1000 ms/s)
>     +  hw_timer_sec = TT.hw_timer_s;
>     +  xioctl(TT.fd, WDIOC_SETTIMEOUT, (void *)&hw_timer_sec);
> 
> 
> (no need for this local if you let USE_TOY do the default for you. even when you
> can't, you can just assign to TT.whatever.)
>  
> 
>     +#endif // WDIOC_SETTIMEOUT
>     +
>     +  // Now that we've got the watchdog device open, kick it periodically.
>     +  while (1) {
>     +    write(TT.fd, "\0", 1);
> 
> 
> xwrite?

Exiting on failure to write here seems... unhelpful. :)

You could do writeall() which retries short writes, but you're only going to get
a 0 length EAGAIN write from a signal without SA_RESTART (which I believe all
SIGDFL does in modern Linux) so the only culprit would be signals we set, which
in this case is SIGTERM which doesn't return.

That said, SIG_IGN on SIGSTOP might be useful. (Unless somebody WANTS to crash
the system by stopping the watchdog?)

But the main defense here is having more than one "wake up and poke" period
before the hardware reboot triggers. If something did go wrong (which it
shouldn't), you'd get another bite at the apple before getting thrown out of the
garden.

I do note that this doesn't seem to have a SOFTWARE reboot mechanism. But
there's no app sending you periodic proof-of-life photos of the kidnapee with
today's newspaper, so how would it know when to? (Again, I'm used to the
embedded system's main app doing this. All _this_ process poking the watchdog
proves is that the process scheduler is still running, not that the system is
capable doing anything useful. The logical process to be doing this in android
would probably be... I dunno, the zygote spawner? What responds to UI touch
screen events, and what launches new apps? I watched several hours of tutorial
on this years ago and then you rewrote it all like 8 months later.)

Also, I thought you guys quiesce the HELL out of the system to keep battery
usage low when the screen's off, and you're going to wake up a process to poke
the watchdog? (Baseband checks for incoming cell data, that's not even the same
chip? Maybe it's different now...)

Rob