[Toybox] [PATCH] Introducing toys/pending/watchdog.c

enh enh at google.com
Mon Aug 24 09:56:40 PDT 2020


On Sun, Aug 23, 2020 at 11:59 PM Rob Landley <rob at landley.net> wrote:

> On 8/22/20 3:05 PM, enh via Toybox wrote:
> > On Sat, Aug 22, 2020 at 11:22 AM Chris Sarra via Toybox
> > <toybox at lists.landley.net <mailto:toybox at lists.landley.net>> wrote:
> >
> >     This patch introduces a simple watchdog implementation for toybox. We
> >     send the appropriate ioctls to set the relevant timeouts, and
> intercept
> >     signals to safely shut down if required.
> >     ---
> >      toys/pending/watchdog.c | 128
> ++++++++++++++++++++++++++++++++++++++++
> >      1 file changed, 128 insertions(+)
> >      create mode 100644 toys/pending/watchdog.c
> >
> >     diff --git a/toys/pending/watchdog.c b/toys/pending/watchdog.c
> >     new file mode 100644
> >     index 00000000..0813fe69
> >     --- /dev/null
> >     +++ b/toys/pending/watchdog.c
> >     @@ -0,0 +1,128 @@
> >     +/* watchdog - start a watchdog timer with configurable kick
> frequencies
> >     +
> >     + Author: Chris Sarra, chrissarra at google.com <mailto:
> chrissarra at google.com>
> >
> > not sure why we're both working on a Saturday, but...
>
> Only reason I didn't get to it yesterday is I took a day off like you
> suggested. :)
>
> >     + Date: 25 July 2019
> >     + Ref: kernel.org/doc/Documentation/watchdog/watchdog-api.txt
> >     <http://kernel.org/doc/Documentation/watchdog/watchdog-api.txt>
> >
> > use the " * " indent from the other source files?
>
> I already cleaned this up before spotting this email. :)
>
> > also mention that there's precedent in a busybox "watchdog", which this
> is
> > command-line compatible with?
>
> Not a clue. Last system I dealt with using watchdog stuff had a 5 line
> function
> in the main loop of the app poking the watchdog so if the app went down the
> system rebooted. I've always been mildly confused by standalone watchdog
> binaries: what do they prove exactly?
>
> >     +USE_WATCHDOG(NEWTOY(watchdog, "Ft#T#",
> TOYFLAG_NEEDROOT|TOYFLAG_BIN))
> >
> > huh. i didn't realize that USE_ worked _inside_ the comment! probably
> best to
> > move it below the comment as is the convention though.
>
> It's a grep pulling NEWTOY()/OLDTOY() lines with no leading whitespace out
> of
> toys/*/*.c and running them through sort to make generated/newtoys.h.
>
> >     +config WATCHDOG
> >     +  bool "watchdog"
> >     +  default y
> >     +  help
> >     +    usage: watchdog [-F] [-t SW_TIMER_S] [-T HW_TIMER_S] DEV
> >     +
> >     +    Start the watchdog timer at DEV with optional timeout
> parameters.
> >
> > (blank line here.)
> >
> >     +    -F run in the foreground (do not daemonize)
> >     +    -t software timer (in seconds)
> >     +    -T hardware timer (in seconds)
> >
> >
> > say what the defaults are?
>
> Ooh, good suggestion.
>
> > the busybox help implies that it's possible to use more precision than
> seconds?
> > afaik from the kernel source and kernel docs, though, that's a lie?
>
> You can for the sleep, not for this watchdog ioctl. Might be a newer ioctl
> with
> more precision though... Not spotting an obvious candidate in the current
> kernel
> source though. (What's a PRETIMEOUT?)
>

strace says they're using the same ioctl, so maybe just over-enthusiastic
copy & paste in the help text...


> > i think it would help to explain that -T is what you set the hardware
> watchdog
> > to, and -t is how often you kick it. i didn't get that from either your
> --help
> > or the busybox --help.
> >
> > so something like:
> >
> > -T N  Set hardware watchdog to N seconds (default 60).
> > -t N   Kick watchdog every N seconds (default 30).
>
> Is kick the official term for this?
>

it's the one i'm familiar with, and wikipedia says it's not just me, but
`busybox watchdog --help` says "reset" and wikipedia calls the section
"restart" but then _don't_ use the verb "restart" but do use both "kick"
(the most) and "reset":
https://en.wikipedia.org/wiki/Watchdog_timer#Watchdog_restart

(they reserve "restart" for what the device does if the watchdog _isn't_
kicked, despite naming the section "Watchdog restart". i think that was
also why it wasn't obvious to me what "reset" meant in the busybox help.
Maybe "Kick/reset/restart the timer every N seconds (default 30)"?
actually, if busybox had just included "the timer", any of the three verbs
would have been fine.)


> > ? (note that it should be a tab between the "-T N" part and the "Set
> ..." part.)
> >
> >
> >     +*/
> >     +#define FOR_watchdog
> >     +#include "toys.h"
> >     +#include "linux/watchdog.h"
> >     +
> >     +// Make sure no DEBUG variable is set; change this if you need
> debug prints!
> >     +#undef WATCHDOG_DEBUG
> >
> >
> > delete this or turn it into a -v flag, depending on whether it's still
> > useful now the toy has been written, or was just useful during
> development?
> > (rtcwake, for example, has this kind of thing in -v, but sleep doesn't.)
>
> rtcwake -v is in the man page. :)
>
> >     +    }
> >     +  }
> >     +
> >     +  // Intercept terminating signals so we can call our shutdown
> routine first.
> >     +  if ( intercept_signals(safe_shutdown) ) {
> >     +    perror_exit("failed to intercept desired signals: %d", rc);
> >     +  }
> >     +#if defined(WATCHDOG_DEBUG)
> >     +    printf("Successfully intercepted signals.\n");
> >     +#endif
> >
> >
> > sigatexit()
>
> six of one...
>
> >     +  TT.fd = open(watchdog_dev, O_WRONLY);
> >     +  if ( TT.fd == -1 ) {
> >     +    perror_exit("failed to open '%s'", watchdog_dev);
> >     +  }
> >
> >
> >  xopen()
> >
> >
> >     +#if defined(WDIOC_SETTIMEOUT)
> >
> >
> > as far as i can tell, this has existed since 2.6 kernels, so you don't
> need the #if?
>
> I was wondering if maybe it was something like the way uclibc used to be
> able to
> config out support for stuff, but in that case the header would go away...?
>
> >
> >     +  // SETTIMEOUT takes time value in seconds: s = ms / (1000 ms/s)
> >     +  hw_timer_sec = TT.hw_timer_s;
> >     +  xioctl(TT.fd, WDIOC_SETTIMEOUT, (void *)&hw_timer_sec);
> >
> >
> > (no need for this local if you let USE_TOY do the default for you. even
> when you
> > can't, you can just assign to TT.whatever.)
> >
> >
> >     +#endif // WDIOC_SETTIMEOUT
> >     +
> >     +  // Now that we've got the watchdog device open, kick it
> periodically.
> >     +  while (1) {
> >     +    write(TT.fd, "\0", 1);
> >
> >
> > xwrite?
>
> Exiting on failure to write here seems... unhelpful. :)
>

yeah, good point :-)


> You could do writeall() which retries short writes, but you're only going
> to get
> a 0 length EAGAIN write from a signal without SA_RESTART (which I believe
> all
> SIGDFL does in modern Linux) so the only culprit would be signals we set,
> which
> in this case is SIGTERM which doesn't return.
>
> That said, SIG_IGN on SIGSTOP might be useful. (Unless somebody WANTS to
> crash
> the system by stopping the watchdog?)
>
> But the main defense here is having more than one "wake up and poke" period
> before the hardware reboot triggers. If something did go wrong (which it
> shouldn't), you'd get another bite at the apple before getting thrown out
> of the
> garden.
>
> I do note that this doesn't seem to have a SOFTWARE reboot mechanism. But
> there's no app sending you periodic proof-of-life photos of the kidnapee
> with
> today's newspaper, so how would it know when to? (Again, I'm used to the
> embedded system's main app doing this. All _this_ process poking the
> watchdog
> proves is that the process scheduler is still running, not that the system
> is
> capable doing anything useful. The logical process to be doing this in
> android
> would probably be... I dunno, the zygote spawner? What responds to UI touch
> screen events, and what launches new apps? I watched several hours of
> tutorial
> on this years ago and then you rewrote it all like 8 months later.)
>
> Also, I thought you guys quiesce the HELL out of the system to keep battery
> usage low when the screen's off, and you're going to wake up a process to
> poke
> the watchdog? (Baseband checks for incoming cell data, that's not even the
> same
> chip? Maybe it's different now...)
>

(this isn't for Android.)


> Rob
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.landley.net/pipermail/toybox-landley.net/attachments/20200824/7d7280dc/attachment-0001.htm>


More information about the Toybox mailing list