[Toybox] [PATCH] Introducing toys/pending/watchdog.c
Chris Sarra
chrissarra at google.com
Mon Aug 24 10:22:03 PDT 2020
+Chris
On Mon, Aug 24, 2020 at 9:56 AM enh <enh at google.com> wrote:
>
>
> On Sun, Aug 23, 2020 at 11:59 PM Rob Landley <rob at landley.net> wrote:
>
>> On 8/22/20 3:05 PM, enh via Toybox wrote:
>> > On Sat, Aug 22, 2020 at 11:22 AM Chris Sarra via Toybox
>> > <toybox at lists.landley.net <mailto:toybox at lists.landley.net>> wrote:
>> >
>> > This patch introduces a simple watchdog implementation for toybox.
>> We
>> > send the appropriate ioctls to set the relevant timeouts, and
>> intercept
>> > signals to safely shut down if required.
>> > ---
>> > toys/pending/watchdog.c | 128
>> ++++++++++++++++++++++++++++++++++++++++
>> > 1 file changed, 128 insertions(+)
>> > create mode 100644 toys/pending/watchdog.c
>> >
>> > diff --git a/toys/pending/watchdog.c b/toys/pending/watchdog.c
>> > new file mode 100644
>> > index 00000000..0813fe69
>> > --- /dev/null
>> > +++ b/toys/pending/watchdog.c
>> > @@ -0,0 +1,128 @@
>> > +/* watchdog - start a watchdog timer with configurable kick
>> frequencies
>> > +
>> > + Author: Chris Sarra, chrissarra at google.com <mailto:
>> chrissarra at google.com>
>> >
>> > not sure why we're both working on a Saturday, but...
>>
>> Only reason I didn't get to it yesterday is I took a day off like you
>> suggested. :)
>>
>> > + Date: 25 July 2019
>> > + Ref: kernel.org/doc/Documentation/watchdog/watchdog-api.txt
>> > <http://kernel.org/doc/Documentation/watchdog/watchdog-api.txt>
>> >
>> > use the " * " indent from the other source files?
>>
>> I already cleaned this up before spotting this email. :)
>>
>> > also mention that there's precedent in a busybox "watchdog", which this
>> is
>> > command-line compatible with?
>>
>> Not a clue. Last system I dealt with using watchdog stuff had a 5 line
>> function
>> in the main loop of the app poking the watchdog so if the app went down
>> the
>> system rebooted. I've always been mildly confused by standalone watchdog
>> binaries: what do they prove exactly?
>>
>> > +USE_WATCHDOG(NEWTOY(watchdog, "Ft#T#",
>> TOYFLAG_NEEDROOT|TOYFLAG_BIN))
>> >
>> > huh. i didn't realize that USE_ worked _inside_ the comment! probably
>> best to
>> > move it below the comment as is the convention though.
>>
>> It's a grep pulling NEWTOY()/OLDTOY() lines with no leading whitespace
>> out of
>> toys/*/*.c and running them through sort to make generated/newtoys.h.
>>
>> > +config WATCHDOG
>> > + bool "watchdog"
>> > + default y
>> > + help
>> > + usage: watchdog [-F] [-t SW_TIMER_S] [-T HW_TIMER_S] DEV
>> > +
>> > + Start the watchdog timer at DEV with optional timeout
>> parameters.
>> >
>> > (blank line here.)
>> >
>> > + -F run in the foreground (do not daemonize)
>> > + -t software timer (in seconds)
>> > + -T hardware timer (in seconds)
>> >
>> >
>> > say what the defaults are?
>>
>> Ooh, good suggestion.
>>
>> > the busybox help implies that it's possible to use more precision than
>> seconds?
>> > afaik from the kernel source and kernel docs, though, that's a lie?
>>
>> You can for the sleep, not for this watchdog ioctl. Might be a newer
>> ioctl with
>> more precision though... Not spotting an obvious candidate in the current
>> kernel
>> source though. (What's a PRETIMEOUT?)
>>
>
> strace says they're using the same ioctl, so maybe just over-enthusiastic
> copy & paste in the help text...
>
>
>> > i think it would help to explain that -T is what you set the hardware
>> watchdog
>> > to, and -t is how often you kick it. i didn't get that from either your
>> --help
>> > or the busybox --help.
>> >
>> > so something like:
>> >
>> > -T N Set hardware watchdog to N seconds (default 60).
>> > -t N Kick watchdog every N seconds (default 30).
>>
>> Is kick the official term for this?
>>
>
> it's the one i'm familiar with, and wikipedia says it's not just me, but
> `busybox watchdog --help` says "reset" and wikipedia calls the section
> "restart" but then _don't_ use the verb "restart" but do use both "kick"
> (the most) and "reset":
> https://en.wikipedia.org/wiki/Watchdog_timer#Watchdog_restart
>
> (they reserve "restart" for what the device does if the watchdog _isn't_
> kicked, despite naming the section "Watchdog restart". i think that was
> also why it wasn't obvious to me what "reset" meant in the busybox help.
> Maybe "Kick/reset/restart the timer every N seconds (default 30)"?
> actually, if busybox had just included "the timer", any of the three verbs
> would have been fine.)
>
>
> > ? (note that it should be a tab between the "-T N" part and the "Set
>> ..." part.)
>> >
>> >
>> > +*/
>> > +#define FOR_watchdog
>> > +#include "toys.h"
>> > +#include "linux/watchdog.h"
>> > +
>> > +// Make sure no DEBUG variable is set; change this if you need
>> debug prints!
>> > +#undef WATCHDOG_DEBUG
>> >
>> >
>> > delete this or turn it into a -v flag, depending on whether it's still
>> > useful now the toy has been written, or was just useful during
>> development?
>> > (rtcwake, for example, has this kind of thing in -v, but sleep doesn't.)
>>
>> rtcwake -v is in the man page. :)
>>
>> > + }
>> > + }
>> > +
>> > + // Intercept terminating signals so we can call our shutdown
>> routine first.
>> > + if ( intercept_signals(safe_shutdown) ) {
>> > + perror_exit("failed to intercept desired signals: %d", rc);
>> > + }
>> > +#if defined(WATCHDOG_DEBUG)
>> > + printf("Successfully intercepted signals.\n");
>> > +#endif
>> >
>> >
>> > sigatexit()
>>
>> six of one...
>>
>> > + TT.fd = open(watchdog_dev, O_WRONLY);
>> > + if ( TT.fd == -1 ) {
>> > + perror_exit("failed to open '%s'", watchdog_dev);
>> > + }
>> >
>> >
>> > xopen()
>> >
>> >
>> > +#if defined(WDIOC_SETTIMEOUT)
>> >
>> >
>> > as far as i can tell, this has existed since 2.6 kernels, so you don't
>> need the #if?
>>
>> I was wondering if maybe it was something like the way uclibc used to be
>> able to
>> config out support for stuff, but in that case the header would go
>> away...?
>>
>> >
>> > + // SETTIMEOUT takes time value in seconds: s = ms / (1000 ms/s)
>> > + hw_timer_sec = TT.hw_timer_s;
>> > + xioctl(TT.fd, WDIOC_SETTIMEOUT, (void *)&hw_timer_sec);
>> >
>> >
>> > (no need for this local if you let USE_TOY do the default for you. even
>> when you
>> > can't, you can just assign to TT.whatever.)
>> >
>> >
>> > +#endif // WDIOC_SETTIMEOUT
>> > +
>> > + // Now that we've got the watchdog device open, kick it
>> periodically.
>> > + while (1) {
>> > + write(TT.fd, "\0", 1);
>> >
>> >
>> > xwrite?
>>
>> Exiting on failure to write here seems... unhelpful. :)
>>
>
> yeah, good point :-)
>
>
>> You could do writeall() which retries short writes, but you're only going
>> to get
>> a 0 length EAGAIN write from a signal without SA_RESTART (which I believe
>> all
>> SIGDFL does in modern Linux) so the only culprit would be signals we set,
>> which
>> in this case is SIGTERM which doesn't return.
>>
>> That said, SIG_IGN on SIGSTOP might be useful. (Unless somebody WANTS to
>> crash
>> the system by stopping the watchdog?)
>>
>> But the main defense here is having more than one "wake up and poke"
>> period
>> before the hardware reboot triggers. If something did go wrong (which it
>> shouldn't), you'd get another bite at the apple before getting thrown out
>> of the
>> garden.
>>
>> I do note that this doesn't seem to have a SOFTWARE reboot mechanism. But
>> there's no app sending you periodic proof-of-life photos of the kidnapee
>> with
>> today's newspaper, so how would it know when to? (Again, I'm used to the
>> embedded system's main app doing this. All _this_ process poking the
>> watchdog
>> proves is that the process scheduler is still running, not that the
>> system is
>> capable doing anything useful. The logical process to be doing this in
>> android
>> would probably be... I dunno, the zygote spawner? What responds to UI
>> touch
>> screen events, and what launches new apps? I watched several hours of
>> tutorial
>> on this years ago and then you rewrote it all like 8 months later.)
>>
>> Also, I thought you guys quiesce the HELL out of the system to keep
>> battery
>> usage low when the screen's off, and you're going to wake up a process to
>> poke
>> the watchdog? (Baseband checks for incoming cell data, that's not even
>> the same
>> chip? Maybe it's different now...)
>>
>
> (this isn't for Android.)
>
>
Ya, sorry- I should probably add that I work in a different group and we're
using toybox for a non-Android embedded Linux application.
Thanks for your comments and for taking the time to clean this up- I will
follow up later today after I get a chance to test the updated version!
Rob
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.landley.net/pipermail/toybox-landley.net/attachments/20200824/6cd354b1/attachment.htm>
More information about the Toybox
mailing list