[Toybox] [PATCH] Factor out dhcpd's and syslogd's main loops into a library function
Rob Landley
rob at landley.net
Fri Sep 6 04:19:47 PDT 2013
On 09/01/2013 08:54:27 AM, Felix Janda wrote:
> Both syslogd and dhcpd wait for sockets, signals or a timeout.
> Currently,
> they use select() with its timeout to listen to the sockets and a
> self-pipe for the signals.
>
> The new library function instead uses pselect() without a timeout and
> alarm(). With pselect() signals are only unblocked during this system
> call; so the xwrappers don't need to be aware about interrupted
> syscalls.
Are we normally blocking a lot of signals in other commands? Using the
6 argument version instead of the 5 argument version addresses an issue
for which existing command?
> alarm() is used instead of the timeout argument to pselect() so that
> the length of the intervals between timeouts is not affected by being
> interrupted.
QEMU recently changed its timer system to _not_ use alarm:
http://blog.alex.org.uk/2013/08/24/changes-to-qemus-timer-system/
I'm not convinced sending signals to ourselves is preferable to telling
the system call when to timeout. Of the three options (select adjusting
its timespec, us manually reading the clock and feeding pselect a
timeout based on when we next want to wake up, and us sending ourselves
an alarm), us sending ourselves an alarm and constantly having to reset
it seems like it has the most moving parts and the most things that can
go wrong.
Ok, maybe I've been slightly burned by SIGALRM before:
http://landley.net/notes-2011.html#05-09-2011
> On SIGHUP syslogd restarts. Previously everything relevant was
> contained
> in syslogd_main() and the restarting was implemented by a goto. Now a
> longjmp() would be necessary. I instead used an xexec().
Since commit 955 (and pondering toysh to finally tackle that sucker
within a finite amount of time) I'm a bit worried that restarting a
command without OS involvement might misbehave if the signal mask gets
corrupted, or there are funky environment variables, or memory leaks
build up, or we have leftover filehandles, or a couple dozen other
things.
We haven't got "generic reset this process's state" code, longjmp()
leaves a lot of debris, and xexec() callign a new main doesn't clear
the old stuff off the stack; do that in a tight enough loop for long
enough and eventually you will run out of stack even in a normal linux
userspace. (Default's 8 megs: ulimit -s says 8192 and the bash man page
says the units are kilobytes.)
I added NOFORK because some commands (like cd) _must_ happen in the
current toysh process context, but also because I could very very
carefully audit commands to make sure it was safe to call them in the
same process context and continue when they return. But these commands
are in pending because I haven't even done an initial audit pass on
them.
Discarding your process on restart and execing a new one covers a
multitude of sins. Possibly syslogd is clean enough it can restart
itself without accumulating trash, but as a long-lived daemon I have
yet to closely read (hence it being in the pending directory), this
does not fill me with confidence.
Probably just the "it's 5am, sun coming up soon" things. I should step
away from the keyboard...
> Since xexec()
> doesn't change the pid the pidfile needs to be unlinked() before
> terminating so that xpidfile() doesn't complain.
Um... I agree this is a symptom of a problem. I'm not sure that's the
right fix. The restart is maintaining state and then getting confused
by the state it's maintaining...
(I can just see some horrible systemd thing hitting a race window where
the PID file wasn't there and trying to restart the process...)
> Furthermore if not run
> in the foreground syslogd will daemonize() again when restarting.
> Could
> one detect in daemonize() whether daemonizing is necessary?
"The restart is maintaining state and then getting..."
Are these the _only_ instances of this? Is a main loop that exits and
restarts itself but avoids initial setup better?
Is there a clean way to do this where we _don't_ have to be absolutely
sure we've thought of everything ever? execv("/proc/self/exe",
toys.argc) and then fall back to something else if it returns, perhaps?
Or can we get an exhaustive list of all process resources (from the
container suspend/resume guys, presumably) and check for ourselves that
we're not leaking anything?
Very old concern of mine, a full decade before containers:
http://lkml.indiana.edu/hypermail/linux/kernel/0206.2/0835.html
> Felix
>
> # HG changeset patch
> # User Felix Janda <felix.janda at posteo.de>
> # Date 1378041908 -7200
> # Node ID 752d22ece0fe3fa9763fd0b6e231e653cc173890
> # Parent 587b7572aac2a64988503458155e1f2546672525
> Factor out daemon select() loops into a library function
Ok, the new select() lib function I'd probably be up for reviewing on
its own, and the cleanups to a function I haven't even read yet I'd
just pass through and read/cleanup the new baseline when I got around
to it.
But both at once? Not up for that right now. Might try again in the
morning...
Rob
More information about the Toybox
mailing list