[Toybox] toybox - init?

Fri Dec 12 09:18:58 PST 2014

On 12/11/2014 11:16 AM, Rich Felker wrote:
> On Tue, Dec 09, 2014 at 06:21:04AM -0600, Rob Landley wrote:
>> On 12/09/2014 03:38 AM, scsijon wrote:
>>> was just wondering on the status of init within toybox as I'm having a
>>> look at buildroot and maybe it would be useful there.
>>>
>>> thanks
>>> scsijon
>>
>> oneit is in and has worked for years. I'm pondering extending oneit with
>> a pipe that writes exiting child process pids to a non-PID1 process that
>> can deal with restarting them if it cares.
> 
> For what it's worth, I like that concept. Do you have a way in mind
> for the non-PID1 process that's receiving these pids to match them up
> with services that need restarting?

I haven't done a lot of design work here (I want to research what
upstart and macosx launchd and so on are currently doing, and see what
lxc's container launch program is doing, and yes look at systemd to
understand the full horror; it's a can of worms I haven't opened yet
because it'll eat my life for a bit, just like sed's been doing).

But: the non-pid1 process should probably be the one that starts the
other services in the first place.

Basically, oneit's design is "launch a single child process, wait for it
to exit, do a thing in response". That thing is currently shutdown or
reboot, but adding "relaunch the child process" is trivial.

If oneit gets other things exiting at it, that's because they got
reparented to init (i.e. their parent exited, which daemonize() does
automatically). Right now it ignores them, but writing them to a pipe is
also trivial.
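
A minimal sketch of that loop, in Python rather than toybox's C purely
for brevity. The function name supervise() and the wire format (one
decimal pid per line) are assumptions made for illustration, not
anything oneit does today:

```python
import os

def supervise(argv, pipe_fd):
    """Launch one child, reap everything, report other exits on pipe_fd.

    Returns the raw wait status of the main child once it exits.
    The "one decimal pid per line" format is an assumption of this sketch.
    """
    child = os.fork()
    if child == 0:
        try:
            os.execvp(argv[0], argv)
        finally:
            # Only reached if exec failed.
            os._exit(127)

    while True:
        pid, status = os.wait()
        if pid == child:
            # The caller decides what "do a thing in response" means:
            # shutdown, reboot, or relaunch.
            return status
        # Anything else we reaped is not the main child; as PID 1 that
        # means it was reparented to us. No policy here, just report it.
        os.write(pipe_fd, f"{pid}\n".encode())
```

Each write is far smaller than PIPE_BUF, so a reader on the other end
of the pipe never sees half a pid.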

We should _not_ add anything to oneit that _isn't_ trivial. Policy
belongs in pid 2.

> I don't see a good way to do this
> in general but it seems safe to use pid files (produced by the daemons
> themselves) here as long as the only purpose you use them for is
> detecting exit.

If oneit launches another init-like program as PID 2, and that pid2
parses its variant of inittab to launch a bunch of processes, then it
can make a directory of pid files that contain the pid of the service,
and can check each one when it gets an exit to see if that PID is still
running. (It doesn't even have to care what the pid it gets is, just
"wake up and check your services are all still running, relaunch as
appropriate".)

As an optimization it could check the received PID against a linked list
kept in memory, but if pid 2 exits and gets restarted (we upgraded the
package, relaunch yourself), the on-disk pid files _must_ be up to date
so it can recover the system state without rebooting.
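
The "wake up and check your services" step could be sketched like
this, again in Python for brevity. The directory layout (one file per
service, named after the service, containing a decimal pid) and the
function names are assumptions for illustration:

```python
import os

def pid_alive(pid):
    """Return True if a process with this pid currently exists."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0: existence check only, nothing is sent
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but belongs to someone else
    return True

def dead_services(piddir):
    """Return the names of pid files whose process is gone."""
    dead = []
    for name in sorted(os.listdir(piddir)):
        with open(os.path.join(piddir, name)) as f:
            text = f.read().strip()
        if not text.isdigit() or not pid_alive(int(text)):
            dead.append(name)
    return dead
```

Because it only reads the directory, the same scan works after pid 2
itself has been restarted. It does not guard against pid reuse, and
kill(pid, 0) still succeeds for a zombie that has not been reaped yet.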

Later we can make the children be containers: what it's tracking is the
pid of the container init task (which can be another instance of oneit
configured to just exit if its child dies), and then the pid2 process
checks if the container is still ok and relaunches it if not. What's
_in_ the container can be arbitrarily complex, because it's not really
our problem. (Run your own init script in there, see if I care.)

All the "what should we run" decisions and sequencing complexity of
"this service needs to launch before this service" belongs in this pid2
init service, _not_ in oneit. All the policy should be in pid2; policy
should not run in the special PID1 context, it should just be monitored
by PID 1. (Someday we could add some kind of heartbeat function so oneit
checks if PID2 is frozen, and kills/relaunches it if so. But that's
future expansion.)

We can even implement this in stages: start with the pipe of reparented
PIDs that exited, and write a pid2 that reads from the pipe and just
writes each exiting pid to stderr or something.
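
That first-stage pid2 is small enough to sketch in full (Python for
brevity). It assumes the pipe carries one decimal pid per line, which
is a format invented for this example; log_exits() is likewise a
hypothetical name:

```python
import os
import sys

def log_exits(pipe_fd, out=sys.stderr):
    """Read pids from pipe_fd until EOF, logging each one to `out`.

    Returns how many pids were logged. Lines that are not a decimal
    number are skipped.
    """
    count = 0
    with os.fdopen(pipe_fd, "r") as pipe:
        for line in pipe:
            line = line.strip()
            if line.isdigit():
                print(f"pid {line} exited", file=out)
                count += 1
    return count
```

The loop ends when the write side of the pipe is closed, so a real
pid2 would treat EOF as "PID 1 went away" rather than a normal exit.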

Eventually we can work our way up to "the container init is higher
priority than the child it launches, and thus if the child exits it can
see if it still has a child process left and take the lowest numbered
one as its new child to be monitoring, to allow children running
daemonize() to be tracked". But that sort of thing is _not_ going to be
in the first pass, and we'd wait for people to complain at us with real
world examples to use as test cases. And pid 2 would still just care
whether the container init had exited, and the brains for figuring out
when to exit would be in the container init task so it's _separate_ and
not one big hairball.

And "oneit runs with a third party pid 2" is built in to the design.
They are NOT tied closely together. The pid pipe and even the eventual
heartbeat logic should be as generic as possible.

(Possibly in my cleanup pass I can give toybox's svinit a "run as pid 2,
listening to pid exit pipe" mode. It would have to know when _it_ got
restarted, so it didn't run the full system init scripts again...
actually it would need the same "directory of pid files to track live
children that need relaunch when they go away" logic...)

Anyway, those are the lines I'm thinking along. I'm sure there will be
more corner cases when actually trying to make it work, there always are...

Rob