[Toybox] [PATCH] An implementation of the 'csplit' command

Rob Landley rob at landley.net
Tue Sep 12 12:38:50 PDT 2023


On 9/11/23 23:56, Oliver Webb via Toybox wrote:
> I have made an implementation of the 'csplit' command in about 160 lines of code.

You have TOYFLAG_MAYFORK on this command. Sigh, explaining the lib/toyflags.h
values is one of the tutorial videos I need to make.

Forking is the default behavior for launching new commands in toybox.
TOYFLAG_NOFORK and TOYFLAG_MAYFORK are for the toybox shell (sh.c). The first
indicates a shell builtin that can only run within the shell's process (like
"cd", since forking a child process, calling chdir() in the child, and having
the child exit doesn't actually change the parent's getcwd() value). NOFORK
commands don't show up in the command list output by running "toybox", but they
do show up in the command list you get by running "help" with no arguments in
the shell.
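
A minimal sketch of why "cd" can't be a forked child, using only standard POSIX
calls. The function name cwd_changed_by_child() is made up for illustration and
is not anything in toybox:

```c
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Fork a child, chdir() in the child, let the child exit, then check whether
// the parent's working directory moved.
// Returns 0 if unchanged, 1 if it changed, -1 on error.
int cwd_changed_by_child(const char *dir)
{
  char before[4096], after[4096];
  int status;
  pid_t pid;

  if (!getcwd(before, sizeof(before))) return -1;

  pid = fork();
  if (pid < 0) return -1;
  // Child: the chdir() only affects this process, which exits immediately.
  if (!pid) _exit(chdir(dir) ? 1 : 0);

  if (waitpid(pid, &status, 0) < 0) return -1;
  if (!getcwd(after, sizeof(after))) return -1;

  return strcmp(before, after) ? 1 : 0;
}
```

Calling cwd_changed_by_child("/") from any directory reports no change, because
the working directory is per-process state and the child's copy is discarded
when the child exits.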

The second (MAYFORK) indicates a command that _can_ run standalone, and thus
shows up in the "toybox" list so the installer creates a symlink for it in the
search $PATH, but when it runs from toysh it acts like NOFORK and is a function
call made by the current process (and eventually returns back to the shell so
the shell's PID can go on and do more shell things afterwards). This allows the
command to access the shell's data structures, and thus perform additional
functions such as setting environment variables in the shell (printf %n), or
accessing the job control list (kill %1).

Since both NOFORK and MAYFORK commands can be run from within the shell, they
have to scrupulously clean up after themselves. When they call xexit() and
friends (which includes things like perror_exit() and stuff like xmalloc() that
can call it) they longjmp() back to toysh instead of exiting, which means
resources like filehandles and heap allocations and any mmap() it does may have
to live in the GLOBALS() block, and it may need a sigatexit() handler to free
that stuff out of GLOBALS so long-running shells (or shell scripts) don't
accumulate leaked debris from builtins that exited abnormally.

(Note: lib/lib.c has sigatexit() instead of libc's atexit() because WHEN we
longjmp() back to the shell, we need to first call our own atexit() handlers and
then remove them from the list. The libc ones don't let you call them and remove
them from the list libc maintains without exiting. Auditing everything for
leaks, including all the NOFORK and MAYFORK commands, is a big todo item in the
shell work I need to dive into at some point...)
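
A rough sketch of that pattern, assuming a single jump target and a small fixed
handler list. Every name in it (add_cleanup, builtin_exit, run_builtin,
leaky_builtin) is hypothetical. This is not the code in lib/lib.c or sh.c, just
the shape of the idea:

```c
#include <setjmp.h>
#include <stdlib.h>

// Hypothetical names throughout, not the ones toybox uses.
#define MAX_HANDLERS 16

static jmp_buf rebound;                       // where a builtin's "exit" lands
static void (*handlers[MAX_HANDLERS])(void);  // pending cleanup functions
static int handler_count;
static int exit_code;

// Register a cleanup function to run when the current builtin finishes.
void add_cleanup(void (*fn)(void))
{
  if (handler_count < MAX_HANDLERS) handlers[handler_count++] = fn;
}

// Run handlers newest first, removing each one so it can never run twice.
static void run_cleanups(void)
{
  while (handler_count) handlers[--handler_count]();
}

// What a builtin calls instead of exit(): jump back into the shell.
void builtin_exit(int code)
{
  exit_code = code;
  longjmp(rebound, 1);
}

// Call a builtin as a plain function in the shell's own process. Cleanups
// run whether the builtin returned normally or bailed via builtin_exit().
int run_builtin(int (*fn)(void))
{
  if (!setjmp(rebound)) exit_code = fn();
  run_cleanups();

  return exit_code;
}

// Example builtin: allocates, registers the cleanup, then exits abnormally.
static char *scratch;

static void free_scratch(void)
{
  free(scratch);
  scratch = NULL;
}

int leaky_builtin(void)
{
  scratch = malloc(64);
  add_cleanup(free_scratch);
  builtin_exit(3);

  return 0; // not reached
}
```

The allocation is reachable from a static pointer rather than a local variable
because the longjmp() throws away the builtin's stack frame, which is the same
reason the real commands would keep such state in GLOBALS().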

I dunno why csplit would want MAYFORK here. A normal command can just xexit()
and let the kernel close filehandles and free memory when the process exits. I
note that 95% of the overhead of fork/exec is the exec part, not the fork part,
so "fork and call toy_find("blah")->toy_main()" is still pretty cheap. (On
systems with an MMU, anyway. It's all copy on write. I'm aware Rich Felker
disagrees, but he's always using threads for everything, and threads have
_always_ combined badly with fork(). I suspect he's setting up some gratuitous
thread plumbing by default that he thought was free, and suddenly he noticed
he's penalized fork(), and now he's blaming fork(). But I haven't looked deeply
into the details of what he's mad about, because I dowanna. But, you know, the
linux-kernel guys would have NOTICED if fork() was slow. As would everybody else
everywhere.)

> The implementation is mostly POSIX compliant, but it is missing a few things

Missing stuff out of posix is pretty normal, they specify a lot of nonsense. My
patch implementation is missing various posix options like -b and -e, and
not only has nobody complained, but I submitted my patch implementation to
busybox in 2010 and _they_ haven't bothered to implement those options since either.

> It works as a Read-Eval-Print loop, where it prints to a file that changes based on context. So doing negative offsets would require it to print lines it doesn't accumulate yet.

Yeah, grep -A -B -C does that sort of ring buffer nonsense with lines it _may_
need depending on later stuff. It's a fiddly pain.
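
For reference, the holdback idea can be sketched roughly like this: keep the
last N lines in a circular array and only release a line once it falls off the
far end, since no later match can reach back to it. The struct and function
names are invented for this example and are not taken from grep.c or the csplit
patch:

```c
#include <stdlib.h>

// Hold back the most recent `cap` lines so a later match can still claim
// them. Hypothetical structure, for illustration only.
struct holdback {
  char **line;            // circular array of line pointers
  int cap, count, head;   // head indexes the oldest held line
};

// cap must be greater than zero. Returns 0 on success, -1 on failure.
int holdback_init(struct holdback *hb, int cap)
{
  hb->line = calloc(cap, sizeof(char *));
  hb->cap = cap;
  hb->count = hb->head = 0;

  return hb->line ? 0 : -1;
}

// Add a line. When the buffer is already full the oldest line falls out and
// is returned, so the caller can write it to the current output file.
// Returns NULL while the buffer is still filling.
char *holdback_push(struct holdback *hb, char *line)
{
  char *old = NULL;

  if (hb->count == hb->cap) {
    old = hb->line[hb->head];
    hb->head = (hb->head+1)%hb->cap;
    hb->count--;
  }
  hb->line[(hb->head+hb->count)%hb->cap] = line;
  hb->count++;

  return old;
}

// Remove and return the oldest held line, or NULL when empty. Used to drain
// the buffer at a split point or at end of input.
char *holdback_pop(struct holdback *hb)
{
  char *old;

  if (!hb->count) return NULL;
  old = hb->line[hb->head];
  hb->head = (hb->head+1)%hb->cap;
  hb->count--;

  return old;
}
```

With a negative offset of -N, the held lines at the moment a pattern matches
are the ones that belong to the next output file instead of the current one.
The buffer only stores pointers, so who allocates and frees the lines is left
to the caller.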

> The other main one is the fact it doesn't do "[LINE] {[NUMBER]}" cleanly yet.

I applied what you sent verbatim and haven't started cleaning anything up yet,
in case you have more work to do. I'm not actually familiar with csplit. (Never
used it, still need to come up to speed...)

> It also includes the GNU extension "{*}" argument
> 
> The other breaks from POSIX are mostly insignificant, like the fact it doesn't
> check locale environment variables or that it uses "%lu" for file size instead of "%d".

Nothing in toybox checks the locale environment variables (outside of UTF-8
enablement for the fontmetrics stuff in main.c, and we usually _set_ the
variables when we do that).

And posix has been just plain wrong about int-vs-long printf variables since the
general switch to 64 bits in 2005. It's coming up on 20 years since then, so
possibly Issue 8 will finally fix that? Or maybe that's just when they finally
noticed they're obsolete and the NEXT release would fix it? Wake me when they
restore "tar" and deprecate "pax"...
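
To make the int-vs-long point concrete, here is a small sketch with made-up
helper names (format_size and too_big_for_int are not from the patch). It
assumes int is 32 bits, which is the case on the usual Linux targets:

```c
#include <limits.h>
#include <stdio.h>

// Write a piece size into buf as decimal text using "%lu".
// Returns snprintf()'s result.
int format_size(char *buf, size_t len, unsigned long size)
{
  return snprintf(buf, len, "%lu", size);
}

// True when the size is too large to be represented as an int, and so could
// not be printed correctly through a "%d" conversion.
int too_big_for_int(unsigned long size)
{
  return size > (unsigned long)INT_MAX;
}
```

A piece of 3000000000 bytes formats fine through format_size() but is past
INT_MAX, so it is exactly the kind of value "%d" cannot carry.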

Rob

