[Toybox] About commit 893a092f62a6 to mdev.c

Rob Landley rob at landley.net
Wed Apr 8 15:39:22 PDT 2015

In japan using gmail's web interface, so if this is unreadable (or
worse, html email), I apologize in advance.

So on the 12 hour flight from LAX to tokyo I finally got to look at
last month's mdev commit, which I'd applied without really looking at
it because that's what "pending" is for, and now I'm kind of
rethinking the whole "pending" directory because this commit was
copied from busybox.

Don't DO that.

Ok, back up. Let's get a little context.

Looking through the new mdev_main(), I had some questions. Why is
there a buf[PATH_MAX] when we have toybuf? Why did this commit start
calling uname(0) when TOYFLAG_UNAME already did that for us? Why was
xchdir("/dev") added when the code was previously creating nodes via
absolute paths? (I had a reason for this: the current working
directory is one of the pieces of state that's hard to restore if you
ever do a nofork() version, and I like to keep my options open. So
that's a large shift from the previous design...)

The stat with the globals was a little awkward but clearly it was
copying the old /dev/root symlink thing I did for busybox (only way to
look up the device name to symlink _to_ is to traverse /sys/block, and
since we're doing that anyway... Although you could just save the
st_dev value and only need one global for this.)

And then I got to putenv((char*)"ACTION=add"); and went "this is not
right". That's subtly wrong in multiple different ways: Why the
gratuitous typecast of something that's _already_ a char *? Why is the
coding style wrong? (No space between char and *, which is bad because
newbie C programmers who don't understand how variable declaration
works go "char* a, b, c;" and think b and c are pointers when they're
not, so doing that is a sign of somebody who is not mentally
comfortable with the way C handles pointers.) And why are we setting
an environment variable that we're not immediately using with no
comment about why we're doing it? (Will the dirtree callback be
reading this? Where's the consumer? It's not obvious from context, but
not explained either.)

Then right after that, we have dirtree_add_node() followed by
dirtree_handle_callback() as two separate steps. Not only do we have a
function that does both steps in one (dirtree_read()) but the previous
code was already using that. This commit split up dirtree_read() into
multiple steps for no obvious reason. No comment explaining why to do
it this way...

So I look back at the gratuitous typecast and go "that looks copied
from somewhere", and look down and see a variable named "fware" and go
"that's an awkward name, you either disemvowel a name or you spell it
out, this is not a first or even second choice of variable name", and
then I page up a couple times and see an #if 0 block around
sequence_file() and go "why the HELL would there be an #if 0 block
around a function that just got written, either they reached a good
stopping point or they didn't, why is this there but commented out...
unless it's copied from somewhere."

Here's the old toybox mdev_main() in full:

> void mdev_main(void)
> {
>   // Handle -s
>   if (toys.optflags) {
>     dirtree_read("/sys/class", callback);
>     dirtree_read("/sys/block", callback);
>   }
>   // hotplug support goes here
> }

Here's the start of the new main() I just described:

> void mdev_main(void)
> {
>   char buf[PATH_MAX];
>   struct dirtree *root;
>   umask(0);
>   xchdir("/dev");
>   if (toys.optflags & FLAG_s) {
>     struct stat st;
>     xstat("/", &st);
>     TT.root_maj = major(st.st_dev);
>     TT.root_min = minor(st.st_dev);
>     putenv((char*)"ACTION=add");
>     root = dirtree_add_node(0, "/sys/class", 1);
>     if (root) dirtree_handle_callback(root, callback);
>     root = dirtree_add_node(0, "/sys/block", 1);
>     if (root) dirtree_handle_callback(root, callback);
>   } else {  // Hotplug handling

Here's the start of BUSYBOX's mdev_main():

> int mdev_main(int argc, char **argv) MAIN_EXTERNALLY_VISIBLE;
> int mdev_main(int argc UNUSED_PARAM, char **argv)
> {
>         INIT_G();
>         G.filename = "/etc/mdev.conf";
> #endif
>         /* We can be called as hotplug helper */
>         /* Kernel cannot provide suitable stdio fds for us, do it ourself */
>         bb_sanitize_stdio();
>         /* Force the configuration file settings exactly */
>         umask(0);
>         xchdir("/dev");

Oh hey look, the gratuitous umask(0) and xchdir("/dev") that had no
place in the new main():

>         if (argv[1] && strcmp(argv[1], "-s") == 0) {
>                 /*
>                  * Scan: mdev -s
>                  */
>                 struct stat st;
>                 /* Same as xrealloc_vector(NULL, 4, 0): */
>                 G.rule_vec = xzalloc((1 << 4) * sizeof(*G.rule_vec));
> #endif
>                 xstat("/", &st);
>                 G.root_major = major(st.st_dev);
>                 G.root_minor = minor(st.st_dev);
>                 putenv((char*)"ACTION=add");

Oh look, four more lines copied nearly verbatim, including the strange
(char*) typecast canary.

Let's look at that #if 0 function:

> #if 0
> static int sequence_file(char *seq)
> {
>   static struct timespec tspec = { 0, 32*1000*1000 }; // time out after 2 secs
>   int time_out = 2000 / 32, fd = -1;
>   sigset_t sg;
>   sigemptyset(&sg);
>   sigaddset(&sg, SIGCHLD);
>   sigprocmask(SIG_BLOCK, &sg, NULL);
>   for (;;) {
>     ssize_t slen;
>     char buf[sizeof(int)*3 + 2];
>     if (fd < 0 && (fd = open("/dev/mdev.seq", O_RDWR)) < 0) break;

And in busybox:

> static int
> wait_for_seqfile(const char *seq)
> {
>         /* We time out after 2 sec */
>         static const struct timespec ts = { 0, 32*1000*1000 };
>         int timeout = 2000 / 32;
>         int seq_fd = -1;
>         int do_once = 1;
>         sigset_t set_CHLD;
>         sigemptyset(&set_CHLD);
>         sigaddset(&set_CHLD, SIGCHLD);
>         sigprocmask(SIG_BLOCK, &set_CHLD, NULL);
>         for (;;) {
>                 int seqlen;
>                 char seqbuf[sizeof(int)*3 + 2];
>                 if (seq_fd < 0) {
>                         seq_fd = open("mdev.seq", O_RDWR);
>                         if (seq_fd < 0)
>                                 break;

Same 32*1000*1000, same sizeof(int)*3 + 2, same 2000 / 32 with the
same _spacing_. The first three variables are timespec, timeout, and
fd declared in the same order.

I feel like a professor who caught a student plagiarizing wikipedia,
who tried to paraphrase each sentence but otherwise had the same
sentences in the same order.

And no, the moral here isn't "hide it better". Opening two windows and
typing the same code into a second window, changing the variable names
removing comments and reordering a couple lines, does not make an
original work of authorship. This is another reason for thorough
review before promoting anything out of "pending": not allowing
obviously copied code from another project with a different license to
wind up in defconfig. (It may take me a while, but I  do try to check
this stuff, and I'm _annoyed_ when I find it.)

If you wonder why I sometimes give in to my urge to rewrite commands
from scratch, I prefer to err on the side of not missing this stuff if
it came from a codebase I'm not as historically familiar with as
busybox. If you plagiarize netbsd its license still has the "copy this
license text verbatim into your derived works" and yes I care, not
_just_ because berkeley won its countersuit against AT&T
(http://cm.bell-labs.com/who/dmr/bsdi/bsdisuit.html) but because it's
the _right_thing_.


More information about the Toybox mailing list