[Toybox] FreeBSD Toybox check-in

Rob Landley rob at landley.net
Fri Nov 10 09:13:52 PST 2023


On 11/9/23 17:03, Ed Maste wrote:
>> Hmmm, most of those are enabled for macos. We should figure out why those builds
>> break and maybe fix them?
>>
>> cp/mv/install are the same plumbing with different UI.
> 
> Ah, these fail due to:
> toys/posix/cp.c:128:18: error: call to undeclared function
> 'xattr_flist'; ISO C99 and later do not support implicit function
> declarations [-Werror,-Wimplicit-function-declaration]
> (and xattr_fget, xattr_fset). On FreeBSD these are extattr(2) and
> should need only a small patch in portability.c.

Which means that lib/portability.c hasn't got an implementation for BSD:

#ifdef __APPLE__
...
ssize_t xattr_flist(int fd, char *list, size_t size)
{
  return flistxattr(fd, list, size, 0);
}
...
#elif !defined(__FreeBSD__) && !defined(__OpenBSD__)
...
ssize_t xattr_flist(int fd, char *list, size_t size)
{
  return flistxattr(fd, list, size);
}
...
#endif

Shouldn't be hard to add?
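Here is a minimal sketch of the missing FreeBSD branch. It assumes only the "user" namespace matters. The catch is the list format. FreeBSD's extattr_list_fd(2) returns each name as a one-byte length followed by the unterminated name, with no namespace prefix. Linux flistxattr() returns NUL-terminated names like "user.foo". The helper extattr_to_linux_list() is hypothetical and builds anywhere. The xattr_flist() wrapper only builds on FreeBSD and is untested there. xattr_fget()/xattr_fset() would map onto extattr_get_fd()/extattr_set_fd() after stripping the "user." prefix, which isn't shown.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

#ifdef __FreeBSD__
#include <sys/extattr.h>
#endif

/* Hypothetical helper: convert a FreeBSD extattr name list (1-byte length
 * followed by an unterminated name, repeated) into Linux listxattr() format
 * ("user.foo\0user.bar\0"). outsize == 0 only returns the size needed, like
 * flistxattr(fd, NULL, 0); a too-small buffer fails with ERANGE. */
static ssize_t extattr_to_linux_list(const unsigned char *in, size_t inlen,
                                     char *out, size_t outsize)
{
  static const char prefix[] = "user.";
  size_t plen = sizeof(prefix) - 1, pos = 0, used = 0, n, need;

  while (pos < inlen) {
    n = in[pos++];
    need = plen + n + 1;
    if (n > inlen - pos) {        /* length byte runs past the end */
      errno = EINVAL;
      return -1;
    }
    if (outsize) {
      if (need > outsize - used) {
        errno = ERANGE;
        return -1;
      }
      memcpy(out + used, prefix, plen);
      memcpy(out + used + plen, in + pos, n);
      out[used + plen + n] = 0;
    }
    used += need;
    pos += n;
  }
  return (ssize_t)used;
}

#ifdef __FreeBSD__
/* FreeBSD-only wrapper (untested sketch): fetch the raw user-namespace list,
 * then convert it. The list can change between the two calls. */
ssize_t xattr_flist(int fd, char *list, size_t size)
{
  ssize_t raw_len, len;
  unsigned char *raw;
  int saved;

  raw_len = extattr_list_fd(fd, EXTATTR_NAMESPACE_USER, NULL, 0);
  if (raw_len < 0) return -1;
  if (!(raw = malloc(raw_len ? raw_len : 1))) return -1;
  raw_len = extattr_list_fd(fd, EXTATTR_NAMESPACE_USER, raw, raw_len);
  len = (raw_len < 0) ? -1 : extattr_to_linux_list(raw, raw_len, list, size);
  saved = errno;                  /* don't let free() clobber errno */
  free(raw);
  errno = saved;
  return len;
}
#endif
```

The sketch treats size 0 as a length query and returns ERANGE for a short buffer. Both follow the Linux listxattr() convention, so callers like cp.c shouldn't need to know which branch they got.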

>> No idea why tar is unhappy? (Possibly the sparse or selinux stuff?)
> 
> Similar issue, xattr_lget and xattr_get. And error: use of undeclared
> identifier 'ENODATA'

The first two are the same as above: the security blanket shims aren't there for
this OS in the #ifdef staircase.

The third is a constant out of the standard Linux headers:

/usr/include/asm-generic/errno.h:#define	ENODATA		61	/* No data available */

      // First time get length, second time read data into prepared buffer
      len = (S_ISLNK(st->st_mode) ? xattr_lget : xattr_get)
        (name, sec, buf+start, sz);

      if (errno==ENODATA || errno==ENOTSUP) len = 0;

It's trying to gracefully fail if we fetched xattr info on a system that hasn't
got xattr support. You can probably #define it to ENOTSUP. Linux uses it quite a
bit:

$ grep -r ENODATA linux/* | wc -l
1391

But this is the only occurrence in toybox so far.
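Here is a sketch of the #define suggested above, as it might sit in portability.h. The placement is an assumption. On Linux, ENODATA already exists, so the #ifndef leaves it alone. On FreeBSD, where it's undeclared per the error above, it collapses into ENOTSUP, and the tar.c test just checks ENOTSUP twice. xattr_len_or_zero() is a hypothetical helper that mirrors the excerpt's errno check. Unlike the excerpt, it only looks at errno when the call returned a negative length.

```c
#include <errno.h>
#include <sys/types.h>

/* Linux defines ENODATA (61 in asm-generic/errno.h); FreeBSD doesn't, but it
 * does have ENOTSUP, which covers the "no xattr support" case. */
#ifndef ENODATA
#define ENODATA ENOTSUP
#endif

/* Hypothetical helper: treat "attribute missing" and "xattrs not supported"
 * as a zero-length result instead of an error. */
static ssize_t xattr_len_or_zero(ssize_t len)
{
  if (len < 0 && (errno == ENODATA || errno == ENOTSUP)) return 0;
  return len;
}
```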

>> Ah, uname.c also has arch and linux32 shoved in there, and linux32() is calling
>> personality(PER_LINUX32) which you probably don't have (either the function or
>> the macro name). I could stick an #ifdef clause in portability.h?
> 
> Yep, we have no personality(2).

Understandable. The "arch" and "linux32" commands were implemented as a pair,
and arch was glued to uname because it's basically an alias for "uname -m". No
real reason I couldn't move linux32 out to its own file...

Ah, linux32 does _not_ run a command in 32 bit mode. It tells uname -m to lie
about the architecture type. That's literally all it does. The man page says:

       PER_LINUX32 (since Linux 2.2)
              [To be documented.]

But outside of the arch/ directory of the kernel source, the only user is
kernel/sys.c, which swaps out the uname -m name. The symbol doesn't occur in
arch/x86, and in arch/arm64 there are two users: kernel/cpuinfo.c has c_show()
print different info in /proc/cpuinfo, and kernel/sys.c has the
arm64_personality() syscall sometimes return -EINVAL when it's set.

Nothing about refusing to launch a 64 bit binary in the ELF loader or anything
like that. It's cosmetic, and basically another lie-to-autoconf so a 32 bit
chroot won't try to force itself to be 64 bit because of gnu brain damage.

>> Also, if I _do_ stick a #define personality(x) ; in portability.h for freebsd,
>> then linux32 would build but be a NOP. This is not unique and is why "it builds
>> for freebsd" and "it does anything interesting" are two different things to test. :)
> 
> Yes, definitely. Right now I'm building toybox and experimenting for
> its own sake, more testing is definitely called for before actually
> using it.
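The #define personality(x) ; approach quoted above makes linux32 build as a silent NOP. One alternative is a stub that fails with ENOSYS, so the command can say it didn't do anything. This is sketched here as an assumption, not anything toybox does. The value of PER_LINUX32 (0x0008) is borrowed from Linux's <sys/personality.h>. set_linux32() is a hypothetical wrapper.

```c
#include <errno.h>

#ifdef __linux__
#include <sys/personality.h>
#else
/* No personality(2) here (e.g. FreeBSD). The value is borrowed from Linux;
 * the stub fails loudly instead of pretending to succeed. */
#define PER_LINUX32 0x0008
static int personality(unsigned long persona)
{
  (void)persona;
  errno = ENOSYS;
  return -1;
}
#endif

/* Hypothetical linux32 helper: returns 0 on success, or the errno value
 * when switching personality failed. */
static int set_linux32(void)
{
  return (personality(PER_LINUX32) == -1) ? errno : 0;
}
```

On arm64 the real call can also fail with EINVAL, per the arm64_personality() behavior described earlier.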

The hard one is ps and friends. Last I checked there was a magic shared library
to query proc info, and when I dug under the covers it was several different
syscall/fcntl/ioctl variants actually querying the data.

In _theory_ all the reading of /proc for ps/top/pgrep is done in one place: the
dirtree callback get_ps() populates a "struct procpid *tb" instance (in toybuf),
and then either calls TT.show_process() or does a memdup() on the data and saves
it in the "extra" field of the dirtree node for later processing (sorted output
and such). There's also a get_threads() wrapper that fetches the threads under
each process, but it's basically just a shim layer that recurses down into
/proc/$PID/task and calls back into get_ps() to read each thread.
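To make the get_threads() shape concrete, here is a Linux-only sketch that walks /proc/$PID/task and hands each thread ID to a callback. The names are hypothetical; this is not toybox code. The callback is where ps.c would call back into get_ps(). The sketch uses plain opendir()/readdir() rather than toybox's dirtree. This directory walk is exactly the part a BSD port would have to replace.

```c
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical stand-in for get_threads(): visit every thread of pid, calling
 * cb(pid, tid, ctx) for each (stopping early if cb returns nonzero). Returns
 * the number of threads visited, or -1 if /proc/PID/task can't be opened. */
static int for_each_thread(long pid, int (*cb)(long, long, void *), void *ctx)
{
  char path[64];
  struct dirent *de;
  DIR *dir;
  int count = 0;

  snprintf(path, sizeof(path), "/proc/%ld/task", pid);
  if (!(dir = opendir(path))) return -1;
  while ((de = readdir(dir))) {
    if (!isdigit((unsigned char)*de->d_name)) continue;  /* skip . and .. */
    count++;
    if (cb && cb(pid, strtol(de->d_name, NULL, 10), ctx)) break;
  }
  closedir(dir);
  return count;
}

/* Example use: count the calling process's own threads. */
static int count_my_threads(void)
{
  return for_each_thread((long)getpid(), NULL, NULL);
}
```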

The struct it's populating is defined on line 285:

// Data layout in toybuf
struct procpid {
  long long slot[SLOT_count]; // data (see enum above)
  unsigned short offset[6];   // offset of fields in str[] (skip CMD, always 0)
  char state;
  char str[];                 // CMD, TTY, WCHAN, LABEL, COMM, ARGS, NAME
};

There's not much there: slot[] holds numbers, str[offset[]] holds strings, and
state is the single character state field (a process "stuck in D state", etc).

The slot[] array has all those fields in the big enum from line 224, and then
the fields get mapped to display name strings using the "struct typography
typos[]" array starting on line 229.
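Here is a hypothetical miniature of that enum-plus-table arrangement. Column names map either to a non-negative slot[] index or to a negative string index. The struct layout and field names are assumptions, and the table is much smaller than the real typos[] in ps.c.

```c
#include <string.h>

/* Hypothetical slot enum: indexes into a numeric slot[] array. */
enum { SLOT_pid, SLOT_ppid, SLOT_pri, SLOT_count };

struct typography {
  const char *name;   /* column name shown in the header */
  signed char slot;   /* >= 0: slot[] index; < 0: string field, from -1 */
};

static const struct typography typos[] = {
  {"PID", SLOT_pid}, {"PPID", SLOT_ppid}, {"PRI", SLOT_pri},
  {"CMD", -1}, {"TTY", -2},
};

/* Return the typos[] index for a column name, or -1 if unknown. */
static int find_typo(const char *name)
{
  size_t i;

  for (i = 0; i < sizeof(typos) / sizeof(*typos); i++)
    if (!strcmp(typos[i].name, name)) return (int)i;
  return -1;
}
```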

The slot[] array started out holding the /proc/self/stat fields more or less in
order (skipping the second and third fields which aren't numbers), but then it
got extended and patched a lot as more numeric data needed to be fetched from
various places.

So the core of it's still:

  for (j = SLOT_ppid; j<SLOT_upticks; j++)
    if (1>sscanf(s += i, " %lld%n", slot+j, &i)) break;

Lines 733 through 759 handle the first three fields in the string we read from
/proc/$PID/stat (writing the numeric one to the first entry in slot[]; the two
string ones get handled separately).
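Here is the same sscanf() loop in a self-contained sketch. It assumes a stat line in the usual Linux format, "pid (comm) state ppid ...". Because comm can itself contain spaces and parentheses, it is located with strchr() for the first '(' and strrchr() for the last ')'. parse_stat() and NSLOTS are hypothetical. ps.c writes into slot[] and str[], but this sketch returns pid and comm through separate pointers.

```c
#include <stdio.h>
#include <string.h>

#define NSLOTS 8  /* hypothetical; ps.c sizes slot[] with SLOT_count */

/* Parse pid, comm and state, then up to NSLOTS numbers (ppid onward) using
 * the " %lld%n" loop. Returns how many numbers were read, or -1 if malformed. */
static int parse_stat(const char *line, long long *pid, char *comm,
                      size_t commlen, char *state, long long *slot)
{
  const char *lparen = strchr(line, '('), *rparen = strrchr(line, ')'), *s;
  size_t n;
  int i = 0, j;

  if (!lparen || !rparen || rparen < lparen || !commlen) return -1;
  if (1 != sscanf(line, "%lld", pid)) return -1;

  n = (size_t)(rparen - lparen - 1);    /* comm, truncated to fit */
  if (n >= commlen) n = commlen - 1;
  memcpy(comm, lparen + 1, n);
  comm[n] = 0;

  s = rparen + 1;
  if (1 != sscanf(s, " %c%n", state, &i)) return -1;

  /* Each pass advances s past what the previous sscanf consumed. */
  for (j = 0; j < NSLOTS; j++)
    if (1 > sscanf(s += i, " %lld%n", slot + j, &i)) break;

  return j;
}
```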

But then starting at line 767 there are about a hundred lines of reading stuff
from OTHER places and sticking it in various slot[] fields.

Then right after the struct procpid there's variable length string data
(accessed through str[]), kinda like argv[]: consecutive null terminated strings,
plus an array of offsets to the start of each one. The typos[] entries with a
negative number where the SLOT_enum would otherwise go display string data
instead of a number, and that negative index (starting from -1) picks the
string: -1 is the start of str[], and anything lower is
str+offset[-2-typos[blah].slot]. There are actually _seven_ strings, but the
first one always starts at offset 0 so we don't bother to record it...
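Here is a cut-down sketch of that string layout. It assumes slot -1 is CMD at offset 0 and slot -k maps to offset[k-2] for k >= 2. That mapping is what seven strings sharing a six-entry offset[] implies. struct mini_procpid, pack_strings() and string_slot() are hypothetical, and the sketch uses a fixed-size str[] instead of toybuf.

```c
#include <string.h>

#define NSTR 3  /* hypothetical; ps.c has seven strings */

/* Numbers first, then NSTR consecutive NUL-terminated strings. offset[]
 * records where strings 2..NSTR start; string 1 always starts at 0. */
struct mini_procpid {
  long long slot[3];
  unsigned short offset[NSTR - 1];
  char state;
  char str[256];
};

/* Append strings back to back, argv[]-style, recording their offsets. */
static int pack_strings(struct mini_procpid *tb, const char *const s[NSTR])
{
  size_t pos = 0, len;
  int i;

  for (i = 0; i < NSTR; i++) {
    len = strlen(s[i]) + 1;
    if (len > sizeof(tb->str) - pos) return -1;
    if (i) tb->offset[i - 1] = (unsigned short)pos;
    memcpy(tb->str + pos, s[i], len);
    pos += len;
  }
  return 0;
}

/* Map a negative slot (-1, -2, ...) to its string, or NULL if out of range. */
static const char *string_slot(const struct mini_procpid *tb, int slot)
{
  int k = -slot;

  if (k < 1 || k > NSTR) return NULL;
  return tb->str + (k == 1 ? 0 : tb->offset[k - 2]);
}
```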

The consumers of struct procpid shouldn't care where the data comes from; it's
already been fetched (all in one place, intentionally). Porting this to BSD
would mean coming up with replacement get_ps() and get_threads() functions,
and I _think_ the rest of it should just work?

Alas, those functions aren't trivial...

>> Still dunno what to do about the "/usr/bin/env is canon but /bin/bash is not"
>> weirdness, especially given that a lot of Linux systems started symlinking
>> /usr/bin to /bin some years ago... (Yeah, partially my fault, but still...)
> 
> Yes. In the fullness of time I expect we'll make the same (symlink)
> change in FreeBSD, but it still won't help here, because on FreeBSD
> bash is /usr/local/bin/bash.

FreeBSD _ships_ local. As part of the distro, they deploy files into local.

Define "local".

Rob

