[Toybox] Poke about the bc.c cleanup patches I submitted a while ago

Rob Landley rob at landley.net
Wed Mar 27 06:31:26 PDT 2024


Yesterday I did NOT spend all my energy reading email, and instead got
https://landley.net/bin/toolchains updated with a musl 1.2.5 and or1k and riscv
in the list, and that seems to have fixed the sh2eb build break as well
(although I haven't tried booting it on a Turtle board yet, haven't unpacked any
here in Minneapolis...) and rebuilt all the mkroot targets against the 6.8
kernel (the tmpfs patch went upstream-ish but the rest all still apply, none of
those issues will ever voluntarily be fixed by the kernel clique), and the tests
told me I need kernel/qemu configs for armv4l armv7m microblaze mips64 riscv32
riscv64 sh4eb, which reminded me of my "make the fdpic loader work on sh4 with
mmu work" which should become another patch and get finished now that I've got
updated toolchains with the sh4 longjmp bug fixed...

But today I'm being good and back to spending my energy responding to email instead.

On 3/24/24 21:45, Oliver Webb wrote:
> On Sunday, March 24th, 2024 at 18:27, Rob Landley <rob at landley.net> wrote:
> 
>> > I've been looking to do a cleanup pass on bc because there are a lot of very obvious things
>> > that can be removed (typedefed structs as far as the eye can see, all the "posixError" garbage,
>>
>> Agreed. I still haven't decided whether to throw it out and start over, but you
>> can't make it worse. (Your cleanup patch broke xzcat, but I can't tell if this
>> one is right or wrong outside of its test suite already, and only really care
>> about the kernel timeconst.bc use case anyway, so...)
> 
> Permission to remove the annoying signal handling that only really matters (gets in the way of exiting) 
> on interactive sessions?

"You can't make it worse."

>> Why typecast at all? You're assigning to a variable of that size, shouldn't the
>> assignment do the typecast? (Does this suppress a warning or something?)
> 
> I did ":%s/uchar/char/g" instead of going over every individual use of "uchar",
> This patch (attached) removes a lot of those unnecessary typecasts, and cleans up
> the code formatting a lot, among other things like getting rid of the posixError stuff,
> about 350 lines removed
> 
>> Is sizeof(char) ever not 1?
> 
> There is support for multi-byte chars in gcc (i.e. "char x = 'ABCD';")

That's a character literal (which has type int), not a char variable.
Assigning it to a char will give you... I'm going to guess 'D'.

> but no one uses that terrible extension to my knowledge

It seems to warn about using it by default, even:

$ cat test2.c
#include <stdio.h>

int main(int argc, char *argv[])
{
  char c = 'ABCD';

  printf("%d\n", c);
}
$ gcc test2.c
test2.c: In function ‘main’:
test2.c:5:12: warning: multi-character character constant [-Wmultichar]
   char c = 'ABCD';
            ^~~~~~
test2.c:5:12: warning: overflow in conversion from ‘int’ to ‘char’ changes value
from ‘1094861636’ to ‘68’ [-Woverflow]
$ ./a.out
68
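Here is a small C sketch of the same point without the warnings (the helper names are made up). In C, sizeof(char) is 1 by definition, a character literal like 'A' has type int, and stuffing an int into a char keeps only the low byte. The sketch uses unsigned char because that conversion is defined as reduction modulo UCHAR_MAX+1 (256 with 8-bit chars). Converting an out-of-range value to a signed char is implementation-defined, which is why gcc warns with -Woverflow above. 0x41424344 is just 1094861636 in hex, the value gcc gave 'ABCD'. What a multi-character constant evaluates to is itself implementation-defined, so the sketch doesn't rely on 'ABCD'.

```c
#include <stddef.h>

/* sizeof(char) is 1 by definition in C; this can never return anything else. */
size_t char_size(void)
{
  return sizeof(char);
}

/* In C (unlike C++), a character literal such as 'A' has type int. */
int literal_is_int_sized(void)
{
  return sizeof('A') == sizeof(int);
}

/* Hypothetical helper: what survives when an int is stored in an unsigned
 * char. Conversion to an unsigned type reduces modulo UCHAR_MAX+1, so with
 * 8-bit chars only the low byte is kept. */
unsigned char low_byte(int value)
{
  return (unsigned char)value;
}
```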

>> > or the xz stuff,
>>
>> If you want to peel out individual upstream public domain xz patches and adapt
>> them (one at a time) to apply to toybox's xzcat, I'd be very interested in
>> reading and applying the results.
> 
> The main problem is that it takes a lot of work to patch upstream stuff and not break everything,
> I'll see what I can do, but I can't guarantee that I'll be able to get the bigger blocks of code
> like the ARM64 decoder in.
> 
>> > nor the csplit regressions I started to patch out,
>>
>> What were the csplit regressions?
> 
> A lot of things since I was testing the command manually when I first wrote it,

A test suite that TEST_HOST passes would be nice. I have the start of one, but
csplit is such an utterly terrible command (a half-assed sed that only wants to
write to files), I can't wrap my head around what anybody would ever WANT to use
it for.

I mean why have "prefix" and "suffix" when suffix is an arbitrary sprintf
string? Prefix on WHAT, it's not adding in the input filename, and you can't if
you try:

$ seq 1 10 | csplit - 2 %4% 7 -b '%s'
csplit: invalid conversion specifier in suffix: s
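For comparison, here's roughly how a csplit-style output name gets assembled. This assumes GNU's defaults of prefix "xx" and suffix format "%02d", which give xx00, xx01, and so on. The piece_name() helper is hypothetical and is not toybox or coreutils code. The suffix format's only argument is the piece number, which is an int. That's why -b '%s' gets rejected: there's no string for %s to consume.

```c
#include <stdio.h>

/* Hypothetical helper: build the name of output piece number n.
 * The suffix is a printf format whose only argument is n (an int), so only
 * integer conversions like %d, %02d or %x make sense in it. The caller must
 * have validated suffix_fmt. Returns the length the full name needed. */
int piece_name(char *buf, size_t len, const char *prefix,
               const char *suffix_fmt, int n)
{
  int used = snprintf(buf, len, "%s", prefix);

  if (used < 0 || (size_t)used >= len) return used;

  int more = snprintf(buf + used, len - used, suffix_fmt, n);
  return more < 0 ? more : used + more;
}
```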

I checked busybox to see if they had tests, but the only mention of csplit in
the entire git tree there is docs/posix_conformance.txt under "Tools not supported".

>> Glancing at pending, I don't have a test environment for
>> arp, arping,
> 
> Network administration stuff for ARP caches that manipulates kernel ARP table entries,
> would probably require mkroot to test safely.

Yes, I know.

>> bootchartd,
> 
> A command with no standard; Described as "bootchartd is commonly used to profile the boot process.",
> Not sure what it does, let alone what it's supposed to do.

Gets a lot of people to poke me about it in email, apparently.

There's one in busybox. Its introductory commit (ff027d6f50bf from 2010) doesn't
say where it came from and provides self-fart-huffing documentation, a
"description" that assumes you already know what it does. And that documentation
is pretty much what you still get today from the one installed in my devuan $PATH:

  $ busybox bootchartd --help
  BusyBox v1.31.0 (2019-08-13 06:54:37 CDT) multi-call binary.

  Usage: bootchartd start [PROG ARGS]|stop|init

  Create /var/log/bootchart.tgz with boot chart data

  start: start background logging; with PROG, run PROG, then kill logging with
  USR1
  stop: send USR1 to all bootchartd processes
  init: start background logging; stop when getty/xdm is seen (for init scripts)
  Under PID 1: as init, then exec $bootchart_init, /init, /sbin/init

Start logging WHAT? What information gets written to this log? Who consumes the
(gzipped) log? In what format? Is it human readable with zcat?

What does "then kill logging with USR1" mean: does it automatically send USR1 to
something when it's done, or is that asking the USER to send USR1 to something?
What responds to the USR1 signal?

"all bootchartd processes"... it launches background processes? How many of them
do you expect?

Why would it assume getty running is meaningful? (Is someone going to manually
log into your system from a login: prompt?) I don't know what gdm is. (Haven't
got one installed on my laptop. Gnu Database Manager?) Is there a way to specify
a DIFFERENT process to wait for? Is there some sort of polling granularity or
does it get notifications when child processes call exec?

There's a reason this is still in pending. I don't know WHY this command is, I
just know various people at a large company wanted and submitted it. There's
probably a youtube video on it somewhere. Probably by Khem Raj, he seems to know
most of the bits I don't. :)

>> brctl, crond, crontab,
> 
> Networking stuff in sbin/, divided into subcommands like ip, and a


The brctl command is "ethernet bridge control", which is ethernet level packet
routing. (Not TCP/IP, but LAN crap with zero hops and a tendency to have
broadcast storms if you're not careful.) I think I used it long ago to set up
equal cost multipath? (Two links acting as one link between two machines, for
extra bandwidth.)

Alas, this is ANOTHER kernel API that implemented a netlink extended version
alongside the IOCTL API to handle VLAN stuff (rather than extend the IOCTL), so
the "new" way of doing stuff is a reimplementation sharing no code. Because
passing binary structures across a network socket to the kernel is so much
better than passing the same binary structures across an ioctl.

There's also TUN/TAP stuff, ala https://landley.net/lxc/02-networking.html on my
todo list somewhere, and all this works into containers somehow, and I have
notes on it somewhere but it might be easiest to just redo the "rubber docker"
tutorial on github.

> NEEDROOT and a STAYROOT commands, maybe they can be tested safely without mkroot?

I vaguely understand what crond does, although I personally always used the "at"
command instead:

$ aptitude show at
Package: at
Version: 3.1.23-1
...
Description: Delayed job execution and batch processing
 At and batch read shell commands from standard input storing them as a job to
 be scheduled for execution in the future.

 Use
 at    to run the job at a specified time
 batch to run the job when system load levels permit
Homepage: http://blog.calhariz.com
Tags: admin::automation, implemented-in::c, interface::daemon, role::program,
      scope::utility, use::timekeeping

I have never understood what "crontab" is for. I think it calls vi with a file
locking wrapper? We haven't got a "passwdtab" or "resolv.conftab", you'd just
sudo and edit that without locking. (If you didn't want to use the "useradd" and
similar commands, which again aren't what crontab did, that's what "at" did.)

When I used cron as a newbie at Rutgers on the SunOS workstations, there was
something like /etc/crontab and in theory ~/.crontab in your user account, but
students didn't get to run cron jobs without personal permission from the sysadmin.

On my laptop, it was powered down during scheduled cron stuff and when I powered
it up it would run deferred cron jobs while I was trying to use it and one of
the first things I generally did in a new Linux install was RIP OUT CRON...

*shrug* It's very traditional unix, and in posix, but not part of my workflow.

>> ipcrm, ipcs,
> 
> "It[POSIX] provides ipcrm and ipcs, but not ipcmk, so you can use System V IPC resources but not create them."
> - roadmap.html
> 
> I honestly didn't know we had these,

You could ask https://in.linkedin.com/in/ashwini-kumar-b2357116 why. I dunno if
the early Galaxy series needed this back in 2014, or if they were reading
through posix or my roadmap and trying to fill it out.

> I don't know how I'm supposed to test resources I have no way to create,
> we'll need ipcmk eventually. These seem more feasible to test, although
> their tests will fail under mkroot until we
> have ipcmk

The ipcmk man page says it "allows you to create shared memory segments, message
queues, and semaphore arrays", and I haven't needed them?
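For a test environment, creating one doesn't strictly need ipcmk. A System V shared memory segment is one shmget() call. Removing it is shmctl(IPC_RMID), which is essentially what ipcrm -m asks the kernel to do. Below is a minimal sketch with made-up helper names, assuming Linux with CONFIG_SYSVIPC enabled. The id it returns is what ipcs -m lists. The segment stays around (in ipcs and /proc/sysvipc/shm) until something removes it, even after the creating process exits.

```c
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Hypothetical helper: create a private System V shared memory segment.
 * Returns the segment id (the shmid column in ipcs -m), or -1. */
int make_segment(size_t bytes)
{
  return shmget(IPC_PRIVATE, bytes, IPC_CREAT | 0600);
}

/* Hypothetical helper: mark the segment for removal. The kernel frees it
 * once nothing has it attached anymore. */
int remove_segment(int id)
{
  return shmctl(id, IPC_RMID, NULL);
}
```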

Personally I do shared memory by having two processes mmap() the same file
(these days there's a tmpfs on /dev/shm, and for some reason another one
mounted on /run just to make the accounting hard; before that you'd delete the
file after everybody had opened it, as a signal for the OS to stop bothering to
flush the contents to disk in the absence of memory pressure), and generally
handle inter-process blocking and atomicity through either pipe2(), mkfifo, or
open(O_CREAT|O_EXCL).
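Here is a rough sketch of both halves of that. The helper names are hypothetical, and the scratch path is chosen by the caller (/tmp in the test; /dev/shm would work the same way). share_via_file() has a parent and child map the same file MAP_SHARED. It deletes the name right after mapping, following the delete-after-open trick above. In this sketch, waitpid() stands in for the pipe or FIFO you'd normally use to signal "done". try_lock() uses open(O_CREAT|O_EXCL) as a mutex, since only one caller can create the file.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: a child writes msg into a MAP_SHARED mapping of the
 * file at path, then the parent copies what it finds there into out.
 * Returns 0 on success, -1 on failure. */
int share_via_file(const char *path, const char *msg, char *out, size_t outlen)
{
  size_t len = 4096;
  char *map;
  pid_t pid;
  int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);

  if (fd < 0) return -1;
  if (ftruncate(fd, len)) {
    close(fd);
    unlink(path);
    return -1;
  }
  map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
  // The mapping keeps the file alive, so the fd and the name can both go.
  close(fd);
  unlink(path);
  if (map == MAP_FAILED) return -1;

  if ((pid = fork()) < 0) {
    munmap(map, len);
    return -1;
  }
  if (!pid) {
    // Child: these writes land in the same pages the parent has mapped.
    strncpy(map, msg, len - 1);
    _exit(0);
  }
  // Parent: waiting for the child to exit is the "done" signal here.
  waitpid(pid, NULL, 0);
  strncpy(out, map, outlen - 1);
  out[outlen - 1] = 0;
  munmap(map, len);

  return 0;
}

/* Hypothetical helper: O_CREAT|O_EXCL as a lock. Returns 1 if we created the
 * lock file, 0 if somebody else already holds it, -1 on any other error.
 * Release it with unlink(lockpath). */
int try_lock(const char *lockpath)
{
  int fd = open(lockpath, O_WRONLY | O_CREAT | O_EXCL, 0600);

  if (fd >= 0) {
    close(fd);
    return 1;
  }
  return errno == EEXIST ? 0 : -1;
}
```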

I implemented fcntl(fileno(ofp), F_SETLK, &lock) with struct flock in
lib/password.c because it's _specified_ somewhere that you're supposed to do
that for updating /etc/passwd and friends, although to be honest the last
implementation I did of that plumbing just used atomic filesystem operations
(rename() and open(O_CREAT|O_EXCL) with the plus-suffix version of the files)
and checked the file timestamp for "plus or minus five seconds from now" to blow
it away as stale if we rebooted in the middle of an update...
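Below is a sketch of both approaches. It assumes nothing about what lib/password.c actually looks like, and lock_whole_file() and replace_file() are made-up names. The first is a plain fcntl(F_SETLK) advisory lock. The second is the "path+" scheme. O_CREAT|O_EXCL on the + file acts as the lock, rename() does the atomic switch, and a + file whose timestamp is more than five seconds from the current time is treated as stale.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

/* F_SETLK flavor: try for an advisory write lock on the whole file
 * (l_start = l_len = 0 covers everything through EOF, however big it grows).
 * Returns 0, or -1 with errno EACCES/EAGAIN if someone else holds a lock. */
int lock_whole_file(int fd)
{
  struct flock lock = {.l_type = F_WRLCK, .l_whence = SEEK_SET};

  return fcntl(fd, F_SETLK, &lock);
}

/* Hypothetical helper: replace path's contents with data. Returns 0 on
 * success, -1 if another update is in progress or something failed. */
int replace_file(const char *path, const char *data)
{
  char plus[4096];
  size_t len = strlen(data);
  struct stat st;
  int fd = -1, tries;

  if (snprintf(plus, sizeof(plus), "%s+", path) >= (int)sizeof(plus)) return -1;
  for (tries = 0; tries < 2; tries++) {
    if ((fd = open(plus, O_WRONLY | O_CREAT | O_EXCL, 0644)) >= 0) break;
    if (errno != EEXIST) return -1;
    // Another update in progress (fresh timestamp), or debris from a crash?
    if (stat(plus, &st) || labs((long)(time(NULL) - st.st_mtime)) <= 5) return -1;
    unlink(plus);
  }
  if (fd < 0) return -1;
  if (write(fd, data, len) != (ssize_t)len || fsync(fd)) {
    close(fd);
    unlink(plus);
    return -1;
  }
  close(fd);
  // rename() atomically swaps the new file in over the old one.
  if (rename(plus, path)) {
    unlink(plus);
    return -1;
  }

  return 0;
}
```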

You can use network sockets for synchronization if you like (select() covers a
multitude of sins, but remember TCP_NODELAY), but that breaks on an embedded
device where you configured out the IP stack (or haven't assigned an address
range to any interface, you have to remember to "ifconfig lo up" or it's not there).

I am aware dbus exists, and am still not entirely certain why. (I remember
attending a panel about it at Linucon in 2005, I just didn't retain it.) I am
aware kdbus was written, and made a lot of people very angry. I'm aware android
binder exists, and at one point knew why but have forgotten. I cleaned up and
promoted inotifyd.c but generally find its use overkill. At one point, I read up
on how futexes work, went "not gonna do that", and left it to the experts, and
the futex(2) man page is still there if I change my mind...

I think the last thing I encountered that used posix message queues was IBM
MQSeries for Linux in 1999. The kernel's CONFIG_SYSVIPC menuconfig help text
gives "dosemu" as the example package that won't run if you don't switch support
for this on, and says to read a GNU "info" page and something from The Linux
Documentation Project (which Wikipedia[citation needed] describes as "dormant").
According to "git annotate init/Kconfig" the last time any of that help text got
touched was 2004.

That said, /proc/sys/fs/mqueue is present on my devuan box and this feature
isn't broken or badly designed, it's just... one of many ways to do what it
does. Oddly enough, busybox has ipcrm and ipcs, but not ipcmk. I think you can
create them by poking around under /proc/sys/fs/mqueue? (You can also delete and
list them there though, I'm told.)

(I very vaguely recall that the point of this stuff was the resources could
persist beyond the lifespan of programs using them? Which to be honest sounded
like a DOWNSIDE to me? Leaking resources until the next reboot seems like a bad
thing.)

To be honest, I'm tempted to clean up and promote them to "examples". Leaving
them "default n". There in case somebody needs it, but if so it would be nice if
they could send us a note letting us know they exist...

>> klogd, last,
> 
> Maybe? Their outputs will vary from system to system so we will need some way to run them in an airlock

No idea, I tend not to use it on the embedded systems I put together.

>> stty,
> 
> Dunno how to create a virtual terminal environment and poke at it with a script, this seems the most feasible,
> but I dunno _how_ to test it.

I do, it's just tedious.
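The tedious part goes roughly like this: make a pseudo-terminal, point stty at the slave side, then read the settings back with tcgetattr() and check them. This is a Linux-specific sketch with made-up helper names. It assumes /dev/ptmx and the TIOCSPTLCK/TIOCGPTN ioctls that sit underneath unlockpt() and ptsname(). set_9600_noecho() does by hand roughly what "stty 9600 -echo" should do, so a test could compare the two.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <termios.h>
#include <unistd.h>

/* Hypothetical helper (Linux-specific): open a pseudo-terminal pair through
 * /dev/ptmx. Stores the master fd in *master and returns the slave fd,
 * or returns -1 (with nothing left open) on failure. */
int open_pty_pair(int *master)
{
  char path[64];
  unsigned int n;
  int unlock = 0, slave;

  if ((*master = open("/dev/ptmx", O_RDWR | O_NOCTTY)) < 0) return -1;
  // Unlock the slave side and ask which /dev/pts entry it got.
  if (ioctl(*master, TIOCSPTLCK, &unlock) || ioctl(*master, TIOCGPTN, &n)) {
    close(*master);
    return -1;
  }
  snprintf(path, sizeof(path), "/dev/pts/%u", n);
  if ((slave = open(path, O_RDWR | O_NOCTTY)) < 0) close(*master);

  return slave;
}

/* Hypothetical helper: set 9600 baud and turn off echo on fd, roughly what
 * "stty 9600 -echo" asks for. Returns 0 on success, -1 on error. */
int set_9600_noecho(int fd)
{
  struct termios t;

  if (tcgetattr(fd, &t)) return -1;
  cfsetispeed(&t, B9600);
  cfsetospeed(&t, B9600);
  t.c_lflag &= ~ECHO;

  return tcsetattr(fd, TCSANOW, &t);
}
```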

And I use actual serial on real hardware. I've even got the USB adapter with the
three little wires that go into the HAT connector or wherever, which is always a
pain to set up. (The input and output aren't that dangerous, I've had them on 5V
and ground and nothing was damaged, but you also need the ground pin connected
for the system to be happy, and if you plug the ground pin into the wrong place
Bad Things Happen. I tend to take a photo with my phone and consult it when
setting another board up. I'm always tempted to just never plug in the ground
pin, and it always made the real hardware engineers very sad. I'm already not
wearing a wrist strap and using a piece of paper instead of the special mat, so
plugging in the ground pin was the least I could do.)

Oh, don't use an anti-static bag as the thing to rest the board on. It's
anti-static because it's _conductive_ and thus does not allow localized static
charges to accumulate. (Plastic is like the worst case scenario of what to
store/transport a board in, apparently. We'd get a board wrapped directly in
bubble wrap and coin flip if it came on...)

>> syslogd, traceroute.
> 
>> (I dunno what success looks like, needs to set up a VM for each I guess?)
>> Some of those I can test by hand, but a reproducible regression test producing
>> consistent results?
> 
> I could change that mkroot package I posted to the list that gets the test suite to work
> under mkroot by copying the host bash to run in QEMU instead of a chroot and attempt
> to test those,

I can test most of this stuff by hand. Boot a devuan live CD under kvm with a 2
gig loopback file formatted as ext4 for scratch space, then "git clone" and wget
stuff to build stuff and run test scripts...

But "done once by hand" is not regression testing, and hard to refer back to or
reproduce when you go "wait, but did I test THIS bit"...

>> Then again that didn't stop ps or top from getting promoted. I could hand-test
>> traceroute I guess. But I've never used bootchartd or ipcs and hardly ever use
>> cron or modprobe, and "last" like "w" is a holdover from the days of multi-user
>> systems. If anybody else logs onto my laptop something is WRONG.
> 
> "last" also stores login info from the beginning of time, so it'll vary from OS install
> to OS install.

Hence mkroot where I can test in a known environment, starting with empty logs,
with known /etc/passwd containing known users, and have the automated plumbing
perform test logins and such...

> They are hard to test now because we have no airlock system to test in, but once we get one
> they should be no problem.
> 
>> I hate ip.c conceptually,
> 
> Is it because it relies on subcommands like git, or because the output is formatted
> for humans instead of machines?

Yes? It's a giant hairball that reinvents the wheel and isn't scriptable.

Did you ever read a book called "The UNIX Philosophy" by Mike Gancarz? I liked
that book...

>> but other people use it. If you wanted to poke at that
>> one, you won't interfere with any plans of mine. :)
>>
>> I need to rewrite route based on the netlink API that lets it address multiple
>> routing tables. That's one of the two commands that mkroot enables out of
>> pending (the other being toysh).
>>
>> Getting unicode to work in tr is hard BUT I think I figured out how...
> 
> I'd be interested in knowing how you plan to make tr UTF8 safe

I dunno about "safe", but the trick is not to try to expand sets into an array
of characters up front, but to keep them and match them up. Which debian's is
probably already doing:

$ tr [:lower:] [:digit:]
tr: when translating, the only character classes that may appear in
string2 are 'upper' and 'lower'
$ tr [:alnum:] [:lower:]
tr: misaligned [:upper:] and/or [:lower:] construct

So expand_set() is conceptually wrong, it needs match logic telling if this
(wide) character is part of this set. Because you can't expand unicode ranges
like that, it doesn't work that way.
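Below is a sketch of that match logic. It assumes a SET is kept as a list of pieces, each either a code point range or a wctype() character class (struct set_piece and in_set() are hypothetical). Nothing gets expanded. Ranges are compared numerically, and classes are answered by iswctype() for whatever locale is current.

```c
#include <stddef.h>
#include <wctype.h>

/* Hypothetical representation of one piece of a tr SET: either an inclusive
 * code point range (a lone character is a range of one), or a character
 * class. wctype() returns 0 only for invalid class names, so cls == 0 can
 * mean "this piece is a range". */
struct set_piece {
  wint_t lo, hi;   // used when cls == 0
  wctype_t cls;    // from wctype("lower"), wctype("digit"), ...
};

/* Is c in the set? Checks each piece in turn without building any table. */
int in_set(wint_t c, const struct set_piece *set, size_t count)
{
  for (size_t i = 0; i < count; i++) {
    if (set[i].cls ? iswctype(c, set[i].cls)
                   : (c >= set[i].lo && c <= set[i].hi))
      return 1;
  }

  return 0;
}
```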

There are other problems, of course:
https://landley.net/notes-2013.html#18-06-2013

And this would deviate from what debian is doing back in ascii-land:

  $ echo 'apt-config x86_64' | tr a[:digit:]c xyz
  xpt-zonfig xzz_zz

Because mine would say "xpt-zonfig xyy_yy" instead. Which makes MORE SENSE to
me, but isn't what posix/gnu says to do. From the tr man page:

  SET2 is extended to length of SET1 by repeating its last character as
  necessary. Excess characters of SET2 are ignored.

So for "tr a0123456789c xyz", SET2 gets padded to xyzzzzzzzzzz and only 0 gets
translated to y.
Which yeah, I can see. But is also deeply stupid. The question is what scripts
would break...
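Translation doesn't need expansion either. Find c's position in SET1 by walking its ranges, then look up the character at that position in SET2, repeating SET2's last character past its end as the man page describes. This is a sketch with hypothetical names. It handles ranges only, with no classes, and if SET1 repeats a character the first match wins. Given SET1 = a, 0-9, c and SET2 = x-z, it reproduces the debian output above, "xpt-zonfig xzz_zz".

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical: a tr SET as a list of inclusive code point ranges. */
struct range { wint_t lo, hi; };

/* Position of c in the set, counting through the ranges in order, or -1. */
static long set_index(wint_t c, const struct range *r, size_t n)
{
  long pos = 0;

  for (size_t i = 0; i < n; i++) {
    if (c >= r[i].lo && c <= r[i].hi) return pos + (long)(c - r[i].lo);
    pos += (long)(r[i].hi - r[i].lo) + 1;
  }

  return -1;
}

/* Character at position pos of a non-empty set. Past the end, SET2 is
 * padded by repeating its last character, so return that. */
static wint_t set_at(long pos, const struct range *r, size_t n)
{
  for (size_t i = 0; i < n; i++) {
    long len = (long)(r[i].hi - r[i].lo) + 1;

    if (pos < len) return r[i].lo + (wint_t)pos;
    pos -= len;
  }

  return r[n - 1].hi;
}

/* Translate one character. Characters not in SET1 pass through unchanged,
 * and an empty SET2 is left untranslated in this sketch. */
wint_t tr_char(wint_t c, const struct range *set1, size_t n1,
               const struct range *set2, size_t n2)
{
  long pos = set_index(c, set1, n1);

  return (pos < 0 || !n2) ? c : set_at(pos, set2, n2);
}
```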

*shrug* I might also have a -u option for unicode. And there's a design decision
whether tr should work on code points or on combined character groups. (Or just
always pass through combining characters unchanged?)

Rob

