[Toybox] CONFIG_TOYBOX_ZHELP

Rob Landley rob at landley.net
Sat Jan 13 12:44:45 PST 2024


On 1/12/24 14:25, enh via Toybox wrote:
> this will hardly come as a surprise to you, but just fyi, i _haven't_
> enabled this for AOSP,

Understood. You weren't really the audience, it's more a "parity with busybox"
feature.

They use bzip2 compression instead of gzip, which I haven't merged yet because
the bzip2 implementation I did way back when still needed the old string
fallback sort logic from the original, which I never did wrap my head around
entirely. (WHY is that particular set of sort algorithms expected to complete in
constant-ish time? I can more or less understand each one individually, but the
logic behind the combination/cascade of them eluded me. Mostly it got parked and
I never went back to it. I did my own implementation of everything _except_ the
sort, though.)

The only other compression algorithm I have in toybox is xz, which is there
because there's an honest-to-Kapo public domain implementation out there:

https://git.tukaani.org/xz-embedded.git

But as the README says, "currently only decoding is implemented" and doing my
own compressor for it is "hide in a cave for six weeks" level of focus, which
was a LOT easier to do as a single twentysomething living in cheap student
housing working six-month contracts with as long a gap as I wanted between
them, vs being married with a mortgage, actual use for health insurance, and a
political party publicly dedicated to destroying the social security program
before I qualify. (Maybe after 1.0, but I gotta finish sh and do awk and make
first, and git, and finish diff, build LFS, untangle AOSP... implement screen
and rsync...)

> (a) to avoid building deflate.c (which would
> make it harder to recognize if we accidentally stopped using "real"
> zlib[1]), and (b) because we check in the generated files, and i'm not
> likely to check in a binary blob, even if it is encoded in ASCII as a
> c byte array :-)
> 
> thanks for keeping the uncompressed path!

Always the plan. Besides, if the binary lives on something like squashfs,
decompressing it twice just wastes CPU. And the standalone command builds
wouldn't really benefit either without a shared lib/lib.so.

(Sigh. You're going to want a shared toylib.so, aren't you...)

> i haven't built with it enabled on linux yet, but building for macOS
> with it enabled at least i saw this spam for every (?) toy:

Huh, when I ssh to the macos box Zach Van Rijn let me use and "homebrew; git
pull; make distclean macos_defconfig toybox" it goes:

$ gmake distclean macos_defconfig toybox
cleaned
root cleaned
removed .config
cc -o kconfig/conf kconfig/conf.c kconfig/zconf.tab.c -DKBUILD_NO_NLS=1 \
	-DPROJECT_NAME=\"ToyBox\"
scripts/genconfig.sh
KCONFIG_ALLCONFIG=./kconfig/macos_miniconfig kconfig/conf -n Config.in > /dev/null
scripts/make.sh
Library probe
generated/{Config.in,newtoys.h,flags.h,globals.h,tags.h,help.h}
Compile toybox
........................................................................toys/posix/cp.c:279:16:
warning: 'mknodat' is only available on macOS 13.0 or newer
[-Wunguarded-availability-new]
            : !mknodat(cfd, catch, try->st.st_mode, try->st.st_rdev))
               ^~~~~~~
/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/sys/stat.h:395:9:
note: 'mknodat' has been marked as being introduced in macOS 13.0 here, but the
deployment target is macOS 12.0.0
int     mknodat(int, const char *, mode_t, dev_t) __API_AVAILABLE(macos(13.0),
ios(16.0), tvos(16.0), watchos(9.0));
        ^
toys/posix/cp.c:279:16: note: enclose 'mknodat' in a __builtin_available check
to silence this warning
            : !mknodat(cfd, catch, try->st.st_mode, try->st.st_rdev))
               ^~~~~~~
...1 warning generated.
....................................................
cfarm104 (homebrew):toybox landley$


Which admittedly has a giant "apple(tm) version skew" warning in the middle but
I honestly have no idea how to fix that: mknodat() is a posix-2008 function:

https://pubs.opengroup.org/onlinepubs/9699919799.2008edition/functions/mknodat.html

Which Apple is now claiming it only added in October of 2022?

https://en.wikipedia.org/wiki/MacOS_Ventura

I mean... Really? They didn't catch up to posix-2008 for FOURTEEN YEARS? Steve
Jobs was still alive for almost three years after that came out...

> ```
> In file included from toys/posix/who.c:26:
> In file included from ./toys.h:8:
> ./generated/config.h:695:9: warning: 'CFG_TOYBOX_ZHELP' macro
> redefined [-Wmacro-redefined]
> #define CFG_TOYBOX_ZHELP 0
>         ^
> ./generated/config.h:691:9: note: previous definition is here
> #define CFG_TOYBOX_ZHELP 1
>         ^

Emitted into config.h twice. Odd.

The mac build I just did has just:

#define CFG_TOYBOX_ZHELP 0
#define USE_TOYBOX_ZHELP(...)

The first of which is line 693.

This is generated from the .config file via the giant "legacy compatible" sed
invocation on line 161 of scripts/make.sh, looking for "CONFIG_BLAH=y" and
"# CONFIG_BLAH is not set" lines, to produce the 0 and blank defines, or the 1
and __VA_ARGS__ defines.
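
As a rough illustration of that mapping (a Python sketch, NOT the actual sed
from scripts/make.sh; the function name config_to_defines is made up here, and
it only handles the two boolean line shapes described above):

```python
import re

def config_to_defines(config_text):
    """Turn boolean .config lines into CFG_/USE_ macro pairs (sketch only)."""
    out = []
    for line in config_text.splitlines():
        on = re.fullmatch(r"CONFIG_(\w+)=y", line)
        off = re.fullmatch(r"# CONFIG_(\w+) is not set", line)
        if on:
            # Enabled symbol: CFG_ is 1 and USE_ passes its arguments through.
            name = on.group(1)
            out.append(f"#define CFG_{name} 1")
            out.append(f"#define USE_{name}(...) __VA_ARGS__")
        elif off:
            # Disabled symbol: CFG_ is 0 and USE_ swallows its arguments.
            name = off.group(1)
            out.append(f"#define CFG_{name} 0")
            out.append(f"#define USE_{name}(...)")
    return out

if __name__ == "__main__":
    print("\n".join(config_to_defines("# CONFIG_TOYBOX_ZHELP is not set")))
```

Each matching input line yields exactly one CFG_/USE_ pair, which is why a
symbol showing up twice in config.h points back at the input rather than at
the transformation.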

Unless the sed hiccuped (unlikely), that says your .config has two instances of
the same symbol. And given that they're emitted in pairs (CFG and USE macros for
each symbol), there's another symbol between the redundant ones you've got. So
you might have something like

CONFIG_TOYBOX_ZHELP=y
CONFIG_WALRUS=y
# CONFIG_TOYBOX_ZHELP is not set

In your .config? (Which I don't think the old 2.6.12 kconfig plumbing is
_capable_ of emitting, but you might get if you manually patched your .config file?)
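
One quick way to check for that, sketched in Python (the helper name
duplicate_symbols is hypothetical and not part of toybox; it assumes symbols
only appear as "CONFIG_BLAH=value" or "# CONFIG_BLAH is not set" lines):

```python
import re
from collections import defaultdict

def duplicate_symbols(config_text):
    """Map each symbol seen more than once to its 1-based line numbers."""
    seen = defaultdict(list)
    for num, line in enumerate(config_text.splitlines(), 1):
        m = (re.fullmatch(r"CONFIG_(\w+)=.*", line)
             or re.fullmatch(r"# CONFIG_(\w+) is not set", line))
        if m:
            seen[m.group(1)].append(num)
    # Keep only the symbols that were mentioned on more than one line.
    return {name: nums for name, nums in seen.items() if len(nums) > 1}

if __name__ == "__main__":
    sample = ("CONFIG_TOYBOX_ZHELP=y\n"
              "CONFIG_WALRUS=y\n"
              "# CONFIG_TOYBOX_ZHELP is not set\n")
    print(duplicate_symbols(sample))
```

Pointing it at the contents of the real .config (for example
duplicate_symbols(open(".config").read())) should come back empty for a file
kconfig wrote itself.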

> ____
> 1. yes, i'm acutely aware there are far too many zlib forks, but ykwim :-)

And I have added to that pile, yes. But that's what you get when a 40 year old
format is still in widespread use and even documented in an IETF RFC.

Phil Katz did good. Not only did appnote.txt establish a STANDARD, but deflate
is still the best _streaming_ compression algorithm I'm aware of. Stuff like ssh
-C all uses deflate because anything "better" is designed around larger buffer
sizes that can't deliver partial results.

You can feed deflate arbitrarily small chunks (down to individual bytes) and
have it pass on the results immediately to the far side (as long as you notify
it of flushes; sure doing that reduces how well the compression works because
you may prematurely break a run or force literal handling, but doesn't diminish
the compression efficiency of future larger data chunks in the same stream). And
since the lookback buffer's only 32k it's cheap to do on each end and has even
been implemented in hardware.
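
A minimal sketch of that streaming behavior, using Python's standard zlib
module purely for illustration (this is not toybox's deflate code, and the
function name stream_chunks is made up): each chunk is compressed, sync
flushed, and handed straight to a decompressor standing in for the far side.

```python
import zlib

def stream_chunks(chunks):
    """Compress chunks one at a time with a sync flush after each.

    Returns the list of byte strings the receiving side recovers, one entry
    per chunk sent, to show that nothing is held back waiting for more input.
    """
    # Raw deflate stream (negative wbits = no zlib header), 32k window.
    comp = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, -15)
    decomp = zlib.decompressobj(-15)
    received = []
    for chunk in chunks:
        # Z_SYNC_FLUSH pushes out everything compressed so far.
        wire = comp.compress(chunk) + comp.flush(zlib.Z_SYNC_FLUSH)
        received.append(decomp.decompress(wire))
    return received

if __name__ == "__main__":
    print(stream_chunks([b"l", b"s", b" -l\n"]))
```

Flushing after every chunk costs some compression ratio, but each chunk
arrives intact on the other end as soon as it is sent, including single bytes.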

There were a zillion compressors of the day and _only_ deflate has really
survived. There's a reason if I only wind up implementing one compression-side
in toybox, it has to be deflate. It's the 80/20 of compression algorithms.

Rob

