[Toybox] [PATCH] Make it easier to switch regex implementations.

Rob Landley rob at landley.net
Tue Nov 10 01:05:04 PST 2020


On 11/5/20 6:12 PM, enh wrote:
> On Mon, Nov 2, 2020 at 6:40 PM Rob Landley <rob at landley.net> wrote:
>>
>> On 11/2/20 1:55 PM, enh wrote:
>>> On Fri, Oct 30, 2020 at 7:12 PM Rob Landley <rob at landley.net> wrote:
>>>> On 10/28/20 7:06 PM, enh via Toybox wrote:
>>>>> One reason to use toybox on the host is to get the same behavior across
>>>>> Android/Linux/macOS. Unfortunately (as we've seen from a few bugs) one
>>>>> area where that doesn't quite work is that toybox uses the libc regular
>>>>> expression implementation.
>>>>
>>>> Which is another reason the version of toybox you distribute should be statically
>>>> linked against bionic.
>>>
>>> aye, but "host bionic" is a longer project with no-one funded to work on it.
>>
>> What's actually involved? Mostly seems to be switching back to a minimal _start?
>>
>> If there _is_ anything missing you care about (ls -l showing usernames?) then
>> license-wise, you should be able to pull anything you need from musl...
> 
> funnily enough, someone just merged a patch to read /etc/passwd and
> /etc/group if we're not on device (the device uses the usual format,
> but has multiple different locations so that SoC vendors and OEMs and
> the core platform all get their own file on their own partition).
> 
> the order zero problem is that i don't think anyone has the full list
> of what needs to be done.

Rich Felker wrote a C library from scratch. He started working on it back when I
was still busybox maintainer, and our IRC conversation about actually releasing
it as a project and licensing it was... 2012 I think? Yeah, he objected to my
flounce https://landley.net/notes-2012.html#18-01-2012 and wanted to know why I
cared so much (which was only 2 months after
https://landley.net/notes-2011.html#13-11-2011 so I had links ready, including
https://landley.net/notes-2010.html#19-07-2010).

Rich actually worked to get _binary_ compatibility with glibc to run things like
the flash plugin. It's not perfect, but he's gone pretty deeply into this area
and has more domain expertise than a lot of the glibc guys (who never went back
and analyzed their own stuff the same way).

He can't tell you what your _objectives_ are, but if you want libc host
compatibility domain expertise, that's the guy who knows where the bodies are
buried.

>>>>> That's fine, and mostly what users want, but
>>>>> those folks trying to get the exact same behavior everywhere might want
>>>>> to switch in a known regex implementation (bionic's NetBSD regex
>>>>> implementation, say) for increased consistency.
>>>>
>>>> By statically linking the binaries against bionic. :)
>>>>
>>>> (Did you ever fix the "hello world segfaults in a chroot that doesn't have
>>>> /dev/null because bionic's _start code does a lot with no error checking" issue?)
>>>
>>> no, that's actually a deliberate crash. that's definitely not a
>>> supported _device_ configuration, and we deliberately minimize the
>>> differences between host and device. (it's 99% of the point of having
>>> host bionic in the first place!)
>>
>> Meaning you can't link PID 1 against bionic unless you have a static /dev, and
>> the kernel guys keep rejecting my "make initramfs honor CONFIG_DEVTMPFS_MOUNT"
>> patch. Sigh, I need to teach toybox cpio to accept non-filesystem metadata. It's
>> on the todo list...
> 
> if this actually causes trouble for the hermetic build or GCE types we
> can think about relaxing it, but i'm not aware of anyone [other than
> Android, which _wants_ a static /dev] using bionic for init.

scripts/mkroot.sh in the toybox build creates a qemu-bootable Linux system with
toybox defconfig as initramfs. (Ok, defconfig + route + sh, but I'm working to
get both promoted into defconfig.) You can chroot into the result, or run
qemu-*.sh in the directory to launch qemu with /dev/console connected to the
stdin/stdout of qemu.

I test that with the host libc and with the musl-cross-make toolchains for a
bunch of targets, but if I do it with the NDK the result immediately segfaults
no matter what binary I run, because /dev is empty. The reason /dev is empty is
the build runs as a normal user (I can't mknod without root access) and I
haven't taught toybox cpio to read the linux cpio generation text format yet.
(It's on the todo list.)
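
For reference, a minimal sketch of what reading one device-node line of that
text format could look like. The line layout ("nod <name> <mode> <uid> <gid>
<dev_type> <maj> <min>") is the kernel's gen_init_cpio input format; the
struct nod_entry type and parse_nod() function are hypothetical names made up
for illustration, not toybox code.

```c
#include <stdio.h>

/* Hypothetical holder for one "nod" entry (not a toybox structure). */
struct nod_entry {
  char name[256];
  unsigned mode, uid, gid, dev_major, dev_minor;
  char type;  /* 'c' for character device, 'b' for block device */
};

/* Parse "nod <name> <mode> <uid> <gid> <dev_type> <maj> <min>".
 * The mode is octal. Returns 0 on success, -1 if the line doesn't match. */
static int parse_nod(const char *line, struct nod_entry *e)
{
  int n = sscanf(line, "nod %255s %o %u %u %c %u %u", e->name, &e->mode,
                 &e->uid, &e->gid, &e->type, &e->dev_major, &e->dev_minor);

  if (n != 7) return -1;
  if (e->type != 'c' && e->type != 'b') return -1;

  return 0;
}
```

With something like this, a line such as "nod /dev/null 0666 0 0 c 1 3" can
become an archive entry without the build ever calling mknod, so it doesn't
need root access.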

I want the initramfs to be external rather than statically linked into the
kernel because then I can replace it without rebuilding the kernel, and other
people can easily extract it or plug it into _their_ kernels.

>>>>> That actually works pretty well, but portability.h has an #ifndef test
>>>>> for REG_STARTEND before including <regex.h> that gets in the way. To
>>>>> make up for that, this patch removes the unnecessary #include <regex.h>
>>>>> from grep.c itself.
>>>>
>>>> Applied, but it's one measure of a whack-a-mole problem space.
>>>
>>> there's never going to be a "host bionic for macOS" anyway, so this is
>>> necessary if not sufficient.
>>
>> Statically linking on macos seems less of a minefield, though? They ship libtool
>> (which has nothing to do with gnu libtool, it's a combined ar+ranlib from next)
>> and it just works?
> 
> yeah, but the problem we're trying to solve here is "same regular
> expression implementation", not "static binary" :-)

The problem at hand, yes. :)
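
To illustrate why REG_STARTEND comes up when swapping regex implementations,
here is a minimal sketch of code that uses the flag when <regex.h> provides it
and falls back to copying the range when it doesn't. This is not what
portability.h or grep.c do; find_in_range() is a hypothetical helper that only
shows the shape of the problem.

```c
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Search buf[start, end) for an extended regex.
 * Returns the match offset relative to buf, -1 for no match, -2 on error. */
static long find_in_range(const char *pattern, const char *buf,
                          size_t start, size_t end)
{
  regex_t re;
  regmatch_t m;
  long off = -1;

  if (regcomp(&re, pattern, REG_EXTENDED)) return -2;
#ifdef REG_STARTEND
  /* BSD extension (also in glibc): pmatch[0] delimits the search range. */
  m.rm_so = start;
  m.rm_eo = end;
  if (!regexec(&re, buf, 1, &m, REG_STARTEND)) off = m.rm_so;
#else
  /* Fallback for a libc without the flag: copy the range, NUL terminate. */
  char *tmp = malloc(end - start + 1);

  if (!tmp) off = -2;
  else {
    memcpy(tmp, buf + start, end - start);
    tmp[end - start] = 0;
    if (!regexec(&re, tmp, 1, &m, 0)) off = (long)start + m.rm_so;
    free(tmp);
  }
#endif
  regfree(&re);

  return off;
}
```

Which branch gets compiled depends on whichever <regex.h> was included first,
so a header that tests for the macro before the include can pick the wrong
one.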

>> Musl isn't broken that way, and neither is macos. I thought the point of host
>> bionic (other than behaving the same everywhere so catching bugs early and
>> avoiding potential distro skew and thus build dependencies for where the AOSP
>> prebuilts get updated from) is to avoid glibc's intentional sabotage of static
>> linking.
> 
> depends on who's asking. although there's no team blocked on not
> having host bionic, there are several teams for whom it might be
> useful. some of them for one reason, some the other, and some would
> probably like both. (and others who just don't want to have to deal
> with owning another glibc prebuilt to go with the existing 2.17 one,
> and ...)

I test toybox built with the NDK, and have a TODO item to get mkroot to work
with that, which currently goes through "make toybox cpio accept the
gen_init_cpio input file format", which then bumps up against "yeah but
shouldn't it work without a toybox airlock using host tools too" and is still on
the todo list until I figure out what approach I want to take there. (I guess I
can gate it on CROSS_COMPILE like --no-preserve-owner, but that's not really the
right test? Plus that doesn't add lines to the script and an alternate path here
probably would. I'm proud of my tiny system builder script.)

Rob


