[Toybox] [PATCH] Make it easier to switch regex implementations.

enh enh at google.com
Tue Nov 10 16:02:04 PST 2020


On Tue, Nov 10, 2020 at 12:54 AM Rob Landley <rob at landley.net> wrote:
>
> On 11/5/20 6:12 PM, enh wrote:
> > On Mon, Nov 2, 2020 at 6:40 PM Rob Landley <rob at landley.net> wrote:
> >>
> >> On 11/2/20 1:55 PM, enh wrote:
> >>> On Fri, Oct 30, 2020 at 7:12 PM Rob Landley <rob at landley.net> wrote:
> >>>> On 10/28/20 7:06 PM, enh via Toybox wrote:
> >>>>> One reason to use toybox on the host is to get the same behavior across
> >>>>> Android/Linux/macOS. Unfortunately (as we've seen from a few bugs) one
> >>>>> area where that doesn't quite work is that toybox uses the libc regular
> >>>>> expression implementation.
> >>>>
> >>>> Which is another reason the version of toybox you distribute should be statically
> >>>> linked against bionic.
> >>>
> >>> aye, but "host bionic" is a longer project with no-one funded to work on it.
> >>
> >> What's actually involved? Mostly seems to be switching back to a minimal _start?
> >>
> >> If there _is_ anything missing you care about (ls -l showing usernames?) then
> >> license-wise, you should be able to pull anything you need from musl...
> >
> > funnily enough, someone just merged a patch to read /etc/passwd and
> > /etc/group if we're not on device (the device uses the usual format,
> > but has multiple different locations so that SoC vendors and OEMs and
> > the core platform all get their own file on their own partition).
> >
> > the order zero problem is that i don't think anyone has the full list
> > of what needs to be done.
>
> Rich Felker wrote a C library from scratch. He started working on it back when I
> was still busybox maintainer, and our IRC conversation about actually releasing
> it as a project and licensing it was... 2012 I think? Yeah, he objected to my
> flounce https://landley.net/notes-2012.html#18-01-2012 and wanted to know why I
> cared so much (which was only 2 months after
> https://landley.net/notes-2011.html#13-11-2011 so I had links ready, including
> https://landley.net/notes-2010.html#19-07-2010).
>
> Rich actually worked to get _binary_ compatibility with glibc to run things like
> the flash plugin. It's not perfect, but he's gone pretty deeply into this area
> and has more domain expertise than a lot of the glibc guys (who never went back
> and analyzed their own stuff the same way).
>
> He can't tell you what your _objectives_ are, but if you want libc host
> compatibility domain expertise, that's the guy who knows where the bodies are
> buried.

no, it's actually stuff like "non-Android keeps its list of users in
/etc/passwd" and "non-Android doesn't have a netd daemon that all DNS
queries should go via" and so on.
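
To make the /etc/passwd point concrete, here is a minimal sketch of a
uid-to-name lookup that walks several passwd-format files in order. It is
not bionic's code: the function names parse_passwd_line() and uid_to_name()
and the idea of passing a NULL-terminated file list are assumptions made
purely for illustration.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Parse one "name:passwd:uid:gid:gecos:home:shell" line.
 * Returns 1 and fills name/uid on success, 0 on a malformed line.
 * (Hypothetical helper, not part of any libc.) */
static int parse_passwd_line(const char *line, char *name, size_t len,
                             unsigned *uid)
{
  const char *colon = strchr(line, ':'), *field;
  char *end;
  unsigned long val;

  /* Need a non-empty name that fits in the caller's buffer. */
  if (!colon || colon == line || (size_t)(colon - line) >= len) return 0;
  /* Skip the password field to reach the uid field. */
  if (!(field = strchr(colon + 1, ':'))) return 0;
  field++;
  val = strtoul(field, &end, 10);
  if (end == field || *end != ':') return 0;

  memcpy(name, line, colon - line);
  name[colon - line] = 0;
  *uid = (unsigned)val;
  return 1;
}

/* Search each file in turn; the first matching entry wins.
 * On failure the contents of name are unspecified. */
static int uid_to_name(const char *const *files, unsigned uid, char *name,
                       size_t len)
{
  char line[1024];
  unsigned found;
  FILE *fp;

  for (; *files; files++) {
    if (!(fp = fopen(*files, "r"))) continue; /* missing files are skipped */
    while (fgets(line, sizeof(line), fp)) {
      if (parse_passwd_line(line, name, len, &found) && found == uid) {
        fclose(fp);
        return 1;
      }
    }
    fclose(fp);
  }
  return 0;
}
```

A host build would presumably pass just "/etc/passwd", while a device build
would pass one path per partition; the actual paths are deliberately left
out here because the thread doesn't name them.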

host bionic _is_ actually in use for some things already. it's just
not used for a wide _variety_ of tasks, so there are probably still
bits that don't work that just haven't been used yet.

> >>>>> That's fine, and mostly what users want, but
> >>>>> those folks trying to get the exact same behavior everywhere might want
> >>>>> to switch in a known regex implementation (bionic's NetBSD regex
> >>>>> implementation, say) for increased consistency.
> >>>>
> >>>> By statically linking the binaries against bionic. :)
> >>>>
> >>>> (Did you ever fix the "hello world segfaults in a chroot that doesn't have
> >>>> /dev/null because bionic's _start code does a lot with no error checking" issue?)
> >>>
> >>> no, that's actually a deliberate crash. that's definitely not a
> >>> supported _device_ configuration, and we deliberately minimize the
> >>> differences between host and device. (it's 99% of the point of having
> >>> host bionic in the first place!)
> >>
> >> Meaning you can't link PID 1 against bionic unless you have a static /dev, and
> >> the kernel guys keep rejecting my "make initramfs honor CONFIG_DEVTMPFS_MOUNT"
> >> patch. Sigh, I need to teach toybox cpio to accept non-filesystem metadata. It's
> >> on the todo list...
> >
> > if this actually causes trouble for the hermetic build or GCE types we
> > can think about relaxing it, but i'm not aware of anyone [other than
> > Android, which _wants_ a static /dev] using bionic for init.
>
> scripts/mkroot.sh in the toybox build creates a qemu-bootable Linux system with
> toybox defconfig as initramfs. (Ok, defconfig + route + sh, but I'm working to
> get both promoted into defconfig.) You can chroot into the result, or qemu-*.sh
> in the directory to launch qemu with /dev/console connected to the stdin/stdout of
> qemu.
>
> I test that with the host libc and with the musl-cross-make toolchains for a
> bunch of targets, but if I do it with the NDK the result immediately segfaults
> no matter what binary I run, because /dev is empty. The reason /dev is empty is
> the build runs as a normal user (I can't mknod without root access) and I
> haven't taught toybox cpio to read the linux cpio generation text format yet.
> (It's on the todo list.)
>
> I want the initramfs to be external rather than statically linked into the
> kernel because then I can replace it without rebuilding the kernel, and other
> people can easily extract it or plug it into _their_ kernels.
>
> >>>>> That actually works pretty well, but portability.h has an #ifndef test
> >>>>> for REG_STARTEND before including <regex.h> that gets in the way. To
> >>>>> make up for that, this patch removes the unnecessary #include <regex.h>
> >>>>> from grep.c itself.
> >>>>
> >>>> Applied, but it's one measure of a whack-a-mole problem space.
> >>>
> >>> there's never going to be a "host bionic for macOS" anyway, so this is
> >>> necessary if not sufficient.
> >>
> >> Statically linking on macos seems less of a minefield, though? They ship libtool
> >> (which has nothing to do with gnu libtool, it's a combined ar+ranlib from next)
> >> and it just works?
> >
> > yeah, but the problem we're trying to solve here is "same regular
> > expression implementation", not "static binary" :-)
>
> The problem at hand, yes. :)
>
> >> Musl isn't broken that way, and neither is macos. I thought the point of host
> >> bionic (other than behaving the same everywhere so catching bugs early and
> >> avoiding potential distro skew and thus build dependencies for where the AOSP
> >> prebuilts get updated from) is to avoid glibc's intentional sabotage of static
> >> linking.
> >
> > depends on who's asking. although there's no team blocked on not
> > having host bionic, there are several teams for whom it might be
> > useful. some of them for one reason, some the other, and some would
> > probably like both. (and others who just don't want to have to deal
> > with owning another glibc prebuilt to go with the existing 2.17 one,
> > and ...)
>
> I test toybox built with the NDK, and have a TODO item to get mkroot to work
> with that, which currently goes through "make toybox cpio accept the
> gen_init_cpio input file format", which then bumps up against "yeah but
> shouldn't it work without a toybox airlock using host tools too" and is still on
> the todo list until I figure out what approach I want to take there. (I guess I
> can gate it on CROSS_COMPILE like --no-preserve-owner, but that's not really the
> right test? Plus that doesn't add lines to the script and an alternate path here
> probably would. I'm proud of my tiny system builder script.)
>
> Rob
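
For readers unfamiliar with the REG_STARTEND test mentioned in the patch
description quoted above: REG_STARTEND is a BSD regexec() extension (also
present in glibc and bionic, but not in every libc) that lets the caller
search a byte range of a buffer without NUL-terminating it. The sketch below
shows why code might check for the macro. The helper name find_in_range() and
the copy-based fallback are assumptions for illustration, not what toybox's
portability.h actually does.

```c
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Search buf[start..end) for a compiled regex.
 * Returns the match offset relative to buf, or -1 if there is no match.
 * (Hypothetical helper for illustration only.) */
static long find_in_range(const regex_t *re, const char *buf,
                          size_t start, size_t end)
{
  regmatch_t m;

#ifdef REG_STARTEND
  /* The range is passed in through pmatch[0]; the returned offsets are
   * relative to buf, not to buf+start. */
  m.rm_so = (regoff_t)start;
  m.rm_eo = (regoff_t)end;
  if (regexec(re, buf, 1, &m, REG_STARTEND)) return -1;
  return (long)m.rm_so;
#else
  /* Fallback for a libc without the extension: copy the range into a
   * NUL-terminated scratch buffer. Unlike REG_STARTEND, this cannot see
   * past an embedded NUL byte. */
  char *copy = malloc(end - start + 1);
  long pos = -1;

  if (!copy) return -1;
  memcpy(copy, buf + start, end - start);
  copy[end - start] = 0;
  if (!regexec(re, copy, 1, &m, 0)) pos = (long)(start + m.rm_so);
  free(copy);
  return pos;
#endif
}
```

Because the macro comes from whichever <regex.h> is included first, a
portability header that tests it before the include can get in the way of
swapping in a different regex implementation, which is the situation the
patch description refers to.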


