[Toybox] [PATCH] Make it easier to switch regex implementations.

enh enh at google.com
Thu Nov 5 16:12:50 PST 2020


On Mon, Nov 2, 2020 at 6:40 PM Rob Landley <rob at landley.net> wrote:
>
> On 11/2/20 1:55 PM, enh wrote:
> > On Fri, Oct 30, 2020 at 7:12 PM Rob Landley <rob at landley.net> wrote:
> >> On 10/28/20 7:06 PM, enh via Toybox wrote:
> >>> One reason to use toybox on the host is to get the same behavior across
> >>> Android/Linux/macOS. Unfortunately (as we've seen from a few bugs) one
> >>> area where that doesn't quite work is that toybox uses the libc regular
> >>> expression implementation.
> >>
> >> Which is another reason the version of toybox you distribute should be statically
> >> linked against bionic.
> >
> > aye, but "host bionic" is a longer project with no-one funded to work on it.
>
> What's actually involved? Mostly seems to be switching back to a minimal _start?
>
> If there _is_ anything missing you care about (ls -l showing usernames?) then
> license-wise, you should be able to pull anything you need from musl...

funnily enough, someone just merged a patch to read /etc/passwd and
/etc/group if we're not on device (the device uses the usual format,
but has multiple different locations so that SoC vendors and OEMs and
the core platform all get their own file on their own partition).

the order zero problem is that i don't think anyone has the full list
of what needs to be done.

> >>> That's fine, and mostly what users want, but
> >>> those folks trying to get the exact same behavior everywhere might want
> >>> to switch in a known regex implementation (bionic's NetBSD regex
> >>> implementation, say) for increased consistency.
> >>
> >> By statically linking the binaries against bionic. :)
> >>
> >> (Did you ever fix the "hello world segfaults in a chroot that doesn't have
> >> /dev/null because bionic's _start code does a lot with no error checking" issue?)
> >
> > no, that's actually a deliberate crash. that's definitely not a
> > supported _device_ configuration, and we deliberately minimize the
> > differences between host and device. (it's 99% of the point of having
> > host bionic in the first place!)
>
> Meaning you can't link PID 1 against bionic unless you have a static /dev, and
> the kernel guys keep rejecting my "make initramfs honor CONFIG_DEVTMPFS_MOUNT"
> patch. Sigh, I need to teach toybox cpio to accept non-filesystem metadata. It's
> on the todo list...

if this actually causes trouble for the hermetic build or GCE types we
can think about relaxing it, but i'm not aware of anyone [other than
Android, which _wants_ a static /dev] using bionic for init.

> >>> That actually works pretty well, but portability.h has an #ifndef test
> >>> for REG_STARTEND before including <regex.h> that gets in the way. To
> >>> make up for that, this patch removes the unnecessary #include <regex.h>
> >>> from grep.c itself.
> >>
> >> Applied, but it's one measure of a whack-a-mole problem space.
> >
> > there's never going to be a "host bionic for macOS" anyway, so this is
> > necessary if not sufficient.
>
> Statically linking on macos seems less of a minefield, though? They ship libtool
> (which has nothing to do with gnu libtool, it's a combined ar+ranlib from next)
> and it just works?

yeah, but the problem we're trying to solve here is "same regular
expression implementation", not "static binary" :-)

> Glibc is uniquely broken because Ulrich Drepper personally hated static linking
> (https://www.akkadia.org/drepper/no_static_linking.html) and intentionally
> sabotaged it (moving things like name resolution behind dlopen), and when the
> egcs guys... sorry, the eglibc guys were lured back into the fold and put
> another committee in charge of the project instead of a maintainer (so nobody
> would have authority to complain about the FSF (such as Drepper's polemic
> against Stallman starting with "And now some not so nice things" in
> https://sourceware.org/legacy-ml/libc-announce/2001/msg00000.html)), the
> committee was intentionally organized to be unable to tackle any remotely
> political issues, and thus couldn't undo Drepper's anti-static prejudice without
> admitting a mistake had been made by "the project". So they've been throwing
> good code after bad ever since, and glibc is uniquely insane and broken as a
> result. (Which means people have been inventing crap like "snap" and "flatpak"
> so they can statically link dynamically linked programs by bundling up the
> resulting filesystem and loopback mounting it in a container. Sigh. It's like
> watching people build redstone computers in minecraft.)
>
> Musl isn't broken that way, and neither is macos. I thought the point of host
> bionic (other than behaving the same everywhere so catching bugs early and
> avoiding potential distro skew and thus build dependencies for where the AOSP
> prebuilts get updated from) is to avoid glibc's intentional sabotage of static
> linking.

depends on who's asking. although there's no team blocked on not
having host bionic, there are several teams for whom it might be
useful. some of them for one reason, some the other, and some would
probably like both. (and others who just don't want to have to deal
with owning another glibc prebuilt to go with the existing 2.17 one,
and ...)

> > unfortunately, something (presumably a kernel or file system change)
> > seems to have broken a dd test on cuttlefish, so that's the next thing
> > i need to look at when i can find some time:
> >
> > FAIL: dd sync,noerror
> > echo -ne "I WANT\n" > input
> > echo -ne '' | dd if=input of=outFile seek=8860 bs=1M conv=sync,noerror
> > 2>/dev/null &&
> >    stat -c "%s" outFile && rm -f outFile
> > --- expected 2020-10-29 16:10:58.647991948 -0500
> > +++ actual 2020-10-29 16:10:58.667991947 -0500
> > @@ -1 +1 @@
> > -9291431936
> > +701497344
>
> I _really_ need to make a (5th?) attempt to clean up dd. But not until I can
> carve out 2 uninterrupted weeks for it. Maybe new year's? I specced out the
> block repacking layer it needs to insulate itself from that sort of thing
> _years_ ago, and just... my todo list runneth over.

turns out this was my fault... my regex.h patch meant we were
#including something before we'd set _FILE_OFFSET_BITS to 64, so we
were broken on 32-bit. the diff in hex makes it a bit more obvious :-)

-0x229D00000
+0x29D00000

i've sent you a fix.
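
A minimal sketch of the arithmetic behind that hex diff, assuming the damage
came from the file size passing through a 32-bit offset type (the function
names here are made up for illustration and are not toybox code). With
seek=8860 and bs=1M, conv=sync pads the short input out to one full block, so
the expected size is 8861 blocks of 1 MiB; keeping only the low 32 bits of
that gives the value the failing test reported:

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Size of the output file: seek_blocks skipped blocks plus one block that
 * conv=sync padded out to block_size bytes. (Hypothetical helper.) */
uint64_t full_size(uint64_t seek_blocks, uint64_t block_size)
{
  return (seek_blocks + 1) * block_size;
}

/* What is left when a 64-bit size is squeezed through a 32-bit type, which
 * is the assumed effect of a 32-bit off_t. (Hypothetical helper.) */
uint32_t low_32_bits(uint64_t size)
{
  return (uint32_t)size;
}

/* Print both values in the same -expected/+actual shape as the test log. */
void show_truncation(void)
{
  uint64_t want = full_size(8860, UINT64_C(1) << 20);
  uint32_t got = low_32_bits(want);

  /* The conversion simply drops everything above bit 31. */
  assert(got == (want & UINT64_C(0xFFFFFFFF)));

  printf("-%" PRIu64 " (0x%" PRIX64 ")\n", want, want);
  printf("+%" PRIu32 " (0x%" PRIX32 ")\n", got, got);
}
```

Calling show_truncation() from main() prints 9291431936 (0x229D00000) and
701497344 (0x29D00000), the expected and actual sizes from the failing dd
test. Only the leading hex digit is lost, which is why the hex form makes the
truncation easier to spot than the decimal one.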

> Rob


