[Toybox] grep and empty regexes

Tue Jul 30 18:52:11 PDT 2019

On Tue, Jul 30, 2019 at 4:12 PM Rob Landley <rob at landley.net> wrote:
>
> On 7/30/19 12:16 PM, enh wrote:
> >> Sounds like "wait for people to complain, then whack-a-mole it" is a reasonable
> >> option for the moment.
> >>
> >> (You also have the ability to tweak bionic's regex plumbing, although the
> >> workaround for _this_ issue is a 2 line fix, so...)
> >
> > yeah, though for something like the regex code i worry about the old
> > n+1 joke about standards.
>
> Yes and no. You're not inventing new features, Android is theoretically a Linux
> variant and this is a standard Linux feature, and "this input is otherwise an
> error and an abort" means you're not blocking anybody's use case (or at least
> it's hard to see how this change would break compatibility with existing code).

(i thought you were the one who persuaded the world that Linux is just
the kernel? :-P )

yeah, in this case it's probably safe.
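
For reference, here is a minimal C sketch of the kind of workaround being discussed. It is a regcomp() wrapper that maps an empty pattern onto an always-matching one before compiling. The name xregcomp_sketch() is hypothetical and this is not toybox's actual code. It assumes the underlying regcomp() rejects the empty pattern, as the BSD-derived implementations mentioned here do. It also assumes that every libc involved accepts "()" as an extended regex that matches the empty string, and therefore matches every line.

```c
#include <regex.h>
#include <stdio.h>
#include <stdlib.h>

// Hypothetical wrapper in the spirit of an xregcomp(): compile a regex,
// treating the empty pattern as "match everything" (what glibc does) even
// on regcomp() implementations that reject an empty pattern outright.
void xregcomp_sketch(regex_t *preg, const char *regex, int cflags)
{
  int rc;

  // "()" is an empty ERE group: it matches the empty string, so every line
  // matches. Force REG_EXTENDED, because in a basic regex "()" would just
  // be two literal parentheses.
  if (!*regex) {
    regex = "()";
    cflags |= REG_EXTENDED;
  }

  if ((rc = regcomp(preg, regex, cflags))) {
    char msg[256];

    regerror(rc, preg, msg, sizeof(msg));
    fprintf(stderr, "bad regex '%s': %s\n", regex, msg);
    exit(1);
  }
}
```

With this in place, `grep ''` would match every line whether the libc underneath is glibc or a BSD-derived one.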

> The downside is code coming to rely on it that then _wouldn't_ work on MacOS
> without modification. (I assume you care about Java code running in both
> contexts and the regex behavior being visible through the java standard library?
> Because my understanding is C or C++ is already going to be a forest of #ifdefs
> trying to support both, but maybe I'm wrong about that...)

no, Java's fine because Java has its own regex implementation and/or
uses icu4c; i haven't kept up, but either way, no-one who cares uses
the libc regex implementation. even C++ doesn't :-)

but the motivation for toybox on macOS for me is basically "reduce
differences when building on the mac for the mac SDK versus building
on debian for everything else". (life would be better if we could
cross-compile for the mac.)

> > (it would be easier if there was _one_ BSD to talk to, but we use the
> > NetBSD regex code while Apple uses the FreeBSD regex code...)
> >
> >>> would you like me to move the workaround into xregcomp instead? or do
> >>> you want to wait until we see someone need this in sed or wherever
> >>> first?
> >> What would a sed test case look like?
> >
> > (i didn't realize that an empty regex means "repeat the previous one".)
>
> There are 3 caveats in the "regular expression" section of
> https://pubs.opengroup.org/onlinepubs/9699919799/utilities/sed.html
> (I think I've implemented them all, but see "test coverage" below. :)
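
The sed rule is the one that catches people out. In sed, an empty RE does not match everything. Instead, it reuses the last RE that was actually applied. Below is a minimal C sketch of that bookkeeping. It uses a hypothetical sed_match() helper in which a NULL regex stands for an empty pattern in the script. It tracks the last RE *applied* at run time rather than the last one compiled, since that is how POSIX words the rule. This is an illustration only, not toybox's sed implementation.

```c
#include <regex.h>
#include <stddef.h>

// Hypothetical state: the last RE actually applied to a line. POSIX sed
// says an empty RE behaves as if this one had been specified.
static regex_t *last_applied;

// Match one line against a sed address or s/// pattern.
// re == NULL stands for an empty RE in the script (e.g. "s//x/").
// Returns 1 on match, 0 on no match, or -1 if an empty RE has nothing to
// reuse (a real sed would report an error here).
int sed_match(regex_t *re, const char *line)
{
  if (re) last_applied = re;
  else if (!last_applied) return -1;

  return regexec(last_applied, line, 0, NULL, 0) == 0;
}
```

This is what makes the idiom `sed '/foo/s//bar/'` work: the substitution reuses the pattern that selected the line.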
>
> >> If it doesn't break anything, moving it into xregcomp seems like the right thing
> >> to do.
> >
> > all the tests (not just the grep tests) still pass, so i'll send you the patch.
>
> Cool.
>
> >>>> I'm glad you're giving the test suite some attention, but you're polishing
> >>>> what's _there_ and my concerns are more about what _isn't_. Longer term I need
> >>>> to A) write a gazillion more tests (based on a close re-reading of the spec, the
> >>>> relevant man page or RFC, and the source code), B) get mkroot to the point I can
> >>>> run more sheep across the minefield and catch issues with real world data.
> >>>
> >>> oh, yeah, *coverage* is a huge blind spot for us at the moment, and
> >>> something i want to look at. but first i wanted to get as many of the
> >>> tests running in presubmit as possible, to maximize my
> >>> pain^W^W^Wminimize the number of bugs that make it through to folks
> >>> who're just trying to build AOSP/just trying to use a device. not
> > >>> every OEM is as keen on the idea of a reproducible hermetic build as
> >>> you might expect, so giving them fewer stones to throw seems like a
> >>> good idea :-)
> >>
> >> Indeed.
> >>
> >>> i'm down to just blkid and du failures on a local taimen device now...
> >>
> >> Is that building for the taimen device, or building on the taimen device?
> >
> > for :-)
>
> Baby steps.
>
> >> Do you have a lot of NTFS disk labels you need identified? (I.E. is this a use
> >> case you'd actually use, or just a completeness thing?)
> >
> > i have exactly one NTFS disk image --- this one in the test suite. if
> > we had the opposite of `toyonly` i'd be tempted to just `toyonly` the
> > output without the label and `nontoyonly` the output with the label.
>
> $ /sbin/blkid -s UUID -s TYPE ntfs.img
> ntfs.img: UUID="6EE1BF3808608585" TYPE="ntfs"
>
> I should implement blkid -s, then change the test and add a comment why.

if you submit the existing blkid patch that fixes the UUIDs to avoid
conflicts, i can have a look at -s one of these evenings.

> >>> as for du versus the extra space used for extended attributes, i'm
> >>> still not sure what to do about that...
> >>
> >> My plan was running the tests in a known environment. I'm working on getting
> >> mkroot to where I can make an ext2 image (or maybe vfat?), loopback mount it,
> >> and run du against that so I get consistent results without the host
> >> filesystem skewing them.
> >>
> >> It should also run in a container, but half the time what you're debugging is
> >> what's different about your system that's giving "different but not wrong"
> >> results. Needs a reference implementation to regression test in so you can see
> >> _what_ is different...
> >
> > the likelihood that we have legitimate differences (as is actually the
> > case with du at least) is why i'm trying to resist the urge to just
> > ignore failures, especially on the "skip the whole tool" level.
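
Here is why du can be "different but not wrong". du normally reports the space allocated to a file (st_blocks) rather than its byte length (st_size). On some filesystems, allocated space can include metadata such as an external extended-attribute block. As a result, the same file may show more usage on one system than on another. Below is a small C sketch that reads both numbers for a path. get_usage() and print_usage() are hypothetical names. The 512-byte block unit is what Linux uses; POSIX leaves the unit unspecified.

```c
#include <stdio.h>
#include <sys/stat.h>

// Hypothetical helper: fetch the apparent size (st_size, in bytes) and the
// allocated size (st_blocks * 512) for one path. Returns 0 on success,
// -1 if stat() fails.
int get_usage(const char *path, long long *apparent, long long *allocated)
{
  struct stat st;

  if (stat(path, &st)) return -1;
  *apparent = (long long)st.st_size;
  // Linux reports st_blocks in 512-byte units; POSIX leaves the unit open.
  *allocated = (long long)st.st_blocks * 512;
  return 0;
}

// Print both numbers so differences between filesystems are easy to spot.
void print_usage(const char *path)
{
  long long apparent, allocated;

  if (get_usage(path, &apparent, &allocated)) perror(path);
  else printf("%s: %lld bytes apparent, %lld bytes allocated\n", path,
              apparent, allocated);
}
```

Running this on the same file under two different environments shows whether a du mismatch comes from allocation (st_blocks) or from content (st_size).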
>
> Yeah, I need an android test environment because regression testing android is a
> big deal. Hmmm. Pity AOSP hasn't got a makeroot-like target that gives me just a
> minimal root filesystem I can boot to a shell prompt under qemu. (Preferably
> without needing to run overnight to build it. Beating such a thing out of the
> AOSP build has been on my todo list forever, but you know how that goes.)

there _was_ such a thing long ago, but i suspect these days the
dependencies would be "just about everything" anyway.

> Rob