[Toybox] grep and empty regexes

enh enh at google.com
Sun Jul 28 10:32:58 PDT 2019


any thoughts on this? the choices seem to be:

* keep BSD behavior on BSD libc systems, GNU behavior on GNU libc
systems. the only toybox change is to tweak the tests to have the
right expectations for the system they're on, probably by checking
whether -E with a leading + is an error or not? (because there are
inherently going to be some differences, and the leading + is one of
them.) i can send a patch for that if you'd prefer to go that way.

* make the GNU behavior an error everywhere. (i.e. check for the empty
regex and reject it.) doesn't address other issues (like leading +).

* try to work around BSD behavior (this patch).
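for the first option, the libc's flavor can be probed directly. this is a rough sketch only, not part of any patch: regcomp_accepts(), libc_allows_leading_plus() and libc_allows_empty_regex() are hypothetical names made up for illustration, and the toybox tests are shell scripts, so the real check would more likely run grep itself and look at the exit status.

```c
#include <regex.h>
#include <stddef.h>

/* Returns 1 if this libc's regcomp() accepts the pattern, 0 if it rejects it.
 * REG_NOSUB is added because only the compile result matters here. */
int regcomp_accepts(const char *pattern, int cflags)
{
  regex_t re;

  if (regcomp(&re, pattern, cflags | REG_NOSUB)) return 0;
  regfree(&re);
  return 1;
}

/* Hypothetical probe: does an ERE consisting of a leading '+' compile? */
int libc_allows_leading_plus(void)
{
  return regcomp_accepts("+", REG_EXTENDED);
}

/* Hypothetical probe: does the empty regex compile? */
int libc_allows_empty_regex(void)
{
  return regcomp_accepts("", 0);
}
```

a test could then choose its expected output based on what these probes return on the system it is running on, rather than hard-coding one libc's behavior.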

it might come down to "where did these tests come from?" --- did you
hit these in practice somewhere, or was this just you poking at corner
cases and wondering what happens if you supply an empty regex? (for
obvious reasons it's a bit tricky for me to search for uses of an
empty regex :-) )

On Wed, Jul 24, 2019 at 4:25 PM enh <enh at google.com> wrote:
>
> so here's two FAILs and one accidental PASS (because the test doesn't
> actually check the return code)...
>
> grep: bad REGEX '': empty (sub)expression
> FAIL: grep -e blah -e ''
> echo -ne "one one one\n" > input
> echo -ne '' | grep -e blah -e '' input
> --- expected 2019-07-24 14:21:52.872813591 -0230
> +++ actual 2019-07-24 14:21:52.872813591 -0230
> @@ -1 +0,0 @@
> -one one one
> grep: bad REGEX '': empty (sub)expression
> PASS: grep -w ''
> grep: bad REGEX '': empty (sub)expression
> FAIL: grep -w '' 2
> echo -ne "one  two\n" > input
> echo -ne '' | grep -w '' input
> --- expected 2019-07-24 14:21:52.982813591 -0230
> +++ actual 2019-07-24 14:21:52.982813591 -0230
> @@ -1 +0,0 @@
> -one  two
>
> POSIX says there's no such thing as an empty regular expression, by
> having a grammar that excludes the possibility:
> https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html
>
> BSD agrees, and Android's and macOS's regcomp() both reject the empty
> regular expression.
>
> GNU apparently disagrees (as i learned from your tests).
>
> not sure what to do here, in particular because -- given your tests --
> i don't think we can represent the GNU interpretation as a POSIX
> regular expression?
>
> ...except i think there's a bug in the BSD implementation that does
> allow '()'. seems to have been there for at least 26 years judging by
> https://github.com/freebsd/freebsd/blame/master/lib/libc/regex/regcomp.c#L383
> so i think it's probably safe to rely on that for the time being.
> glibc's happy with it too.
>
> patch attached. (i've said "BSD" rather than "POSIX" in the code
> comment because BSD makes it clearer that this is a practical rather
> than just theoretical concern.)
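
the '()' idea described above can be sketched roughly like this. it is an illustration, not the attached patch: compile_allowing_empty() and line_matches() are hypothetical names, and using "\(\)" as the basic-regex spelling of the empty group is an assumption that the message above does not cover.

```c
#include <regex.h>
#include <stddef.h>

/* Compile `pattern`, swapping in an empty group when it is the empty string.
 * "()" is the ERE form mentioned in the thread; "\\(\\)" for BRE is an
 * assumption made for this sketch. */
int compile_allowing_empty(regex_t *re, const char *pattern, int cflags)
{
  if (!*pattern) pattern = (cflags & REG_EXTENDED) ? "()" : "\\(\\)";
  return regcomp(re, pattern, cflags);
}

/* Returns 1 on a match, 0 on no match, -1 if the pattern did not compile. */
int line_matches(const char *pattern, const char *line, int cflags)
{
  regex_t re;
  int rc;

  if (compile_allowing_empty(&re, pattern, cflags)) return -1;
  rc = regexec(&re, line, 0, NULL, 0);
  regfree(&re);
  return rc == 0;
}
```

with this, line_matches("", "one one one", REG_EXTENDED) is meant to report a match, which is what the failing "grep -e blah -e ''" test expects. it says nothing about how -w should treat the empty pattern.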
