[Toybox] pathological case in sed s///g

Rich Felker dalias at libc.org
Mon May 6 19:40:29 PDT 2019


On Mon, May 06, 2019 at 07:05:46PM -0500, Rob Landley wrote:
> On 5/6/19 12:48 PM, Rich Felker wrote:
> > On Mon, May 06, 2019 at 12:42:44PM -0500, Rob Landley wrote:
> >> Huh... I'll assume REG_STARTEND works in bionic since you're pointing me at it.
> >> It's not in the regex man page but it _is_ in the glibc headers...
> >>
> >>   https://github.com/bminor/glibc/commit/6fefb4e0b16
> >>
> >> Looks like it went into glibc in 2004, which is way past 7 years. I should poke
> >> Michael Kerrisk to update the man page.
> >>
> >> And musl explicitly refused to do it, but Rich makes bad calls all the time:
> >>
> >>   https://www.openwall.com/lists/musl/2013/01/15/26
> >>
> >> I still haven't got an #ifdef __MUSL__ in portability because Rich insists his
> >> libc is the only perfect piece of software ever written, but I can probe for it
> > 
> > You can #ifdef REG_STARTEND and use it conditionally depending on
> > whether the functionality is offered. There is no reason to hard-code
> > an assumption that musl doesn't or does have this functionality;
> 
> I did (https://github.com/landley/toybox/commit/48162c4ee3fb) but it's still a
> musl-specific workaround. Glibc, uClibc, and bionic all adopted this about 15
> years ago, and it came from freebsd in the first place which presumably covers
> macosx (I can't currently check, but somebody would poke me before too long).

Regardless of your opinion on whether or why musl does or doesn't have
this extension, testing for it with #ifdef REG_STARTEND rather than
"testing for musl" is the right approach, because if/when it's added
to musl, you automatically get support for it. If you hard-coded an
assumption that musl doesn't have it, then users are stuck with that
even if it changes.

> Musl is the only thing left that _doesn't_ have this functonality, and that's
> because
  ~~~~~~~

I don't think it's particularly "because" of this. Rather, the pros
and cons of adding this extension have been under discussion for a
while, with nobody pushing for further action on the topic.

> you dismissed NUL matching as not a "meaningful use of sed", and called
> it "hideous hacks" and "nobody will notice the difference with it missing". (I
> implemented a regexec0() wrapper to provide NUL matching 3 years before this
> performance issue came up.)

Use on non-text data is outside the scope of the standard sed utility,
and I'm a bit hostile to insistence that the standard utilities be
useful with non-text because of what a disaster Austin Group issue 663
turned into. However this is really not a consideration in whether
REG_STARTEND is included going forward.

> > it's
> > been proposed and there's even a patch somewhere. (It's not costly to
> > support relative to the current bad regex implementation, but the
> > concern is that it might impose a nontrivial cost on a future good
> > implementation once we're locked into having it.)
> 
> We have different project philosophies. I wait for actual users to show up
> before implementing a lot of things, but when they do I don't tend to think I
> know better than they do about what their use cases should be.

If I accepted arbitrary requests for stuff in musl, it would just be a
clone of glibc. As a community we have developed pretty clear and
consistent criteria for what does or doesn't get included. I'm not
aware of any indication that REG_STARTEND shouldn't be included except
possible future runtime cost; it doesn't have any major complexity
cost or incompatibility between different historical implementations
AFAIK.

> $ echo -e 'hello\0blah' | ./sed 's/blah/boing/' | hd
> 00000000  68 65 6c 6c 6f 00 62 6f  69 6e 67 0a              |hello.boing.|
> 0000000c
> 
> The freebsd man page from last time
> (https://www.freebsd.org/cgi/man.cgi?query=regex&sektion=3&manpath=freebsd-release-ports)
> says that REG_PEND is explicitly for specifying the length of a string
> containing NULs (without moving the starting point like REG_STARTEND can), so
> the plumbing in between already has to support embedded NULs. The man page could
> be clearer, hopefully Michael Kerrisk's will be.
> 
> > whether it treats the start/end as *additional* constraints,
> 
> Not in the systems I've been able to test. (I need to get my freebsd test image
> set back up on the new machine...)
> 
> (And hey, congrats on musl not doing the strlen() before matching without this.
> You don't have the N^2 performance issue glibc and bionic did, and thus don't
> need that reason this patch went in. But I didn't keep the previous "implement
> match after NULL" code that worked without it just for musl because I don't care
> enough about working around musl's unique and intentional limitations anymore.
> It can stay broken for all I care. I'm not quite to the point of removing
> existing fixes like the sched_get_priority() syscall wrapping and the nommu
> config symbol, but only because bionic doesn't support nommu yet, and I'm not
> _quite_ convinced uClibc's been fully resurrected.)

Quite the opposite; it's basically dead again. :(

> > to work
> > with a substring of a necessarily null-terminated string, or whether
> > they replace the null-termination criterion. If we do adopt it we
> > should ensure we do whatever other implementations do here, especially
> > if that's important to use cases you or others want.
> 
> Entirely your call. You've refused an #ifdef __MUSL__ for a decade now so I
> can't put clean fixes in portability.h for this sort of thing, which can _only_
> be because your library is perfect.

Also quite the opposite. A __MUSL__ would only be meaningful if musl
were "perfect" in the sense that nothing would ever need to be added
or changed again. Since that's not the case, you need to probe the
functionality you want to conditionally use, not whether it's musl. As
I've said many times, EVERY SINGLE TIME someone has requested
__MUSL__, it's turned out that what they wanted to do with it would
have given the wrong result within less than a year of the request.
It's very likely this will be the case with REG_STARTEND too.

I know not everything is a macro that can be tested; both myself and
folks from other implementations are interested in developing an
agreed-upon way to report availability of other extensions via macros,
so that configure-style compile/link tests are not needed. This would
also facilitate reporting of some things that can't even properly be
tested because they would need runtime tests. Unfortunately efforts in
this area are stalled from lack of interest. If you or others would
like to raise a fuss about making it happen, I'd welcome it.

Rich



More information about the Toybox mailing list