[Toybox] pathological case in sed s///g

enh enh at google.com
Mon May 6 22:57:23 PDT 2019


From: Rich Felker <dalias at libc.org>
Date: Mon, May 6, 2019 at 7:40 PM
To: Rob Landley
Cc: toybox

> On Mon, May 06, 2019 at 07:05:46PM -0500, Rob Landley wrote:
> > On 5/6/19 12:48 PM, Rich Felker wrote:
> > > On Mon, May 06, 2019 at 12:42:44PM -0500, Rob Landley wrote:
> > >> Huh... I'll assume REG_STARTEND works in bionic since you're pointing me at it.
> > >> It's not in the regex man page but it _is_ in the glibc headers...
> > >>
> > >>   https://github.com/bminor/glibc/commit/6fefb4e0b16
> > >>
> > >> Looks like it went into glibc in 2004, which is way past 7 years. I should poke
> > >> Michael Kerrisk to update the man page.
> > >>
> > >> And musl explicitly refused to do it, but Rich makes bad calls all the time:
> > >>
> > >>   https://www.openwall.com/lists/musl/2013/01/15/26
> > >>
> > >> I still haven't got an #ifdef __MUSL__ in portability because Rich insists his
> > >> libc is the only perfect piece of software ever written, but I can probe for it
> > >
> > > You can #ifdef REG_STARTEND and use it conditionally depending on
> > > whether the functionality is offered. There is no reason to hard-code
> > > an assumption that musl doesn't or does have this functionality;
> >
> > I did (https://github.com/landley/toybox/commit/48162c4ee3fb) but it's still a
> > musl-specific workaround. Glibc, uClibc, and bionic all adopted this about 15
> > years ago, and it came from freebsd in the first place which presumably covers
> > macosx (I can't currently check, but somebody would poke me before too long).
>
> Regardless of your opinion on whether or why musl does or doesn't have
> this extension, testing for it with #ifdef REG_STARTEND rather than
> "testing for musl" is the right approach, because if/when it's added
> to musl, you automatically get support for it. If you hard-coded an
> assumption that musl doesn't have it, then users are stuck with that
> even if it changes.
>
> > Musl is the only thing left that _doesn't_ have this functonality, and that's
> > because
>   ~~~~~~~
>
> I don't think it's particularly "because" of this. Rather, the pros
> and cons of adding this extension have been under discussion for a
> while, with nobody pushing for further action on the topic.
>
> > you dismissed NUL matching as not a "meaningful use of sed", and called
> > it "hideous hacks" and "nobody will notice the difference with it missing". (I
> > implemented a regexec0() wrapper to provide NUL matching 3 years before this
> > performance issue came up.)
>
> Use on non-text data is outside the scope of the standard sed utility,
> and I'm a bit hostile to insistence that the standard utilities be
> useful with non-text because of what a disaster Austin Group issue 663
> turned into. However this is really not a consideration in whether
> REG_STARTEND is included going forward.
>
> > > it's
> > > been proposed and there's even a patch somewhere. (It's not costly to
> > > support relative to the current bad regex implementation, but the
> > > concern is that it might impose a nontrivial cost on a future good
> > > implementation once we're locked into having it.)
> >
> > We have different project philosophies. I wait for actual users to show up
> > before implementing a lot of things, but when they do I don't tend to think I
> > know better than they do about what their use cases should be.
>
> If I accepted arbitrary requests for stuff in musl, it would just be a
> clone of glibc. As a community we have developed pretty clear and
> consistent criteria for what does or doesn't get included. I'm not
> aware of any indication that REG_STARTEND shouldn't be included except
> possible future runtime cost; it doesn't have any major complexity
> cost or incompatibility between different historical implementations
> AFAIK.
>
> > $ echo -e 'hello\0blah' | ./sed 's/blah/boing/' | hd
> > 00000000  68 65 6c 6c 6f 00 62 6f  69 6e 67 0a              |hello.boing.|
> > 0000000c
> >
> > The freebsd man page from last time
> > (https://www.freebsd.org/cgi/man.cgi?query=regex&sektion=3&manpath=freebsd-release-ports)
> > says that REG_PEND is explicitly for specifying the length of a string
> > containing NULs (without moving the starting point like REG_STARTEND can), so
> > the plumbing in between already has to support embedded NULs. The man page could
> > be clearer, hopefully Michael Kerrisk's will be.
> >
> > > whether it treats the start/end as *additional* constraints,
> >
> > Not in the systems I've been able to test. (I need to get my freebsd test image
> > set back up on the new machine...)
> >
> > (And hey, congrats on musl not doing the strlen() before matching without this.
> > You don't have the N^2 performance issue glibc and bionic did, and thus don't
> > need that reason this patch went in. But I didn't keep the previous "implement
> > match after NULL" code that worked without it just for musl because I don't care
> > enough about working around musl's unique and intentional limitations anymore.
> > It can stay broken for all I care. I'm not quite to the point of removing
> > existing fixes like the sched_get_priority() syscall wrapping and the nommu
> > config symbol, but only because bionic doesn't support nommu yet, and I'm not
> > _quite_ convinced uClibc's been fully resurrected.)
>
> Quite the opposite; it's basically dead again. :(
>
> > > to work
> > > with a substring of a necessarily null-terminated string, or whether
> > > they replace the null-termination criterion. If we do adopt it we
> > > should ensure we do whatever other implementations do here, especially
> > > if that's important to use cases you or others want.
> >
> > Entirely your call. You've refused an #ifdef __MUSL__ for a decade now so I
> > can't put clean fixes in portability.h for this sort of thing, which can _only_
> > be because your library is perfect.
>
> Also quite the opposite. A __MUSL__ would only be meaningful if musl
> were "perfect" in the sense that nothing would ever need to be added
> or changed again. Since that's not the case, you need to probe the
> functionality you want to conditionally use, not whether it's musl. As
> I've said many times, EVERY SINGLE TIME someone has requested
> __MUSL__, it's turned out that what they wanted to do with it would
> have given the wrong result within less than a year of the request.
> It's very likely this will be the case with REG_STARTEND too.
>
> I know not everything is a macro that can be tested; both myself and
> folks from other implementations are interested in developing an
> agreed-upon way to report availability of other extensions via macros,
> so that configure-style compile/link tests are not needed. This would
> also facilitate reporting of some things that can't even properly be
> tested because they would need runtime tests. Unfortunately efforts in
> this area are stalled from lack of interest. If you or others would
> like to raise a fuss about making it happen, I'd welcome it.

wouldn't it be better to have more stuff along the lines of clang's
__has_builtin/__has_feature/__has_include? they're already very useful
for getting rid of this kind of stuff, and a __has_function would
cover most of what's missing.

> Rich
> _______________________________________________
> Toybox mailing list
> Toybox at lists.landley.net
> http://lists.landley.net/listinfo.cgi/toybox-landley.net



More information about the Toybox mailing list