[Toybox] pathological case in sed s///g

Rob Landley rob at landley.net
Mon May 6 17:05:46 PDT 2019


On 5/6/19 12:48 PM, Rich Felker wrote:
> On Mon, May 06, 2019 at 12:42:44PM -0500, Rob Landley wrote:
>> Huh... I'll assume REG_STARTEND works in bionic since you're pointing me at it.
>> It's not in the regex man page but it _is_ in the glibc headers...
>>
>>   https://github.com/bminor/glibc/commit/6fefb4e0b16
>>
>> Looks like it went into glibc in 2004, which is way past 7 years. I should poke
>> Michael Kerrisk to update the man page.
>>
>> And musl explicitly refused to do it, but Rich makes bad calls all the time:
>>
>>   https://www.openwall.com/lists/musl/2013/01/15/26
>>
>> I still haven't got an #ifdef __MUSL__ in portability because Rich insists his
>> libc is the only perfect piece of software ever written, but I can probe for it
> 
> You can #ifdef REG_STARTEND and use it conditionally depending on
> whether the functionality is offered. There is no reason to hard-code
> an assumption that musl doesn't or does have this functionality;

I did (https://github.com/landley/toybox/commit/48162c4ee3fb) but it's still a
musl-specific workaround. Glibc, uClibc, and bionic all adopted this about 15
years ago, and it came from freebsd in the first place which presumably covers
macosx (I can't currently check, but somebody would poke me before too long).

Musl is the only thing left that _doesn't_ have this functonality, and that's
because you dismissed NUL matching as not a "meaningful use of sed", and called
it "hideous hacks" and "nobody will notice the difference with it missing". (I
implemented a regexec0() wrapper to provide NUL matching 3 years before this
performance issue came up.)

> it's
> been proposed and there's even a patch somewhere. (It's not costly to
> support relative to the current bad regex implementation, but the
> concern is that it might impose a nontrivial cost on a future good
> implementation once we're locked into having it.)

We have different project philosophies. I wait for actual users to show up
before implementing a lot of things, but when they do I don't tend to think I
know better than they do about what their use cases should be.

I'll push back a bit or negotiate to make sure they're serious and this is
important to them and that I understand what they actually need. Sometimes I ask
a second user to speak up to make sure the first isn't an outlier. and I'll
often offer alternatives before doing what they asked to see if an easier thing
for me to do will meet their needs (and then wait for somebody else to show up
needing the full thing)...

But I implement what the users need my software to _do_ because that's what this
software is _for_. I do NOT tell my users that they're wrong because I disagree
with what they're trying to run. I grumble about it a lot while doing it, but I
don't pretend I know what my users should be doing better than they do. My
bright line boundary is "this command is out of scope, it should not be in
toybox at all, use a different implementation". Because if they say "this
command you have needs to do this" and mine can't do it, using a different
implementation is what they'll do anyway. When it comes to the privilege of
being the first command in the $PATH with that name, there can be only one.

(There is no _way_ I wouldn't have ripped out tar --to-command during the
cleanup otherwise. But it's part of somebody's use case, existing scripts, oh
well. Lots of extra work to get it to work with nommu, but I got there
eventually...)

I've told you a bunch of times that you're wrong about nommu support. The
founder of uclinux has told you a bunch of times you're wrong about nommu
support. But you insist that you know better than your users, don't even offer a
--nommu-probeable build time option, and that everyone else must rewrite their
software to be worthy to use your library.

I cc'd you on this issue because it seemed impolite to talk about it behind your
back, but I stopped trying to argue with you about this sort of thing after
https://github.com/landley/toybox/commit/833fb23fe8b4 because there's no point.

>> If this _does_ match NUL bytes it lets me remove regexec0() entirely from libc,
>> which would be very nice. And since it started life as a BSD extension (they're
>> the only man page _documenting_ it) I get support on FreeBSD and probably MacOS too.
> 
> I'm not sure if the proposed patch supports matching NUL or not (i.e.

It's not proposed, it's committed, and it worked when I tested it with glibc in
devuan's toolchain:

$ echo -e "hello\0there" | toybox sed 's/there/blah/'
helloblah

And with bionic in the Android NDK:

$ echo -e 'hello\0blah' | ./sed 's/blah/boing/' | hd
00000000  68 65 6c 6c 6f 00 62 6f  69 6e 67 0a              |hello.boing.|
0000000c

The freebsd man page from last time
(https://www.freebsd.org/cgi/man.cgi?query=regex&sektion=3&manpath=freebsd-release-ports)
says that REG_PEND is explicitly for specifying the length of a string
containing NULs (without moving the starting point like REG_STARTEND can), so
the plumbing in between already has to support embedded NULs. The man page could
be clearer, hopefully Michael Kerrisk's will be.

> whether it treats the start/end as *additional* constraints,

Not in the systems I've been able to test. (I need to get my freebsd test image
set back up on the new machine...)

(And hey, congrats on musl not doing the strlen() before matching without this.
You don't have the N^2 performance issue glibc and bionic did, and thus don't
need that reason this patch went in. But I didn't keep the previous "implement
match after NULL" code that worked without it just for musl because I don't care
enough about working around musl's unique and intentional limitations anymore.
It can stay broken for all I care. I'm not quite to the point of removing
existing fixes like the sched_get_priority() syscall wrapping and the nommu
config symbol, but only because bionic doesn't support nommu yet, and I'm not
_quite_ convinced uClibc's been fully resurrected.)

> to work
> with a substring of a necessarily null-terminated string, or whether
> they replace the null-termination criterion. If we do adopt it we
> should ensure we do whatever other implementations do here, especially
> if that's important to use cases you or others want.

Entirely your call. You've refused an #ifdef __MUSL__ for a decade now so I
can't put clean fixes in portability.h for this sort of thing, which can _only_
be because your library is perfect. Meanwhile I just added
https://github.com/landley/toybox/commit/35bf5932 because I know my stuff isn't.

> Rich

Rob


More information about the Toybox mailing list