[Toybox] pathological case in sed s///g
Rob Landley
rob at landley.net
Fri May 3 11:03:17 PDT 2019
On 5/3/19 12:40 PM, Rob Landley wrote:
> On 5/2/19 9:46 PM, enh via Toybox wrote:
>> i've known about this for a couple of days and haven't had time to
>> look at it properly yet, so i should mention it here...
>>
>> if you have a file with a 1MiB line of 'x'es and you sed 's/x/y/g',
>> BSD or GNU sed finishes immediately, but toybox takes forever.
>
> Because it allocates a replacement string and copies the before/replace/after
> parts into it for each replacement. I should switch the xmalloc() on line 515 to
> xrealloc() and fiddle with the surrounding code not to do unnecessary work...
And luckily I left myself a comment about why I _didn't_ do that:
// Allocate new size, copy start/end around match. (Can't extend in
// place because backrefs may refer to text after it's overwritten.)
I can realloc() if there were no backrefs, or I can make a temp copy of _just_
the removed text (since all the backrefs have to be within match[0])...
Hmmm... it's easy to record whether there were backrefs since the code right
above there measures the new length including backrefs, so I can have it set a
flag. The embedded newline support does NOT include matching a NUL (because that
would involve rewriting the regex library) so there can't be a NUL _within_ the
match so I can just conditionally xstrndup()...
It's a bit fiddly, but not too bad so far...
Rob
More information about the Toybox
mailing list