[Toybox] pathological case in sed s///g

Rob Landley rob at landley.net
Fri May 3 11:03:17 PDT 2019


On 5/3/19 12:40 PM, Rob Landley wrote:
> On 5/2/19 9:46 PM, enh via Toybox wrote:
>> i've known about this for a couple of days and haven't had time to
>> look at it properly yet, so i should mention it here...
>>
>> if you have a file with a 1MiB line of 'x'es and you sed 's/x/y/g',
>> BSD or GNU sed finishes immediately, but toybox takes forever.
> 
> Because it allocates a replacement string and copies the before/replace/after
> parts into it for each replacement. I should switch the xmalloc() on line 515 to
> xrealloc() and fiddle with the surrounding code not to do unnecessary work...

And luckily I left myself a comment about why I _didn't_ do that:

  // Allocate new size, copy start/end around match. (Can't extend in
  // place because backrefs may refer to text after it's overwritten.)

I can realloc() if there were no backrefs, or I can make a temp copy of _just_
the removed text (since all the backrefs have to be within match[0])...

Hmmm... it's easy to record whether there were backrefs since the code right
above there measures the new length including backrefs, so I can have it set a
flag. The embedded newline support does NOT include matching a NUL (because that
would involve rewriting the regex library) so there can't be a NUL _within_ the
match so I can just conditionally xstrndup()...

It's a bit fiddly, but not too bad so far...

Rob



More information about the Toybox mailing list