[Toybox] pathological case in sed s///g

enh enh at google.com
Fri May 3 11:05:36 PDT 2019


BSD seems to avoid all the copying? i think they have a "source" and
"destination" and only write to the latter (solving your issue), but
only move forwards (so i don't think they try to maintain it as "what
the whole result would look like if we only did this many
replacements", rather "here's what the first n characters of the
result will look like").

On Fri, May 3, 2019 at 11:02 AM Rob Landley <rob at landley.net> wrote:
>
> On 5/3/19 12:40 PM, Rob Landley wrote:
> > On 5/2/19 9:46 PM, enh via Toybox wrote:
> >> i've known about this for a couple of days and haven't had time to
> >> look at it properly yet, so i should mention it here...
> >>
> >> if you have a file with a 1MiB line of 'x'es and you sed 's/x/y/g',
> >> BSD or GNU sed finishes immediately, but toybox takes forever.
> >
> > Because it allocates a replacement string and copies the before/replace/after
> > parts into it for each replacement. I should switch the xmalloc() on line 515 to
> > xrealloc() and fiddle with the surrounding code not to do unnecessary work...
>
> And luckily I left myself a comment about why I _didn't_ do that:
>
>   // Allocate new size, copy start/end around match. (Can't extend in
>   // place because backrefs may refer to text after it's overwritten.)
>
> I can realloc() if there were no backrefs, or I can make a temp copy of _just_
> the removed text (since all the backrefs have to be within match[0])...
>
> Hmmm... it's easy to record whether there were backrefs since the code right
> above there measures the new length including backrefs, so I can have it set a
> flag. The embedded newline support does NOT include matching a NUL (because that
> would involve rewriting the regex library) so there can't be a NUL _within_ the
> match so I can just conditionally xstrndup()...
>
> It's a bit fiddly, but not too bad so far...
>
> Rob



More information about the Toybox mailing list