[Toybox] pathological case in sed s///g

Rob Landley rob at landley.net
Fri May 3 11:56:32 PDT 2019


On 5/3/19 1:05 PM, enh wrote:
> BSD seems to avoid all the copying? i think they have a "source" and
> "destination" and only write to the latter (solving your issue), but
> only move forwards (so i don't think they try to maintain it as "what
> the whole result would look like if we only did this many
> replacements", rather "here's what the first n characters of the
> result will look like").

Oh I know it's solvable. Working on it.

The s/x/y/g case A) isn't changing the length of the string, B) has no backrefs.
So it doesn't need the copying, yes.

Mine only moves forwards too: line and len are the start of the line and the
whole line's length, but rline and rlen are the remaining line and remaining
length. I just responded to the "don't overwrite backref data while still using
it" problem by copying the whole line, when I didn't need to and it opens this
pathological case. Hence last email on what I _should_ do instead.

(Hmmm... the "one input buffer one output buffer" thing optimizes for the
"replace with shorter string" case where you avoid repeated memmove() on the
tail of the string. But it slightly pessimizes the cases where you _don't_ and
realloc() a lot. It's also a bigger change to the code, lemme see how much this
does first...)

But yeah, the new pessimal case after the change I'm making now would be a
megabyte of xyxyxyxy with 's/xy/x/g' _THAT_ would need the one output buffer
thing...

Rob



More information about the Toybox mailing list