[Toybox] sed {N;N;} cmd test failure

Rob Landley rob at landley.net
Wed Dec 3 21:06:32 PST 2014


On 12/03/14 01:28, Ashwini Sharma wrote:
> Hi Rob,
> 
> While testing thru busybox sed tests, it failed at this test case.
> 
> # first three lines are deleted; 4th line is matched and printed by
> # "2,3" and by "4" ranges
> testing "sed with N skipping lines past ranges on next cmds" \
>   "sed -n '1{N;N;d};1p;2,3p;3p;4p'" \
>   "4\n4\n" "" "1\n2\n3\n4\n"
> 
> My observation is with single N, this is working fine, but with multiple N
> it doesn't give the same results.
> 
> Is the toybox behavior right or busybox/GNU's?

I believe the toybox behavior is right?

We can simplify the above test by removing the p; entries that don't add
a line to the output:

  echo -ne '1\n2\n3\n4\n' | sed -n '1{N;N;d};2,3p;4p'

Toybox just prints one 4, and gnu prints two of them. The difference is
that "2,3p" is triggering on line 4 in gnu, and isn't in toybox.
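
If you have more than one implementation installed, a loop like this
makes the comparison easy to repeat. It's only a sketch: run_case is a
made-up helper name, printf stands in for echo -ne, and the "busybox sed"
and "toybox sed" entries assume multicall binaries on the PATH.

```shell
# Hypothetical helper: run the reduced test with whichever sed is named.
run_case() {
  printf '1\n2\n3\n4\n' | "$@" -n '1{N;N;d};2,3p;4p'
}

for impl in "sed" "busybox sed" "toybox sed"; do
  # Skip implementations whose binary isn't installed.
  command -v "${impl%% *}" >/dev/null 2>&1 || continue
  printf '%s:\n' "$impl"
  # $impl is left unquoted on purpose so "busybox sed" splits into two words.
  run_case $impl || echo "  (not usable here)"
done
```

Whatever else gets printed, the last line of output should be the 4
from "4p"; the open question is whether another 4 shows up before it.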

The 1{N;N;d} part should start with the first line, eat the second and
third lines with N commands, and then d (deleting the pattern space,
starting the next cycle) takes out all three lines before we get out of
the curly brackets.
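
That part can be checked on its own by swapping the range commands for
something that reports whatever survives the first block. A small sketch
(the variable names are just for illustration, and printf replaces
echo -ne):

```shell
# The 1{N;N;d} block should swallow lines 1-3, so the trailing p
# only ever sees line 4.
survivors=$(printf '1\n2\n3\n4\n' | sed -n '1{N;N;d};p')
echo "$survivors"

# '=' prints the current line number, showing which input line the
# second cycle starts on.
second_cycle_line=$(printf '1\n2\n3\n4\n' | sed -n '1{N;N;d};=')
echo "$second_cycle_line"
```

Both commands should print a single 4.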

The second time through, we're on the _fourth_ line (because the first
pass ate two extra lines and discarded them). This time through the 1{}
block is skipped, and then we get into the number ranges.

This is where gnu's range matching logic gets confused. The 2,3 range
thinks we advanced into it (after the d) and activated it on line 3, and
then the "show at least one line out of an activated range" part shows
line 4, even though it's after the end of the range.

It's a combination of "3,2p should show line 3" and "3d;3,4p should show
line 4 even though the range activating line got deleted before the
command with the range attached could be evaluated." They're interacting
in gnu to make a range trigger on a line that's not _in_ the range, and
busybox copied that behavior.
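
Each of those two rules can be poked at in isolation. A sketch (five
input lines is an arbitrary choice, and I'm deliberately not stating an
expected output for the second command, since that's the sort of thing
implementations may disagree on):

```shell
# Rule 1: a range whose end comes before its start still fires once,
# on the line that matched the first address.
backwards=$(printf '1\n2\n3\n4\n5\n' | sed -n '3,2p')
echo "backwards range: $backwards"

# Rule 2: line 3 is deleted before 3,4p is ever evaluated on it, so the
# range command gets its first look at the input on line 4.
deleted_start=$(printf '1\n2\n3\n4\n5\n' | sed -n '3d;3,4p')
echo "deleted start:   $deleted_start"
```

The first command prints just line 3. The second can never print line 3,
because d ends the cycle before the p is reached.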

I think that's a bug?

Posix says:

  An editing command with two addresses shall select the inclusive
  range from the first pattern space that matches the first address
  through the next pattern space that matches the second.

Simple enough.

  (If the
  second address is a number less than or equal to the line number
  first selected, only one line shall be selected.)

Not the case here, the second address is greater than the first.

  Starting at the
  first line following the selected range, sed shall look again for the
  first address.

In practice that only matters for regex addresses: a line number address
can't match again later in the input.

  Thereafter, the process shall be repeated. Omitting
  either or both of the address components in the following form
  produces undefined results:

I don't see anything in here that says deleting the entire address range
causes the command with that range to trigger on the line after the
range, but I'll cc the posix list to see if anybody there has a better
understanding...

Rob
