[Toybox] sed

Sun Nov 10 17:13:55 PST 2019

On 11/10/19 6:56 PM, scsijon wrote:
> I know it's not exactly a toybox problem, but can someone please sort this out
> for me.
> 
> sed ':a;N;$!ba;s|>\s*<|><|g'

Could you please be a little more vague?

:a is a jump label
N means read the next line of input and append it (after a \n) to this one.
$! means "every line except the last": $ matches the last line and ! inverts it, so what follows only runs when this ISN'T the last line.
b a means branch (unconditional jump) back to label a
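To see two of the pieces on their own, here is a sketch that assumes GNU sed; the numbered input lines are made up. N glues pairs of lines together, and $!d deletes every line except the last one.

```shell
# N appends a newline plus the next input line, so s can see the \n between them.
printf '1\n2\n3\n4\n' | sed 'N;s/\n/+/'
# prints:
# 1+2
# 3+4

# $ addresses the last line and ! inverts it: delete everything that ISN'T the last line.
printf '1\n2\n3\n4\n' | sed '$!d'
# prints: 4
```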

So the part so far reads the whole input into a single line. (It has embedded \n in it, so it's still the same data. It's just being processed as one thing.)
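Here is the loop by itself, with a substitution that turns the embedded newlines into spaces so you can see it worked. This assumes GNU sed. POSIX says a label runs to the end of the line, so some seds (BSD's, for one) treat everything after the : as the label name and reject the ;-separated one-liner. Splitting it up with -e should get around that.

```shell
# :a sets a label, N appends the next line, $!ba loops back until the last line,
# so by the time s runs the whole input is sitting in one pattern space.
printf 'one\ntwo\nthree\n' | sed ':a;N;$!ba;s/\n/ /g'
# prints: one two three

# Same script with each label-using command ending its own -e (i.e. its own line).
printf 'one\ntwo\nthree\n' | sed -e ':a' -e 'N;$!ba' -e 's/\n/ /g'
# prints: one two three
```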

s is the normal s/// search and replace, except the delimiter doesn't need to be /, it can be any character. In this case they're using | as the separator, so s|||
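The usual reason to pick a different delimiter is a pattern full of slashes. The paths below are just an example.

```shell
# With / as the delimiter, every / in the pattern needs a backslash...
echo /usr/local/bin | sed 's/\/usr\/local/\/opt/'
# ...with | as the delimiter it doesn't. Both print: /opt/bin
echo /usr/local/bin | sed 's|/usr/local|/opt|'
```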

The "look for this" part is >\s*< and \s is a gnu/dammit regex extension meaning "one whitespace character" (space, tab, newline and friends), so \s* is any run of whitespace. (It works with toybox built against glibc, no idea about other libc regex engines.) The portable way would be to say [[:space:]], and the logic behind that is that [:space:] is a special class that only works inside a [] character match, so you can combine it with other characters a la [abc[:space:]def]
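A quick check that the \s version and the [[:space:]] version behave the same, assuming GNU sed (where \s is available). The input is made up: spaces, a newline and a tab between two tags. The last line shows [:space:] sharing a bracket with an ordinary character.

```shell
# GNU extension \s
printf '<a>  \n\t<b>\n' | sed ':a;N;$!ba;s|>\s*<|><|g'
# portable POSIX character class
printf '<a>  \n\t<b>\n' | sed ':a;N;$!ba;s|>[[:space:]]*<|><|g'
# both print: <a><b>

# [a[:space:]] matches an "a" or any whitespace character.
printf 'a b\tc\n' | sed 's/[a[:space:]]/_/g'
# prints: __b_c
```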

Then the "replace it with this" part is >< and after the last | comes the "g" flag, which means "global", i.e. "don't just do the first one, do all of them".
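The difference the g flag makes, on a throwaway input:

```shell
# Without g only the first match on the line is replaced.
echo aaa | sed 's/a/b/'     # prints: baa
# With g every match is replaced.
echo aaa | sed 's/a/b/g'    # prints: bbb
```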

So, you're matching > followed by any amount of whitespace (* matches "zero or more repeats of", so it'll also match no whitespace at all), followed by <, and replacing that with >< (so when it matches no whitespace it replaces it with itself).
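Since * allows zero repeats, tags that are already adjacent match too and just get replaced with themselves. Anything other than whitespace between > and < stops the match. Again this assumes GNU sed for \s, and the inputs are made up.

```shell
echo '<a><b>' | sed 's|>\s*<|><|g'      # prints: <a><b>   (matched, replaced with itself)
echo '<a>   <b>' | sed 's|>\s*<|><|g'   # prints: <a><b>
echo '<a> x <b>' | sed 's|>\s*<|><|g'   # prints: <a> x <b> (the x stops the match)
```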

It looks like this command removes whitespace between HTML tags, including gluing lines together to do so.
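Here is the whole command on a made-up chunk of HTML, assuming GNU sed. Only the whitespace between a > and the next < goes away. Text such as the x and y is left alone.

```shell
printf '<ul>\n  <li>x</li>\n  <li>y</li>\n</ul>\n' | sed ':a;N;$!ba;s|>\s*<|><|g'
# prints: <ul><li>x</li><li>y</li></ul>
```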

Rob