[Toybox] [PATCH] xargs: avoid "Argument list too long".

enh enh at google.com
Fri Oct 25 15:43:33 PDT 2019


i haven't had time to look at why, but the currently checked-in code is
failing the two new tests on Android too:

-- xargs
PASS: xargs xargs
PASS: xargs spaces
PASS: xargs -n 0
PASS: xargs -n 1
PASS: xargs -n 2
PASS: xargs -n exact match
PASS: xargs xargs2
PASS: xargs -s too long
PASS: xargs -s 13
PASS: xargs -s 12
PASS: xargs -s counts args
PASS: xargs -s 4 fails
PASS: xargs -s 5 no input okay
PASS: xargs -s 5 with input bad
PASS: xargs command -opt
PASS: xargs -0 -n1
PASS: xargs -0 -n2
PASS: xargs -t
PASS: xargs -E END
PASS: xargs -r
PASS: xargs no -r
PASS: xargs true
PASS: xargs command not found
PASS: xargs false
PASS: xargs exit 255
xargs: argument too long
FAIL: xargs max argument size
echo -ne '' | head -c 131071 /dev/zero | tr '\0' x | xargs | wc -c
--- expected 2019-10-25 18:05:50.712949392 -0400
+++ actual 2019-10-25 18:05:52.869616274 -0400
@@ -1 +1 @@
-131072
+0
xargs: argument too long
FAIL: xargs big input forces split
echo -ne '' | for i in $(seq 1 100);do echo $X;done|{ xargs >/dev/null && echo yes;}
--- expected 2019-10-25 18:05:55.426283195 -0400
+++ actual 2019-10-25 18:05:55.429616529 -0400
@@ -1 +0,0 @@
-yes


(you can't tell which "argument too long" message in xargs.c that is,
which is why i keep trying to ensure we have distinct error messages...
not to mention that the lack of a capital letter is a pretty subtle way
to distinguish this from an actual execve(2) failure -- whose strerror()
text is "Argument list too long" -- something that confused me several
times while investigating xargs issues until i fixed the errors to all
be different.)
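
(concretely, something like this -- made-up messages and variable names,
not the actual xargs.c code, though error_exit() really is toybox's
fatal-error helper:

  // give each overflow check its own message, so the test log alone
  // tells you which check fired:
  if (len > TT.s) error_exit("argument longer than -s %ld", TT.s);
  if (used+len > limit) error_exit("arguments exceed limit %ld", limit);

rather than having several paths all say "argument too long".)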

On Fri, Oct 25, 2019 at 12:06 AM Rob Landley <rob at landley.net> wrote:
>
> On 10/24/19 7:53 PM, enh wrote:
> > i haven't tested my other cases, but your patch doesn't address the -0 issues:
>
> How many cases do you have?
>
> I hadn't noticed the second test _wasn't_ commented out, and there wasn't a
> change to the actual -0 accounting code. (Issues plural? Is there more than one
> problem with -0?)
>
> > /tmp/toybox$ ./toybox find / -print0 2>/dev/null | ./toybox xargs -0 > /dev/null ; echo $?
> > xargs: exec echo: Argument list too long
> > 126
> > /tmp/toybox$
>
> Reproduced with /usr on devuan. (It invokes echo fine the first time, but the
> second goes boing.)
>
> bytes=2092116 limit=2092178 entries=38357 : success
> bytes=2092153 limit=2092178 entries=34423 : failure
>
> An accounting difference of 35 bytes between success and failure. And the
> failing run had significantly _fewer_ entries, so it's not "we missed some
> per-entry resource that added up to kill us".
>
> No idea what's going on there. The -0 case was _always_ adding the sizeof(char
> *) and NUL terminator. Hmmm, there should be an extra char * in both
> environ_bytes and argv (the NULL terminating entry each time) but that's a
> constant 16 overshoot into the 2048 fudge factor. I stuck a printf in and it
> _thinks_ it's calling exec about 25 bytes under budget in the case that fails?
> Hmmm... Why would this ONLY hit the -0 case and not the other one? By the time
> we exec() it shouldn't matter?
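>
> For reference, my mental model of the accounting, as a standalone sketch
> (my reconstruction, not the actual xargs.c/lib code):
>
>   #include <stdio.h>
>   #include <string.h>
>   #include <unistd.h>
>
>   extern char **environ;
>
>   // what the inherited environment costs out of ARG_MAX: each string's
>   // bytes plus its NUL terminator plus its char *, plus the trailing NULL
>   static long environ_bytes(void)
>   {
>     long bytes = sizeof(char *);
>     char **ev;
>
>     for (ev = environ; *ev; ev++) bytes += sizeof(char *)+strlen(*ev)+1;
>
>     return bytes;
>   }
>
>   int main(void)
>   {
>     // the budget the checked-in code computes (2048-byte fudge included)
>     printf("budget=%ld\n", sysconf(_SC_ARG_MAX)-environ_bytes()-2048);
>     return 0;
>   }
>
> with each collected argument then charged strlen()+1 bytes plus
> sizeof(char *) against that budget before calling exec().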
>
> I'm guessing your "fix" was this bit?
>
> -  // though obviously that's not really something you can guarantee.
> -  bytes = sysconf(_SC_ARG_MAX) - environ_bytes() - 2048;
> -  if (!TT.s || TT.s > bytes) TT.s = bytes;
> +  // though obviously that's not really something you can guarantee. In
> +  // practice on Linux we need to stay clear by a page (rounding?).
> +  TT.limit = sysconf(_SC_ARG_MAX) - environ_bytes() - sysconf(_SC_PAGESIZE);
>
> That really sounds like just throwing a larger fudge factor at it rather than
> understanding what's going on.
>
> The thing is, we're going ~17 bytes into the earlier fudge factor (there's an
> unaccounted NULL pointer at the end of each of argv[] and envp[], and then we
> start the command size calculation with -1 to get the right size for
> "xargs -s 8" and such) and it seems to have worked?
>
> Your change implies we need to avoid this last page _entirely_, yet the first
> invocation worked fine. We've _already_ got a 2048-byte fudge factor that's
> somehow not enough here; roughly doubling it doesn't fill me with warm
> fuzzies. If I don't know why it's breaking, I don't have confidence we fixed
> it.
>
> Sigh, time to fire up QEMU and stick printk's into the kernel...
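>
> (For anyone playing along at home: going from memory of fs/exec.c here --
> check your own kernel tree, this code has moved around between versions --
> the whole-command-line E2BIG comes from get_arg_page() refusing to grow
> the argument area past a quarter of RLIMIT_STACK:
>
>   /* fs/exec.c, get_arg_page(), paraphrased from memory */
>   if (size > rlim[RLIMIT_STACK].rlim_cur / 4) {
>     put_page(page);
>     return NULL;  /* copy_strings() turns this into -E2BIG */
>   }
>
> and the per-string limit is MAX_ARG_STRLEN, i.e. 32*PAGE_SIZE = 131072
> with 4k pages, which is presumably what the 131071-byte test above is
> poking at.)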
>
> Rob

