[Toybox] ASan freaks out when using tsort in multicall binaries

Fri Oct 6 22:35:58 PDT 2023

------- Original Message -------
On Friday, October 6th, 2023 at 23:01, Rob Landley <rob at landley.net> wrote:


> On 10/6/23 16:33, Oliver Webb wrote:
>
> > > Hadn't seen that one, I'm aware of a sparse file issue on some filesystems.
> > > (That hit on microsoft github.)
> >
> > My home directory is ecryptfs,
>
>
> Yup, that would explain it.
>
> > Testing on my /tmp directory (ext4) makes the errors
> > go away for both du and tar.
>
>
> config ECRYPT_FS
> tristate "eCrypt filesystem layer support"
> depends on KEYS && CRYPTO && (ENCRYPTED_KEYS || ENCRYPTED_KEYS=n)
> select CRYPTO_ECB
> select CRYPTO_CBC
> select CRYPTO_MD5
> help
> Encrypted filesystem that operates on the VFS layer. See
> <file:Documentation/filesystems/ecryptfs.rst> to learn more about
> eCryptfs. Userspace components are required and can be
> obtained from http://ecryptfs.sf.net.
>
>
> Sourceforge. Lovely. And that website redirects to a page that lists a google+
> page, and says the ecryptfs-utils source is in launchpad/bazaar. No obvious way
> to get a tarball, but I can create a snap pack from the web page? Last commit to
> https://bazaar.launchpad.net/~ecryptfs/ecryptfs/trunk/files says it was 6 years ago.
>
> To quote the whale, "I'm quite dizzy with anticipation. Or is it the wind?"
>
> > The specific test that fails with tar is "tar create long->long".
>
>
> Is it the "touch" that fails, or tar? Because the test is doing:

Putting a ' || echo "whatever" ' after the touch command that creates the long filenames shows nothing.
These are the logs from the test failure:

ln: failed to create symbolic link 'dir/[250 CHAR FILENAME (starts with 6)]' -> '[255 CHAR FILENAME (starts with 1)]': File name too long
tar: dir/[250 CHARACTER FILENAME (starts with 6)]: File name too long
tar: had errors
FAIL: tar create long->long
echo -ne '' | tar c --owner root --group sys --mtime @1234567890 dir/[250 CHAR FILENAME (starts with 6)] | SUM 7

--- expected    2023-10-07 05:17:52.480498065 +0000
+++ actual      2023-10-07 05:17:52.484498107 +0000
@@ -1 +1 @@
-b9e24f53e27496c5125445230d201b4a36ff7398
+60cacbf3d72e1e7834203da608037b1bf83b40e8

> # 255 bytes, longest VFS name
> LONG=0123456789abcdef0123456789abcdef
> LONG=$LONG$LONG$LONG$LONG$LONG$LONG$LONG$LONG
> LONG=${LONG:1:255}
>
> # 4+96=100 (biggest short name), 4+97=101 (shortest long name)
> touch dir/${LONG:1:96} dir/${LONG:1:97}
> testing "create long fname" "$TAR dir/${LONG:1:97} dir/${LONG:1:96} | SUM 3" \
> "d70018505fa5df19ae73498cfc74d0281601e42e\n" "" ""
>
> And what I was trying to test was the border condition of the tar internals
> where it switches over to an adjunct record to record an overlength field that
> won't fit in the structure, and it sounds like what's failing is the
> filesystem's ability to have two adjacent files with 96- and 97-byte names that
> differ only by that final character. Except I didn't add a check for failure to
> the "touch" because it wasn't supposed to be part of the test, I just naively
> assumed that would portably work...
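The unchecked setup step can be turned into an explicit probe. Below is a minimal sketch, not taken from tests/tar.test, that checks whether a directory can hold a name of a given length before a test relies on it: once for plain files (what the touch does) and once for symlinks (the ln -s that actually failed in the log above). The helper names make_name, name_fits and link_fits are hypothetical, and the names are built from 'a' characters rather than the test's hex digits.

```shell
#!/bin/sh
# Sketch: probe whether a directory's filesystem accepts file names and
# symlink targets of a given length. Names are made of 'a' characters,
# an assumption for illustration only.

# Print exactly $1 bytes of 'a', with no trailing newline.
make_name() {
  head -c "$1" /dev/zero | tr '\0' a
}

# name_fits DIR LEN: succeed if a LEN-byte file name can be created in DIR.
name_fits() {
  n=$(make_name "$2")
  if touch "$1/$n" 2>/dev/null; then
    rm -f "$1/$n"
    return 0
  fi
  return 1
}

# link_fits DIR LEN TARGETLEN: succeed if a LEN-byte symlink pointing at a
# TARGETLEN-byte target can be created in DIR.
link_fits() {
  n=$(make_name "$2")
  t=$(make_name "$3")
  if ln -s "$t" "$1/$n" 2>/dev/null; then
    rm -f "$1/$n"          # removes the (dangling) link itself
    return 0
  fi
  return 1
}

# Example (hypothetical): skip rather than fail when names don't fit.
#   name_fits dir 255 && link_fits dir 250 255 || echo "skip: name too long here"
```

Since "dir/" is 4 bytes, a 96-byte name gives exactly the 100-byte path from the test's comment and a 97-byte name gives 101, so probing 97 and 255 covers both the short/long boundary and the longest VFS name.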
>
> > Oh, another one I forgot to mention: "truncate sparse" fails on ecryptfs as well, but works on ext4.
>
>
> Do you mean the tests/truncate.test entry:
>
> testing "is sparse" "truncate -s 1g freep && [ $(stat -c %b freep) -le 8 ] &&
> echo okay" "okay\n" "" ""
>
> Which is doing a "truncate -s 1g file" and then asking stat whether the file, with
> literally no contents, used no more than 8 512-byte blocks of storage?

That's the one

> The test is trying to ask "did the command create a sparse file", and the
> failure seems to be "the filesystem cannot store a file sparsely", or at least
> takes more than 4k to store literally no data.

Yeah, the boilerplate is 8K for an empty file,
and even one byte of data adds an extra 4K.
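The arithmetic behind this failure: the test accepts at most 8 blocks of 512 bytes, i.e. 8 * 512 = 4096 bytes, for a 1g file holding no data. An 8K per-file header is 8192 bytes, or 16 such blocks, so the check fails before any data is written at all. A minimal sketch of the same check as reusable shell functions; alloc_bytes and check_sparse are hypothetical names, and it assumes a stat that understands -c with %b (allocated blocks) and %B (bytes per block), as GNU coreutils' stat does.

```shell
#!/bin/sh
# Sketch: compare a file's allocated storage with the limit used by the
# truncate "is sparse" test. %b = number of allocated blocks,
# %B = size in bytes of each of those blocks (512 on Linux).

alloc_bytes() {
  echo $(( $(stat -c %b "$1") * $(stat -c %B "$1") ))
}

# check_sparse FILE [SIZE]: create FILE with no data via truncate (default
# size 1g, as in the test) and succeed if it occupies at most
# 8 * 512 = 4096 bytes, the same limit as "[ $(stat -c %b freep) -le 8 ]".
check_sparse() {
  truncate -s "${2:-1g}" "$1" || return 1
  [ "$(alloc_bytes "$1")" -le 4096 ]
}
```

On a filesystem that adds an 8K header to every file, alloc_bytes reports at least 8192 even for the empty file, which is exactly the failure described here.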

> I did not predict that failure mode from a filesystem merged into the mainline
> kernel.
>
> > > > sed fails the performance test even though it can process a megabyte of data in less than 20s,
> > >
> > > On what hardware?
> >
> > A laptop with 4GB of RAM and a dual-core Intel Celeron at about 2.5GHz.
> > This doesn't seem like a hardware speed issue.
> >
> > (seq 160000 generates about a MB of data, so I used that in this test instead of the 20 doublings sed.test does)
> > $ time ( seq 160000 | toybox sed "s/./y/g" > /dev/null )
> >
> > real 0m0.282s
> > user 0m0.235s
> > sys 0m0.049s
>
>
> It's not generating a megabyte of random data, it's generating a megabyte of the
> same character, and then asking sed's search-and-replace to replace one byte at a
> time in that string a million times. The search-and-replace is s/x/y/ meaning
> each x gets replaced with y. The output of "seq" does not contain any "x"
> characters, so the search and replace will trigger zero times instead of
> triggering a million times.
>
> If you want more efficient generation of the test string I could instead do:
>
> dd if=/dev/zero bs=65536 count=16 | tr '\0' x
>
> For a definition of "efficient" that calls an external program to marshall the
> same amount of data through multiple kernel pipe buffers rather than staying
> process-local and just thrashing the heap a bit. (My ten year old laptop has 3
> megs of L2 cache so it probably all stays in cache, the three-process monty
> version is gonna do page table shenanigans across four different contexts, uses
> more SMP but quite possibly bounces data out to DRAM? Dunno.)
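Both ways of building the test input should produce the same 2^20 = 1048576 bytes (65536 * 16 is also 1048576). A minimal sketch for checking that and timing sed on the result; by_doubling is only an assumption about what sed.test's "20 doublings" look like, not a copy of it, and by_dd is the dd|tr pipeline quoted above. The shape of the data also differs from the seq experiment: seq 160000 emits 1008895 bytes spread over 160000 short lines, while this is one unterminated line of 1048576 bytes in which every byte is an x.

```shell
#!/bin/sh
# Sketch: two ways to build a megabyte of 'x' with no newline.

# Assumed shape of "20 doublings": start from one x and double 20 times.
by_doubling() {
  X=x
  i=0
  while [ "$i" -lt 20 ]; do
    X=$X$X            # 1, 2, 4, ... up to 2^20 characters
    i=$((i + 1))
  done
  printf '%s' "$X"
}

# The dd|tr alternative: 16 blocks of 65536 zero bytes, each turned into 'x'.
by_dd() {
  dd if=/dev/zero bs=65536 count=16 2>/dev/null | tr '\0' x
}

# Example timing run (mb.txt is a scratch file name chosen for this sketch):
#   by_dd > mb.txt && time sed 's/x/y/g' mb.txt > /dev/null
```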
>
> Optimization is often non-obvious these days. I mostly try to do the simple
> thing and stay out of the way of whatever clever stuff other people did, and
> then fix it if the result is obviously unpleasant.
>
> > > I haven't seen this (what distro/compiler/libc/filesystem are you testing on),
> >
> > Linux Mint 21.1 (essentially Ubuntu 22.04 with some irrelevant changes), GCC 11.4.0,
> > glibc 2.35, on ecryptfs and ext4 (both show the same mkpasswd errors)
>
>
> Sounds like I need to install mint in KVM with ecryptfs... Oh hey, they've got
> an xfce version. Shouldn't be too hard to navigate...
>
> > Huh, I just tested with make test_mkpasswd and it worked. Another one like tsort, where it triggers
> > ASAN only when in a multicall binary.
>
>
> Is it triggering ASAN in mkpasswd or in a different toybox command out of the
> $PATH? (Yay reproduction sequence, but WHAT did it reproduce?)

Putting toybox commands into my PATH runs test_mkpasswd fine, so I don't think it's some other toybox command.
And since the lib/password.c plumbing got rewritten and needs to be audited, that naturally seemed
like the culprit.
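One way to pin down "WHAT did it reproduce?" when a multicall build is involved is to list where each command name a test calls is found in $PATH, and what file it really is once symlinks are followed. A minimal sketch; resolve_cmds is a hypothetical helper, and the command names in the example line are only illustrations, not taken from the mkpasswd test.

```shell
#!/bin/sh
# Sketch: for each command name, print where $PATH lookup finds it and the
# real file behind any symlinks, to tell a multicall toybox symlink apart
# from a standalone binary or the host's own tool.
resolve_cmds() {
  for cmd in "$@"; do
    if ! path=$(command -v "$cmd"); then
      echo "$cmd: not found"
      continue
    fi
    case "$path" in
      /*) echo "$cmd -> $path -> $(readlink -f "$path")" ;;
      *)  echo "$cmd -> not a file in \$PATH ($path)" ;;   # builtin, function
    esac
  done
}

# Example (names are illustrative):
#   resolve_cmds mkpasswd sed sha1sum
```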

> > > but I mentioned I just redid the lib/password.c plumbing and need to re-audit
> > > that list of commands before next release.
> >
> > Here's the error message ASAN sends:
> >
> > AddressSanitizer:DEADLYSIGNAL
> > =================================================================
> > ==15453==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x000000000000 bp 0x7ffd90ffb280 sp 0x7ffd90ffb1a8 T0)
> > ==15453==Hint: pc points to the zero page.
> > ==15453==The signal is caused by a READ memory access.
> > ==15453==Hint: address points to the zero page.
> > #0 0x0 (<unknown module>)
> >
> > AddressSanitizer can not provide additional info.
> > SUMMARY: AddressSanitizer: SEGV (<unknown module>)
> > ==15453==ABORTING
> >
> > (ASAN catching a read from a null pointer and reporting "SEGV" is different from the kernel catching one
> > and sending a SIGSEGV, for a reason I don't know)
>
>
> The program counter was zero. It called a null function pointer. And did not
> give a stack trace. And that doesn't say what executable did the dumping, or any
> context that would say what test it was trying to run

From what I can tell, all of the mkpasswd tests return that message.

> that might let me go look
> at the script to guess.
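On the parenthetical question above: the kernel raises SIGSEGV in both cases. The difference is that ASan installs its own SIGSEGV handler (its handle_segv option, on by default), which prints the report and then exits with ASan's exitcode (1 by default) instead of letting the process die from the signal. The calling shell only sees the exit status, which a small helper can decode. A minimal sketch; how_it_ended is a hypothetical name, and the 128 + N convention is how bash and dash report a child killed by signal N.

```shell
#!/bin/sh
# Sketch: say whether a command exited on its own or was killed by a signal.
# bash and dash report "killed by signal N" as exit status 128 + N, so a
# plain SIGSEGV (signal 11) shows up as 139. A program that deliberately
# calls exit(139) is indistinguishable from that at this level.
how_it_ended() {
  "$@" && status=0 || status=$?
  if [ "$status" -gt 128 ]; then
    echo "killed by signal $((status - 128))"
  else
    echo "exited with status $status"
  fi
}

# Examples:
#   how_it_ended sh -c 'kill -s SEGV "$$"'   # killed by signal 11
#   how_it_ended sh -c 'exit 1'              # exited with status 1
```

With ASan's default options, an instrumented binary jumping to pc 0 as in the report above lands in the second case: it "exited with status 1" even though a SIGSEGV was delivered.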
>
> > > > To my surprise, Every test from tsort failed, along with some messages from
> > > > a "AddressSanitizer".
> > >
> > > Sigh, I moved the initializations between the two nested loops and the local
> > > variable declarations enough times I apparently dropped the plen initialization.
> > >
> > > Try commit 47946f241a4e.
> >
> > Works perfectly, thanks
>
>
> Yay.
>
> One down, a half-dozen to go sounds like...
>
> Rob
> _______________________________________________
> Toybox mailing list
> Toybox at lists.landley.net
> http://lists.landley.net/listinfo.cgi/toybox-landley.net


- Oliver Webb <aquahobbyist at proton.me>