[Toybox] grep and empty regexes

Rob Landley rob at landley.net
Tue Jul 30 04:47:55 PDT 2019


On 7/29/19 1:59 PM, enh wrote:
>> Do you have a catalog of what the differences _are_? (Empty regex and leading +
>> so far?)
> 
> no, i've got no idea. seems like there are bits and bobs in various
> GNU docs (about the tools themselves and libc), but i didn't find
> anything detailed enough to include either of these.
> https://www.regular-expressions.info/gnu.html is horrifically garish,
> but does seem to be a good condensation of the GNU info i found from
> other sources.

Sounds like "wait for people to complain, then whack-a-mole it" is a reasonable
option for the moment.

(You also have the ability to tweak bionic's regex plumbing, although the
workaround for _this_ issue is a 2 line fix, so...)

> would you like me to move the workaround into xregcomp instead? or do
> you want to wait until we see someone need this in sed or wherever
> first?

What would a sed test case look like?

If it doesn't break anything, moving it into xregcomp seems like the right thing
to do.

>> I'm glad you're giving the test suite some attention, but you're polishing
>> what's _there_ and my concerns are more about what _isn't_. Longer term I need
>> to A) write a gazillion more tests (based on a close re-reading of the spec, the
>> relevant man page or RFC, and the source code), B) get mkroot to the point I can
>> run more sheep across the minefield and catch issues with real world data.
> 
> oh, yeah, *coverage* is a huge blind spot for us at the moment, and
> something i want to look at. but first i wanted to get as many of the
> tests running in presubmit as possible, to maximize my
> pain^W^W^Wminimize the number of bugs that make it through to folks
> who're just trying to build AOSP/just trying to use a device. not
> every OEM is as keen on the idea of a reproducible hermetic build as
> you might expect, so giving them fewer stones to throw seems like a
> good idea :-)

Indeed.

> i'm down to just blkid and du failures on a local taimen device now...

Is that building for the taimen device, or building on the taimen device?

> it's time to admit to myself i'm not likely to implement the ntfs
> LABEL support any time soon, and at least send you a patch that fixes
> all the other issues. (sent separately.)

Looking at the ntfs image in tests/files/blkid:

$ /sbin/blkid ntfs.img
ntfs.img: LABEL="myntfs" UUID="6EE1BF3808608585" TYPE="ntfs"
$ hd -s 0x4d80 -n 16 ntfs.img
00004d80  6d 00 79 00 6e 00 74 00  66 00 73 00 00 00 00 00  |m.y.n.t.f.s.....|

But it's repeated at 0x3ffd80, and there's a "Volume" before it that smells a
little like it's the second member of a linked list of structures? I only have
the _one_ NTFS file.

(As a teenager, I reverse engineered a _lot_ of game save formats on the C64 and
DOS; not so much on the Amiga because the system I had came with zero
development tools and was a read-only game machine except for the word
processor. The first nontrivial program I ever wrote was a Commodore 64 disk
sector hex editor, and I lost the source to the first version when I used it on
its own disk and it had an off by one error that corrupted the root directory. I
was... 11?)

Do you have a lot of NTFS disk labels you need identified? (I.e., is this
something you'd actually use, or just a completeness thing?)

Sigh, lemme check https://en.wikipedia.org/wiki/NTFS to see what I'm supposed to
do. Read the boot sector: the 8-byte MFT cluster number at 0x30 times the 1-byte
sectors-per-cluster count at 0x0D is the LBA sector offset (presumably still 512
byte sectors) of the start of the master file table. Then segment #3 in there is
the $Volume metafile, which includes a $VOLUME_NAME attribute...

Looks like you have to chase some sort of tree structure to reliably find the
volume label. If that gets added to blkid, it'll be as a special-case function
that does the chasing; it's not gonna fit in the table even conceptually.

> as for du versus the extra space used for extended attributes, i'm
> still not sure what to do about that...

My plan was to run the tests in a known environment. I'm working on getting
mkroot to where I can make an ext2 image (or maybe vfat?), loopback mount it,
and run du against that, so I get consistent results instead of whatever the
host's filesystem happens to report.

It should also run in a container, but half the time what you're debugging is
what's different about your system that's giving "different but not wrong"
results. Needs a reference implementation to regression test in so you can see
_what_ is different...

Rob
