[Toybox] [New Toys] - fstype, blkid

Rob Landley rob at landley.net
Tue Oct 8 10:20:52 PDT 2013


Catching up by <strike>burning the candle</strike> reading the email at  
both ends...

On 10/07/2013 06:06:47 AM, Conroy, Bradley Quentin wrote:
> I finally figured out the NTFS labels after reading a rant on how  
> UTF-8 rocks
> and how MS switched to UTF-16 or UCS-2 or whatever.

I read that article. (It's a small twitter stream... :)

> The reason I couldn't grep for the label (mine was "myntfs") was
> that it is stored as "m\0y\0n\0t\0f\0s\0\0" - found another good
> use for hexdump :)

I should add it to toybox. And make -C mode the default (à la diff -u).
And make it share code with hexedit and possibly od.

(My first todo item in that area is figuring out why od gets the  
indentation wrong.)
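Back to the label that grep couldn't find: each ASCII character is followed by a zero byte (UTF-16 little endian), so a plain byte search for "myntfs" never matches. A minimal sketch of searching a raw buffer for an ASCII string in that widened form; find_utf16le() is a hypothetical helper and only covers the ASCII subset.

```c
#include <stddef.h>
#include <string.h>

// Hypothetical helper: search buf for the ASCII string label stored the
// way NTFS stores it (each character followed by a zero byte). Returns
// the byte offset of the first match, or -1 if there isn't one.
static long find_utf16le(const unsigned char *buf, size_t len,
  const char *label)
{
  size_t n = strlen(label), i, j;

  if (!n || 2*n > len) return -1;
  for (i = 0; i + 2*n <= len; i++) {
    for (j = 0; j < n; j++)
      if (buf[i+2*j] != (unsigned char)label[j] || buf[i+2*j+1]) break;
    if (j == n) return (long)i;
  }

  return -1;
}
```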

> Notes:
> I only have x86 to test on,

Allow me to introduce you to Aboriginal Linux system images. Go to:

   http://landley.net/aboriginal/bin

Download a system-image of your choice (mips and powerpc are big  
endian), extract the tarball, run "./dev-environment.sh", and at the  
shell prompt wget source and compile it.

(Note that mips networking is broken with qemu 1.6, you'd need to use  
qemu 1.5 for that. Should work on powerpc though.)

More random documentation-like stuff at
http://landley.net/aboriginal/about.html

> so there are a couple of places that may need bswap_{16,32} for  
> endianness.

My limiting factor here is actually lack of test filesystem images.
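One way to sidestep bswap_16()/bswap_32() entirely is to assemble little endian on-disk fields a byte at a time. Shifts operate on values rather than memory layout, so the result is the same on big and little endian hosts, and there are no unaligned or type-punned pointer casts. A minimal sketch; le16(), le32() and le64() are hypothetical names.

```c
#include <stdint.h>

// Hypothetical helpers: read little endian fields from a raw buffer.
// The result does not depend on host byte order, so no bswap is needed.
static uint16_t le16(const unsigned char *p)
{
  return p[0] | (p[1] << 8);
}

static uint32_t le32(const unsigned char *p)
{
  // cast before shifting so a high bit in p[3] doesn't overflow an int
  return p[0] | (p[1] << 8) | (p[2] << 16) | ((uint32_t)p[3] << 24);
}

static uint64_t le64(const unsigned char *p)
{
  return le32(p) | ((uint64_t)le32(p+4) << 32);
}
```

For example, the ext2 magic 0xEF53 sits on disk as the bytes 0x53 0xEF, and le16() returns 0xEF53 for them on either kind of host.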

> I used a 65k buf instead of toybuf (4k) for simplicity, but tried to  
> organize
> it for toybuf if wanted.

Half the file is #defines, and then the first line of actual C code is
a typedef. More extensive modifications than that may be coming.

Ok, convert tabs to two spaces and check that in.

Oh wow. You're making me pull out my tab conversion sed. Haven't used  
that in a while...

Ok, yank the typedef. Make function definitions match K&R like
everybody else for the past 30 years, à la:

   type function(args)
   {
   }

(Yes, we don't put the curly bracket on its own line anywhere else, but
that's because this one opens a new function and the others don't.)

You don't need #if CFG_BLKID because blkid.c only gets compiled if  
CFG_BLKID is enabled. (If the name of a *.c file under toys/ matches  
the name of a config symbol, the C file's inclusion is controlled by  
that config symbol.)

You have an if() statement at the left edge, not indented at all within  
its function, and then the function ends with:

}else /* fstype */
   write(1,fstype,strlen(fstype));       /* avoid printf overhead in fstype */
   putchar('\n');
}

And the _reason_ that works is there's no curly bracket on the else so  
the write() belongs to the else but the putchar doesn't. Otherwise the  
function wouldn't end. Ouch.
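One way to write that ending so the structure is explicit: brace both branches and put the shared newline after them. Using stdio for both branches also avoids mixing unbuffered write(1, ...) with buffered putchar(), which can reorder output (with several files and stdout on a pipe, the types could all come out before any of the newlines). A minimal sketch; show_type() and the "DEV: TYPE=..." format are hypothetical, not the submitted code.

```c
#include <stdio.h>

// Hypothetical output routine for one identified device.
static void show_type(FILE *out, int is_blkid, const char *dev,
  const char *type)
{
  if (is_blkid) {
    fprintf(out, "%s: TYPE=\"%s\"", dev, type);
  } else {
    fputs(type, out);  // fstype: just the type name
  }
  fputc('\n', out);    // both commands end the line here
}
```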

The way to make an alias for a command is the OLDTOY() macro.

If you feed loopfiles() zero arguments, it reads from stdin. So calling  
blkid with no arguments hangs awaiting user input instead of printing  
its usage message. (Probably you don't want NULL optstring, you want  
"<1", at least for the moment.)

Let's see, what have I got lying around:

   $ ./toybox blkid ~/qemu/images/tccboot.iso
   $ ./toybox blkid ~/qemu/images/rh9.img
   $

iso9660 it doesn't know but ext2 it _should_. Oh, duh, that one's a  
partitioned image, and it doesn't recognize the partition table. Let's  
see...

   $ ./toybox blkid ~/system-image-armv5l/hda.sqf
   $

Squashfs? Hello?

Sigh. What did I break? Check the previous version... that didn't work  
either, and all I did to that was delete the fstype at the end that was  
breaking the build. Ah, maybe the "type punned pointer" warnings  
actually matter with this compiler version? Lemme build for i686...  
Nope, _still_ not identifying squashfs.
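For partitioned images like rh9.img, one approach is to read the classic MBR in sector 0 and rerun the filesystem probes at the first partition's offset. The MBR holds four 16-byte entries at offset 446 and ends with the bytes 0x55 0xAA at 510. Each entry has its type byte at +4 and its starting sector (little endian, 512-byte sectors) at +8. A FAT boot sector ends with the same signature, so a real implementation would need to probe for a filesystem at offset 0 first. A minimal sketch; first_partition() is hypothetical and ignores GPT and extended partitions.

```c
#include <stdint.h>

// Hypothetical: return the byte offset of the first used MBR partition,
// or 0 if sector 0 has no MBR signature or no used entry.
static uint64_t first_partition(const unsigned char *sector)
{
  int i;

  if (sector[510] != 0x55 || sector[511] != 0xAA) return 0;
  for (i = 0; i < 4; i++) {
    const unsigned char *e = sector + 446 + 16*i;
    uint32_t lba = e[8] | (e[9] << 8) | (e[10] << 16)
      | ((uint32_t)e[11] << 24);

    if (e[4] && lba) return (uint64_t)lba * 512;  // type 0 = unused entry
  }

  return 0;
}
```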

By the way, in terms of your 64k buffer (66k buffer, actually): no sane  
filesystem is going to have its identifying info straddle 4k blocks, so  
we should be able to read 4k chunks and iterate over the list for  
offsets in range. (This even avoids lseek, although I'm not sure why  
that would be an issue...)

Right, continuing to clean this up until I can make it work. What the  
HECK is this nest of MATCH macros calling each other for? (That's where  
the type punned pointer warnings come from, anyway...) Ah, it's only  
used for ext2/3/4. Because treating ext2, ext3, and ext4 as three  
separate filesystems just wouldn't do.

You don't need to strcmp toys.which->name with "blkid", you can just  
compare the first character to 'b'. (There are only two options...)

Alright, let's turn this giant stack of #defines and if/else staircase
into a table with a loop iterating over it. Let's make the magic a
uint64_t so we're not ignoring the second half of the btrfs magic
you've got listed there, and let's just use the hex numbers like the
kernel does, à la:

   fs/btrfs/ctree.h:#define BTRFS_MAGIC 0x4D5F53665248425FULL /* ascii _BHRfS_M, no null */
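A minimal sketch of such a table, assuming each entry stores the magic as a number (the way the kernel headers define it), its length in bytes, and its offset, with the loop comparing against the little endian value of those bytes. struct fs_magic, peek_le() and identify() are hypothetical. The values are from the kernel headers: EXT2_SUPER_MAGIC, SQUASHFS_MAGIC, CRAMFS_MAGIC and its other-endian twin CRAMFS_MAGIC_WEND, and BTRFS_MAGIC.

```c
#include <stdint.h>
#include <stddef.h>

// Hypothetical table entry.
struct fs_magic {
  const char *name;
  uint64_t magic;
  int magic_len;
  unsigned off;
};

static const struct fs_magic fstypes[] = {
  {"ext2",     0xEF53,                2, 1080},
  {"squashfs", 0x73717368,            4, 0},
  {"cramfs",   0x28cd3d45,            4, 0},
  {"cramfs",   0x453dcd28,            4, 0},  // same magic, other endianness
  {"btrfs",    0x4D5F53665248425FULL, 8, 65600},
};

// Read n (at most 8) bytes at p as a little endian number.
static uint64_t peek_le(const unsigned char *p, int n)
{
  uint64_t v = 0;

  while (n--) v = (v << 8) | p[n];

  return v;
}

// buf holds the first len bytes of the device. Returns a type name or NULL.
static const char *identify(const unsigned char *buf, size_t len)
{
  size_t i;

  for (i = 0; i < sizeof(fstypes)/sizeof(*fstypes); i++) {
    const struct fs_magic *f = fstypes + i;

    if (f->off + f->magic_len <= len
        && peek_le(buf + f->off, f->magic_len) == f->magic) return f->name;
  }

  return NULL;
}
```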

Hmmm, you have a CRAMFS_MAGIC2 but your code doesn't seem to be using  
it. (The if is using a MATCH() macro instead of MATCH2().) Ah, the  
kernel header says that's the same number at the other endianness.

If JFS isn't even in /usr/include/linux/magic.h, is it really an
important filesystem to autodetect?

For NTFS, you have 8 as the label length (well, -8) but toutf8 fills  
out a 16 byte buffer? (And it doesn't actually have a length, it just  
keeps going until it hits a null terminator which there's no guarantee  
the file will have...)

Also, the NTFS label isn't _really_ alternating ASCII and NUL bytes.
It's horrible 16-bit wide character stuff that involves "codepages", and
actually displaying labels from Japan or Korea just isn't going to work
here. (Doing full Windows internationalization isn't an option either.
The question is, does the special case for ASCII make sense, or should
we just not support labels here at all? I'm balancing "2/3 of the
planet does not speak English" with "does Android care about legacy
Windows crap that's this generation's version of punched cards?" Eh, I
guess "Windows was English only, the future is UTF-8" is a reasonable
compromise...)

However, add to that the fact that NTFS is the only filesystem that has
a label in a different 4k block than the ID info, and special casing
this really sounds like more trouble than it's worth. Are there a lot
of thumb drives formatted NTFS out in the wild? (I'll add code to deal
with a real-world problem; my question is whether this is one. No
idea.)
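If the ASCII special case stays, a conversion that takes an explicit byte length avoids trusting the image to contain a terminator, which is the toutf8 problem mentioned above. A minimal sketch; utf16le_to_ascii() is a hypothetical replacement that copies plain ASCII code units and substitutes '?' for each non-ASCII one (codepage and CJK labels are deliberately not handled).

```c
#include <stddef.h>

// Hypothetical: convert at most len bytes of UTF-16LE at in to ASCII,
// stopping early at a zero code unit. out must hold len/2 + 1 bytes.
static void utf16le_to_ascii(char *out, const unsigned char *in, size_t len)
{
  size_t i;

  for (i = 0; i + 1 < len; i += 2) {
    unsigned c = in[i] | (in[i+1] << 8);

    if (!c) break;
    *out++ = (c < 128) ? c : '?';
  }
  *out = 0;
}
```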

Also... NTFS has an 8 byte uuid? What? (It's the only one that does...)

Hang on, this thing doesn't identify vfat? (Which most external USB
devices are formatted with?) Hmmm, I know Microsoft's documentation
says not to use the "FAT16" and "FAT32" strings for filesystem
identification, but I don't care.
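A minimal sketch of that string check, assuming the standard boot sector layout: the type string ("FAT12   " or "FAT16   ") sits at offset 54 in a FAT12/16 boot sector and "FAT32   " at offset 82 in a FAT32 one, and every FAT boot sector ends with 0x55 0xAA at offset 510. is_vfat() is a hypothetical helper.

```c
#include <string.h>
#include <stddef.h>

// Hypothetical: nonzero if sector looks like a FAT boot sector, going by
// the type strings Microsoft says not to rely on.
static int is_vfat(const unsigned char *sector, size_t len)
{
  if (len < 512 || sector[510] != 0x55 || sector[511] != 0xAA) return 0;

  return !memcmp(sector + 54, "FAT1", 4) || !memcmp(sector + 82, "FAT32", 5);
}
```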

Ok, when printing out the uuid there are three different possible
patterns for where the "-" characters go: one for 16 bytes (the
default), one for 4 (vfat), and one for 8 (ntfs, no dashes). Rather
than having a separate uuid length field that's usually 16, I think
I'll encode the non-16 values in the top few bits of the offset, since
I've got an int. (Offset already won't fit in a short.)
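A minimal sketch of that packing, assuming an unsigned 32-bit field with the top two bits as a length code. Offsets need 17 bits (btrfs keeps its uuid at 0x10020 = 65568, past what a short holds), leaving plenty of room. The UUID_* names and uuid_pack(), uuid_offset() and uuid_len() are hypothetical.

```c
// Hypothetical length codes stored in the top two bits of the offset.
enum { UUID_16 = 0, UUID_4 = 1, UUID_8 = 2 };

#define UUID_SHIFT 30
#define UUID_OFF_MASK ((1u << UUID_SHIFT) - 1)

static unsigned uuid_pack(unsigned off, unsigned kind)
{
  return (kind << UUID_SHIFT) | (off & UUID_OFF_MASK);
}

static unsigned uuid_offset(unsigned packed)
{
  return packed & UUID_OFF_MASK;
}

static unsigned uuid_len(unsigned packed)
{
  static const unsigned lens[] = {16, 4, 8, 16};  // code 3 unused

  return lens[packed >> UUID_SHIFT];
}
```

The common case (code 0) packs to the plain offset, so a table entry for a 16-byte uuid needs no extra annotation.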

Hmmm, in testing, FAT's uuid bytes come out in reverse order compared
to the tool Ubuntu uses. But ext2's don't...
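One likely explanation for the FAT mismatch: the 4-byte FAT volume ID is a little endian 32-bit number, and printing it as a number (high half, dash, low half) shows the bytes in reverse storage order, while ext2's 16-byte uuid is a plain byte array printed in storage order. A minimal sketch of all three layouts under that assumption. format_uuid() is hypothetical, and treating the 8-byte NTFS serial as a little endian 64-bit number in uppercase hex is also an assumption to check against the other blkid's output.

```c
#include <stdio.h>

// Hypothetical formatter; len is 4, 8 or 16. out needs 37 bytes.
static void format_uuid(char *out, const unsigned char *u, int len)
{
  int i;

  if (len == 4) {
    // vfat: little endian 32-bit volume ID, printed as XXXX-XXXX
    sprintf(out, "%02X%02X-%02X%02X", u[3], u[2], u[1], u[0]);
  } else if (len == 8) {
    // ntfs (assumed): little endian 64-bit serial, no dashes
    for (i = 7; i >= 0; i--) out += sprintf(out, "%02X", u[i]);
  } else {
    // 16 bytes: storage order, dashes after bytes 4, 6, 8 and 10
    for (i = 0; i < 16; i++) {
      out += sprintf(out, "%02x", u[i]);
      if (i == 3 || i == 5 || i == 7 || i == 9) *out++ = '-';
    }
  }
}
```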

Need test images. Lots and lots of test images...

> I have info on more fs types, to patch with after review.

I don't know what fs types count as "interesting". You have BFS which  
isn't in /usr/include/linux/magic.h, but don't have fat16 or fat32.

> blkid does output for all devices if 0 args -> read /proc/partitions?

Possibly. (You can run the other one under strace to see what it's  
doing.)
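If blkid with no arguments should scan everything, /proc/partitions is one candidate source: after a "major minor  #blocks  name" header and a blank line, each line gives a device's numbers, size and name. A minimal sketch; each_partition() is hypothetical, and the callback (which would presumably open "/dev/NAME" and probe it) may be NULL to just count entries. sscanf() fails to match the header and blank lines, so they're skipped.

```c
#include <stdio.h>

// Hypothetical: call fn for each partition listed in fp (an open
// /proc/partitions). Returns the number of entries found.
static int each_partition(FILE *fp, void (*fn)(const char *name, void *arg),
  void *arg)
{
  char line[256], name[64];
  unsigned major, minor;
  unsigned long long blocks;
  int count = 0;

  while (fgets(line, sizeof(line), fp))
    if (sscanf(line, "%u %u %llu %63s", &major, &minor, &blocks, name) == 4) {
      if (fn) fn(name, arg);
      count++;
    }

  return count;
}
```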

Rob

