[Aboriginal] uClibc-0.9.33.2 statfs() does not populate the `f_frsize' field of `struct statfs'
Rob Landley
rob at landley.net
Fri Dec 28 16:45:11 PST 2012
On 12/27/2012 10:52:13 PM, Rajeev V. Pillai wrote:
> > Rob Landley <rob at landley.net> on Friday, December 28, 2012 9:37 AM
> > wrote:
> >
>
> > Are there any that implement it? Linus said he wanted to see
> > fragments go away in 1995:
>
> I'm pretty certain that Novell NetWare 4.x natively does it for its
> filesystem. Not sure if Linux's NWFS implementation handles those
> fragments, though. Also not sure if FSes which do block suballocation
> (Btrfs, Reiser4, UFS2) handle the sub-blocks as fragments and report
> them as such in `f_frsize'. It seems logical that they should.
I know that reiserfs didn't. And I'm pretty sure btrfs doesn't.
The thing is, things like tail packing perform variable sized
sub-allocations, so reporting a single "fragment size" number for the
whole filesystem is meaningless in that context. (And really, tail
packing is a special case.)
That fragment field dates back to attempts to use a block size smaller
than the size in which physical transactions were actually done, and
Linus basically pointed out that the smaller value is the real block
size and the larger value is totally artificial, so attempting to
maintain multiple levels is sad. (The block layer can sort and merge
outstanding requests; that's the "I/O elevator" code. Trying to do this
is not a filesystem's job.)
I.e., _WHEN_ you do block suballocation, the granularity in which you do
so is bytes. So the fragment size would always be "1", which is useless.
In short, this really did get discarded 17 years ago and nobody's
resurrected it since, because it was a bad idea.
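For reference, POSIX `statvfs()` reports both `f_bsize` and `f_frsize`, and counts `f_blocks`, `f_bfree` and `f_bavail` in units of `f_frsize`. A zero `f_frsize` (the symptom in the subject line) therefore makes naive size calculations come out as zero. Below is a minimal Python sketch, assuming it is acceptable to treat a zero `f_frsize` as "not populated" and fall back to `f_bsize`. The helper names are hypothetical. Python's `os.statvfs()` wraps the C library's `statvfs()`, not `statfs()`, so this only illustrates the fallback logic.

```python
import os


# Hypothetical helpers for illustration; not a uClibc or Python API.
def effective_frsize(f_bsize: int, f_frsize: int) -> int:
    """Unit to multiply block counts by.

    Assumption: a reported f_frsize of 0 means "not filled in", so fall
    back to f_bsize (on Linux the two are often equal anyway).
    """
    return f_frsize if f_frsize > 0 else f_bsize


def filesystem_bytes(path: str = "/") -> tuple[int, int]:
    """Return (total, available) bytes for the filesystem holding path."""
    st = os.statvfs(path)
    unit = effective_frsize(st.f_bsize, st.f_frsize)
    # POSIX defines f_blocks and f_bavail in units of f_frsize, not f_bsize.
    return st.f_blocks * unit, st.f_bavail * unit


if __name__ == "__main__":
    total, avail = filesystem_bytes("/")
    print(f"total={total} available={avail}")
```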
> And, given the push towards larger block sizes, more FSes will start
> to implement something like fragments.
No, they won't. You're acting like this is a new thing instead of a
topic that's been under discussion for years now:
https://lwn.net/Articles/250335/
https://lwn.net/Articles/349970/
Again, because a _single_ fragment size is nonsensical, what you want
are variable-sized chunks. And the way to get them is to demand that
ranges of 4096-byte blocks be contiguous and then store a count of how
many of them you've used, which is an optimization ext2 has been using
from day 1 and BSD used before that.
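The "contiguous range plus a count" idea can be illustrated by collapsing a sorted list of allocated block numbers into (start, count) runs. A file's footprint then becomes a handful of variable-sized chunks rather than a per-block list. This is a generic run-length sketch with hypothetical `to_runs()` and `run_bytes()` helpers and an assumed 4096-byte block size. It is not ext2's (or BSD's) actual on-disk format.

```python
BLOCK_SIZE = 4096  # assumed filesystem block size in bytes


# Hypothetical helper: expects sorted, distinct block numbers.
def to_runs(blocks: list[int]) -> list[tuple[int, int]]:
    """Collapse block numbers into (first_block, count) runs."""
    runs: list[tuple[int, int]] = []
    for b in blocks:
        if runs and runs[-1][0] + runs[-1][1] == b:
            start, count = runs[-1]
            runs[-1] = (start, count + 1)  # b extends the current run
        else:
            runs.append((b, 1))  # gap before b: start a new run
    return runs


def run_bytes(runs: list[tuple[int, int]]) -> list[int]:
    """Size in bytes of each variable-sized chunk."""
    return [count * BLOCK_SIZE for _, count in runs]


runs = to_runs([10, 11, 12, 20, 21, 30])
print(runs)             # [(10, 3), (20, 2), (30, 1)]
print(run_bytes(runs))  # [12288, 8192, 4096]
```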
If your argument is "we must be able to subdivide filesystem blocks",
people do so via byte ranges. (They jump from granularity 4096 to
granularity 1.) When it's "we must use larger transactions than
filesystem blocks", people group blocks but continue to track the
allocation at either block size or byte size.
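A little arithmetic shows the jump from granularity 4096 to granularity 1. Whole-block allocation rounds every file up to a block boundary. Byte-range suballocation, as in tail packing, stores full blocks plus a tail of exactly the right length. Here is a sketch assuming a 4096-byte block, with hypothetical helper names:

```python
BLOCK_SIZE = 4096  # assumed block size in bytes


# Hypothetical helpers for illustration only.
def whole_block_bytes(size: int) -> int:
    """Bytes consumed when the allocation granularity is one block."""
    return -(-size // BLOCK_SIZE) * BLOCK_SIZE  # ceiling division


def tail_packed(size: int) -> tuple[int, int]:
    """(bytes in full blocks, tail bytes stored at byte granularity)."""
    full = (size // BLOCK_SIZE) * BLOCK_SIZE
    return full, size - full


print(whole_block_bytes(10000))  # 12288
print(tail_packed(10000))        # (8192, 1808)
```

Neither model has an intermediate "fragment" unit between 4096 and 1.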
Some media naturally use larger transaction sizes than the filesystem
block size, but the fix for that is to make the journaling layer aware
of it so it can group the commits. This isn't just a block size issue,
it's an alignment issue. When disks started increasing block sizes to
_match_ the block sizes filesystems had been using for years, there was
a problem: the 512-byte "boot sector" put things out of alignment,
and we had to update partitioning programs to create partitions that
started at the right offset:
https://lwn.net/Articles/322777/
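The alignment problem comes down to arithmetic on the partition's starting byte offset. This sketch makes several assumptions: 512-byte logical sectors, a 4096-byte physical sector, the traditional DOS-style layout where the first partition started at sector 63 (after the first track, which holds the boot sector), and the 1 MiB (sector 2048) start that modern partitioning tools use by default. The helper name is hypothetical.

```python
SECTOR = 512     # logical sector size in bytes (assumed)
PHYSICAL = 4096  # physical sector / filesystem block size (assumed)


# Hypothetical helper for illustration.
def is_aligned(start_sector: int) -> bool:
    """True if a partition starting at start_sector is 4096-byte aligned."""
    return (start_sector * SECTOR) % PHYSICAL == 0


for start in (63, 2048):
    offset = start * SECTOR
    print(start, offset, offset % PHYSICAL, is_aligned(start))
# 63 32256 3584 False
# 2048 1048576 0 True
```

With the sector-63 start, every 4096-byte filesystem block straddles two physical sectors. On such drives that typically turns each block write into a read-modify-write.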
Note that making filesystem block sizes larger than the memory page
size didn't happen. And even though we've got _terabytes_ of memory on
some of the larger systems, the default RAM allocation size remains
4k.
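To compare the two numbers on a given machine, here is a sketch using Python's `mmap.PAGESIZE` and `os.statvfs()`. On x86 both are typically 4096, but some architectures use larger base pages, so treat the printed values as machine-specific. Also note that on Linux `f_bsize` is the preferred I/O size and may differ from the on-disk block size for some filesystems. The helper name is hypothetical.

```python
import mmap
import os


# Hypothetical helper for illustration.
def page_and_block_size(path: str = "/") -> tuple[int, int]:
    """Return (memory page size, filesystem f_bsize) in bytes."""
    return mmap.PAGESIZE, os.statvfs(path).f_bsize


page, block = page_and_block_size("/")
print(f"page size={page} filesystem block size={block}")
```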
Yes, there are hugepages to conserve TLB entries, and you can format an
ext4 filesystem with huge blocks so it doesn't spend forever parsing
allocation tables:
https://lwn.net/Articles/469821/
But note that when they do that, they don't sub-allocate within
hugepages or huge blocks, because doing so DEFEATS THE PURPOSE OF
HAVING THEM. This isn't a "fragment size" because you don't fragment
them. Subdividing them is the application's problem, not the OS's.
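On Linux, `/proc/meminfo` reports the default huge page size as a `Hugepagesize:` line (2048 kB on typical x86-64 systems). The sketch below parses that line and shows the "whole pages only" footprint of an explicit huge-page mapping. The helper names are hypothetical. The footprint function assumes nothing else gets packed into the unused remainder of the last page. The parser is exercised on a sample string, and the live read returns None where `/proc/meminfo` does not exist.

```python
import re
from typing import Optional


# Hypothetical helpers for illustration.
def parse_hugepagesize(meminfo: str) -> Optional[int]:
    """Return the default huge page size in bytes, or None if not listed."""
    m = re.search(r"^Hugepagesize:\s+(\d+)\s+kB$", meminfo, re.MULTILINE)
    return int(m.group(1)) * 1024 if m else None


def system_hugepagesize() -> Optional[int]:
    """Read /proc/meminfo (Linux only); None if unavailable."""
    try:
        with open("/proc/meminfo") as f:
            return parse_hugepagesize(f.read())
    except OSError:
        return None


def hugepage_footprint(nbytes: int, page: int) -> int:
    """Bytes consumed when nbytes are backed by explicit huge pages.

    Assumption: allocation is in whole pages, with no sub-allocation.
    """
    return -(-nbytes // page) * page


sample = "MemTotal:       16318412 kB\nHugepagesize:       2048 kB\n"
hp = parse_hugepagesize(sample)
print(hp)                         # 2097152
print(hugepage_footprint(1, hp))  # 2097152
print(system_hugepagesize())      # machine-specific
```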
Rob