[Toybox] [PATCH] Add support for 1024 as well as 1000 to human_readable.

Samuel Holland samuel at sholland.net
Fri Sep 4 23:04:26 PDT 2015


On 2015-09-04 18:24, Rob Landley wrote:
> Why is the _caller_ not appending B when they printf() the result? The
> space is before the units but the B isn't, and this is a string that
> gets put into a buffer and then used by something else. Further editing
> is kinda _normal_...

Because the caller would then have to worry about the M/MB/MiB problem. 
The convention (at least in GNU and util-linux) is that M and MiB both 
refer to 2^20 bytes, and MB refers to 10^6 bytes. If the caller appends 
the B afterward, it might change the meaning of the number:
	10Mi -> 10MiB is fine
	10M  -> 10MB is wrong
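
To make the convention concrete, here is a minimal sketch in C of how the three suffixes map to byte multipliers. The function name mega_multiplier is made up for this example; it is not from toybox, GNU, or util-linux.

```c
#include <string.h>

/* Hypothetical helper: byte multiplier for a mega-scale suffix under the
 * convention described above. Returns 0 for anything it does not know. */
unsigned long long mega_multiplier(const char *suffix)
{
  /* "M" and "MiB" both mean 2^20 bytes. */
  if (!strcmp(suffix, "M") || !strcmp(suffix, "MiB")) return 1ULL << 20;
  /* "MB" means 10^6 bytes. */
  if (!strcmp(suffix, "MB")) return 1000000ULL;
  return 0;
}
```

So turning "10M" into "10MB" after the fact silently changes the value from 10485760 bytes to 10000000 bytes.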

The purpose of the flag is to append B if the number is less than 
1000/1024, so (among other reasons) you can have a fixed-width string of 
output: 42G, 42M, 42K, 42B, even if there would not normally be a letter 
there. In that case, at least, you definitely don't want to "just append 
a B", because you only want the B in certain cases.
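
A rough sketch of what I mean, assuming made-up flag names (SK_DIV_1000, SK_UNIT_B) and a made-up function sk_human(); this is not the patch itself and not toybox's human_readable(). It truncates instead of rounding and prints no decimal digit, to keep the point visible: the B is only added inside the formatter, in the one branch where no prefix applies.

```c
#include <stdio.h>

/* Hypothetical flags for this sketch; not toybox's actual names. */
#define SK_DIV_1000 1  /* step by 1000 instead of 1024 */
#define SK_UNIT_B   2  /* print "B" when no prefix applies */

/* Format n into buf as a short string such as "42B" or "42K".
 * Integer division truncates; real code would round. */
int sk_human(char *buf, size_t len, unsigned long long n, int flags)
{
  static const char prefixes[] = "KMGTPE";
  unsigned long long div = (flags & SK_DIV_1000) ? 1000 : 1024;
  int i = -1;  /* -1 means "no prefix": the value is still in bytes */

  while (n >= div && i < 5) {
    n /= div;
    i++;
  }
  if (i >= 0) return snprintf(buf, len, "%llu%c", n, prefixes[i]);
  /* Only here, where there is no prefix, does the B get added. */
  return snprintf(buf, len, "%llu%s", n, (flags & SK_UNIT_B) ? "B" : "");
}
```

With SK_UNIT_B set, 42 comes out as "42B" and 42*1024 as "42K". A caller that appended "B" to every result would get "42KB" for the second one, which is the problem described above.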

>>> HN_DIVISOR_1000 Divide number with 1000 instead of 1024.
>>
>> Yep, I think network speeds are measured in SI units for example
>> I could live with 1024 units everywhere esp. if we also used the IEC prefixes
>
> I object to the word "kibibyte" on general principles, and disks are
> also sold in decimal sizes (for historical marketing reasons).

But RAM is sold in binary sizes. "16 gigs" of RAM is 16384MiB, not 
16000MB. (Think `free -h`.) And on a more fundamental level, it will 
always be measured in binary sizes: pages are 4096 bytes, not 4000.

And so is flash, manufactured in binary. Even though you can buy a 
"500GB" SSD, it's really 512GiB on the inside, with the additional space 
used as spare flash pages.
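
The arithmetic behind those two examples, as a small C sketch. The helper names are invented for illustration, and the 512/500 figures are just the example numbers above, not taken from any particular drive's datasheet.

```c
#include <stdint.h>

#define SK_MIB (1ULL << 20)      /* 2^20 bytes */
#define SK_GIB (1ULL << 30)      /* 2^30 bytes */
#define SK_GB  1000000000ULL     /* 10^9 bytes */

/* Hypothetical helper: convert a size in GiB to MiB. */
uint64_t gib_to_mib(uint64_t gib)
{
  return gib * SK_GIB / SK_MIB;
}

/* Hypothetical helper: bytes left over when raw_gib GiB of flash
 * is sold as sold_gb decimal GB. */
uint64_t spare_bytes(uint64_t raw_gib, uint64_t sold_gb)
{
  return raw_gib * SK_GIB - sold_gb * SK_GB;
}
```

gib_to_mib(16) gives 16384, and spare_bytes(512, 500) gives 49755813888, roughly 9% of the raw flash.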

> (Of course "512 gigs" is mixing decimal and binary when you _do_ use
> binary gigs, since the 512 is decimal and all. But let's be honest,
> "kibibytes" is a stupid name, all else is details for me.)
>
>>> HN_IEC_PREFIXES Use the IEE/IEC notion of prefixes (Ki, Mi,
>
> Mebibytes. *shudder*
>
> Huh, I thought the i was the second character in "binary", but this
> implies it's "IEC"? Or possibly IEE? Or maybe the i from "mebi" which is
> back to "binary" again...

Mi -> Mebi -> mega binary -> 2^20

>>> Gi...). This flag has no effect when
>>> HN_DIVISOR_1000 is also specified.
>>
>> Err yes, but it is not that it has no effect but that if you are using 1000s there should not be the 'i'
>
> The B is already a separate flag from the 1024. If the caller wants to
> append the unicode character for "clown nose" to the returned string,
> that's not really human_readable()'s business.

See above. You have to have the "i" if you want to append the "B". But 
you can't just append both if you want the "B" in the case of <1000, 
because then you'll have 1KiB = 1024BiB, or 1KB = 1000BB, and there's no 
such thing as a BiB.

>> For my two cents I would suggest we go for IEC prefixes by default, yes they are so-so
>> but there is a standard and it does make things noticeably clearer, might as do it right instead
>> of the usual customary ComSci notation where it is Notoriously ambiguous
>
> The function is called human_readable().
>
> You want to default to binary units.
>
> What exactly is our goal here again?

Using binary powers is quite important for some human-readable cases. 
Take, for example, SSDs. For performance and longevity, you have to 
align data access to flash erase block sizes, which get up to 128KiB or 
256KiB. It's important then to align partitions on MiB (not MB) 
boundaries. cfdisk and Debian's partitioner get this horribly wrong. 
(Especially because you specify sizes in MB when creating partitions, and 
they are then shown back to you in MiB.)
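
To illustrate why MiB and MB boundaries are not interchangeable here, a minimal C sketch of an alignment check. The helper names are hypothetical, and the 256KiB erase block is just the example size mentioned above.

```c
#include <stdint.h>

/* Hypothetical helper: nonzero if offset is a multiple of boundary. */
int is_aligned(uint64_t offset, uint64_t boundary)
{
  return offset % boundary == 0;
}

/* Hypothetical helper: round offset up to the next multiple of boundary. */
uint64_t align_up(uint64_t offset, uint64_t boundary)
{
  return (offset + boundary - 1) / boundary * boundary;
}
```

A partition starting at 1MiB (1048576 bytes) passes is_aligned() against a 256KiB (262144 byte) erase block; one starting at 1MB (1000000 bytes) does not, and align_up(1000000, 1048576) moves it to 1048576.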

> (Keeping the thundering hordes of android users happy. Right. Trying not
> to get emotionally invested in an aesthetic decision which hasn't _got_
> a right answer and just needs to be consistent. That said, if I can help
> kill the term "mebibytes" it is worth MUCH EFFORT on my part...)
>
>>> in the entire tree, there's only one use of HN_GETSCALE
>>> (/usr/bin/procstat), and it doesn't look like that's actually
>>> necessary).
>>>
>>> HN_DECIMAL and HN_NOSPACE are used a lot: ls, df, du, and so on. HN_B
>>
>> I did not have a HN_DECIMAL since I expect 0-9 to have a decimal point for a second
>> digit of precision, the range is to 999 anyway so it will not use more characters.
>>
>>> is used less, but in df, du, and vmstat. HN_DIVISOR_1000 is only
>>> really used in df (it's also used once each in "edquota" and
>>> "camcontrol").
>>
>> I would have no problem with df using units 1024 instead and displaying IEC Units
>
> Disks are sold in decimal measurements. People are going to ask why your
> horribly inefficient file format is eating so much of their disk space.

Even Windows shows disk free space in binary units.

> (What, did they stop doing that with flash? I'd be surprised if they did...)

No, SSDs are still sold in decimal sizes. But you have to _use_ them in 
binary sizes.

>>> HN_IEC_PREFIXES isn't used at all. not even a test.
>>
>> Yeah, I have noticed for myself, following the standard and even making it the default
>> so that you know what everything is in would be good, alas somewhat incompatible
>> with custom, but are scripts using -h and then parsing it... something is likely that dumb.
>> But it would be nice to actually do the right thing.
>
> Nothing extending the usage of the word "gibibytes" is the right thing.

Then just do like util-linux and use "G" instead of "GiB".

>>> so until we find a place where we want to turn off HN_DECIMAL, we're
>>> good. (that's a harder thing to grep for, but i couldn't find an
>>> instance in FreeBSD.)
>>
>> I would hope not, I would regard it as a useless loss of precision.
>> 9.9 will fit in the same space as 999 just fine.
>
> human_readable() _IS_ a useless loss of precision. That's what it's _for_.
>
> And the units advance by kilobytes so 9.9 and 999 are not rephrasings of
> each other. 999k and 1.0M can be from a rounding  perspective, but "loss
> of precision" is the reason rounding _exists_...
>

>>>>> You can also set a flags to drop the space between number and prefix or use the ubuntu 0..1023 style
>>>>> also you can request the limited range 0..999, 1.0 k-999 k style in either SI or IEC
>>>>
>>>> Yes, but why would we want to?
>>
>> Strict conformance to the standard? avoiding the 9999->9.8Ki transition.
>
> The first I heard of this standard was when you mentioned it. Ubuntu
> clearly wasn't doing it.
>

>>>> (If you git add a file, git diff shows no differences, mercurial diff
>>>> shows it diffed against /dev/null. I'm STILL getting used to the weird
>>>> little behavioral divergences.)

git diff --cached

That will show your staged changes (including added/removed file diffs).

>>>>> I hope this is interesting.
>>>>
>>>> It's very interesting and I'm keeping it around in case it's needed. I'm
>>>> just trying to figure out if the extra flags are something any command
>>>> is actually going to use. (And that's an Elliott question more than a me
>>>> question, I never use -h and it's not in posix or LSB.)
>>
>> Odd, it has been in common usage for years, but I guess it was just whatever
>> people felt a human would like to see rather than one of the standards.
>
> It's got a dozen flags because everybody who implemented this did it
> differently because the machine readable scriptable version is just to
> print out the actual NUMBER, thus the aesthetic cleanup is (or at least
> should be) just that.

And because different quantities are measured with different units. 
Network speeds use decimal; memory sizes use binary; and disk sizes use 
both.

> Bringing an international standards body into a purely aesthetic
> decision is weird. ANSI vs ISO tea was a _joke_.
>
> (Ok, maybe the aesthetic output has mutated into functional due to
> screen scrapers, which is what Elliott was implying by scripts depending
> on -h output. In which case either rigorously copying the historical
> mistakes or breaking them really loudly is called for. Adding a
> standards body to that sort of mess gives me a headache long before we
> get into any sort of details.)
>
> Rob
--
Regards,
Samuel Holland <samuel at sholland.net>
