[Toybox] [PATCH] Add support for 1024 as well as 1000 to human_readable.

Fri Sep 4 08:43:43 PDT 2015

> From: enh at google.com
> Date: Thu, 3 Sep 2015 20:57:20 -0700
> To: rob at landley.net
> CC: james_mcmechan at hotmail.com; toybox at lists.landley.net
> Subject: Re: [Toybox] [PATCH] Add support for 1024 as well as 1000 to human_readable.
>
> On Thu, Sep 3, 2015 at 6:52 PM, Rob Landley <rob at landley.net> wrote:
>> On 08/28/2015 09:47 PM, James McMechan wrote:
>>>> Date: Mon, 24 Aug 2015 20:47:03 -0500
>>>> From: rob at landley.net
>>>> To: enh at google.com
>>>> CC: toybox at lists.landley.net
>>>> Subject: Re: [Toybox] [PATCH] Add support for 1024 as well as 1000 to human_readable.
>>>>
>>>> On 08/24/2015 03:10 PM, enh wrote:
>>>>> On Sun, Aug 23, 2015 at 6:20 PM, Rob Landley <rob at landley.net> wrote:
>>>>>> /me wists for a specification. Oh well. I hate when I have to guess at
>>>>>> what the right behavior _is_...
>>>
>>> Well, checking back with my copy of "Engineering Fundamentals and Problem Solving", A. Eide et al., 1979, Ch. 5:
>>> Engineering units are 0.1 to 999 followed by a space, prefix and SI unit.
>>>
>>> I am of the opinion that gratuitous loss of precision should be avoided.
>>> Since a one-character prefix and decimal point take two character spaces, the natural
>>> breakpoint would be 10000, e.g. 9998, 9999, 10 k for SI decimal notation.
>>> Using the IEC two-character binary prefix Ki/Mi/Gi uses three spaces with the '.'
>>> This would however yield a breakpoint at 100 000, or 10 000 if we use a thousands separator.
>>> Which seems to me a bit large.
>>
>> I already fixed it a different way (just took me a while to debug and
>> check it in), but I see you added a couple more options.
>>
>> Are these options we actually need? (I.E. expand 1023 and the force use
>> of units immediately?) They probably wouldn't be hard to add, but do we
>> have anything that actually needs them yet? (Is this compatible with the
>> bsd version and thus something we could push the posix guys to
>> standardize circa 2030 or so? Ok, more like sometime in the late 2040's.
>> Ok, let's face it: I don't engage with the Posix committee much because
>> interacting with Jorg Schilling is not something I'm willing to do in a
>> hobbyist capacity.)

Apparently the answer is yes, or at least BSD needed them; I have not run a BSD system in years.
The 1023 was because Rob had mentioned that that is what Ubuntu did :)
It is however a consistent choice, so I included it in case we needed Ubuntu's way for some reason.
Not a care in the world about dropping it.

> BSD has (https://www.freebsd.org/cgi/man.cgi?query=humanize_number&sektion=3):
>
> The following flags may be passed in scale:
>
> HN_AUTOSCALE Format the buffer using the lowest multiplier pos-
> sible.

If this does what I think it says, you can end up with 1000000 KiB if the buffer is big enough?
Also, the scale factor can be a number to force a particular prefix.
int humanize_number(char *buf, size_t len, int64_t number, const char *suffix, int scale, int flags);

Interesting, they use a len to prevent buffer overflow, and it looks like they may display a signed number?
Also, they pass in the suffix. I had a comment about that, but had guessed we could keep 'B'.

> HN_GETSCALE Return the prefix index number (the number of
> times number must be divided to fit) instead of
> formatting it to the buffer.

This is where you get the number to pass in as scale. Not hard, can anyone see a use for it though?
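
For what it's worth, the "number of times number must be divided" idea can be sketched in a few lines. This is only an illustration of the concept, not the libutil implementation: scale_index() and its limit parameter are made-up names, and it truncates rather than rounds.

```c
#include <stdint.h>

/* Hypothetical helper, not the BSD code: count how many times number must
 * be divided by divisor (1000 or 1024) before it drops below limit.
 * The result can index a prefix table such as "kMGTPE" (0 = no prefix).
 * Division truncates, so values just under a boundary are not rounded up. */
int scale_index(uint64_t number, unsigned divisor, uint64_t limit)
{
    int index = 0;

    while (number >= limit) {
        number /= divisor;
        index++;
    }
    return index;
}
```

With limit set to 1000 this gives the three-digit rule; with limit set to 10000 it gives the 0..9999 range before the first prefix.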

> The following flags may be passed in flags:
>
> HN_DECIMAL If the final result is less than 10, display it
> using one decimal place.

I would expect that this is only for prefixed scales, but who knows, certain groups might have applied it to small integers.

> HN_NOSPACE Do not put a space between number and the prefix.

Yes, this one I can see using, e.g. for ls, which is also a place where the 'B' might not be present.
It would however be consistent to include the space and the B.

> HN_B Use `B' (bytes) as prefix if the original result
> does not have a prefix.

Is it just me, or do you find this weird also? If you have an explicit prefix setting, why not use it...
If you don't want to use it, why is it there in the first place?

> HN_DIVISOR_1000 Divide number with 1000 instead of 1024.

Yep, I think network speeds are measured in SI units, for example.
I could live with 1024 units everywhere, esp. if we also used the IEC prefixes.

> HN_IEC_PREFIXES Use the IEE/IEC notion of prefixes (Ki, Mi,
> Gi...). This flag has no effect when
> HN_DIVISOR_1000 is also specified.

Err yes, but it is not that it has no effect, but that if you are using 1000s there should not be the 'i'.
For my two cents, I would suggest we go for IEC prefixes by default. Yes, they are so-so,
but there is a standard and it does make things noticeably clearer; we might as well do it right instead
of the usual customary ComSci notation, which is notoriously ambiguous.

> in the entire tree, there's only one use of HN_GETSCALE
> (/usr/bin/procstat), and it doesn't look like that's actually
> necessary).
>
> HN_DECIMAL and HN_NOSPACE are used a lot: ls, df, du, and so on. HN_B

I did not have a HN_DECIMAL since I expect 0-9 to have a decimal point for a second
digit of precision; the range is to 999 anyway, so it will not use more characters.

> is used less, but in df, du, and vmstat. HN_DIVISOR_1000 is only
> really used in df (it's also used once each in "edquota" and
> "camcontrol").

I would have no problem with df using 1024 units instead and displaying IEC units.

> HN_IEC_PREFIXES isn't used at all. not even a test.

Yeah, I have noticed that myself. Following the standard, and even making it the default
so that you know what units everything is in, would be good. Alas, it is somewhat incompatible
with custom. But are scripts using -h and then parsing it? Something is likely that dumb.
But it would be nice to actually do the right thing.

> so until we find a place where we want to turn off HN_DECIMAL, we're
> good. (that's a harder thing to grep for, but i couldn't find an
> instance in FreeBSD.)

I would hope not; I would regard it as a useless loss of precision.
9.9 will fit in the same space as 999 just fine.

>>>>> yeah, i was actually trying to avoid ending up with all the heuristics
>>>>> the BSD implementation has.
>>>>>
>>>>> the BSD man page says:
>>>>>
>>>>> If the formatted number (including suffix) would be too long to fit into
>>>>> buf, then divide number by 1024 until it will.
>>>>
>>>> That's just "test against 999, divide by 1024". Easy enough.
>>>>
>>>>> The len argument must be at least 4 plus the length of suffix, in order
>>>>> to ensure a useful result is generated into buf.
>>>>
>>>> That constraint's already implicit. I should make sure it's explicit.
>>>>
>>>>> so it certainly seems they follow the "no more than three digits/two
>>>>> digits plus '.'" rule.

That is what I was going for also.
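
To make the rule concrete, here is a minimal pure-integer sketch of "test against 999, divide" with one decimal place below 10. It is not the attached patch, the toybox code, or the BSD code: human_sketch() is a made-up name, it always uses the limited 0..999 range (not 0..9999), and rounding only looks at the remainder of the last division, so it can be off by a tenth in rare 1024 cases.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical sketch. Formats num into buf using at most three digits,
 * or "d.d" when the scaled value is below 10. divisor is 1000 (SI: k, M,
 * ...) or 1024 (IEC: Ki, Mi, ...). Returns what snprintf() returns. */
int human_sketch(char *buf, size_t len, uint64_t num, unsigned divisor)
{
    static const char prefixes[] = "kMGTPE";
    char unit[3] = {0, 0, 0};
    uint64_t rem = 0;
    unsigned tenths;
    int index = -1;                     /* -1: no prefix needed */

    /* Test against 999, divide, repeat. */
    while (num > 999 && index < 5) {
        rem = num % divisor;
        num /= divisor;
        index++;
    }
    if (index < 0) return snprintf(buf, len, "%u", (unsigned)num);

    /* Nearest tenth, computed from the last remainder only. */
    tenths = (unsigned)((rem * 10 + divisor / 2) / divisor);
    if (num >= 10) {
        /* Two or three digits: round to the nearest whole unit. */
        tenths = 0;
        if (rem * 2 >= divisor) num++;
        if (num > 999 && index < 5) {   /* 999.5 k and up becomes 1.0 M */
            num = 1;
            index++;
        }
    } else if (tenths == 10) {          /* e.g. 1.96 rounds up to 2.0 */
        num++;
        tenths = 0;
    }

    unit[0] = prefixes[index];
    if (divisor == 1024) {
        if (!index) unit[0] = 'K';      /* IEC writes Ki, SI writes k */
        unit[1] = 'i';
    }
    if (num < 10)
        return snprintf(buf, len, "%u.%u %s", (unsigned)num, tenths, unit);
    return snprintf(buf, len, "%u %s", (unsigned)num, unit);
}
```

For example, 9999 with a divisor of 1024 is 9 remainder 783, and 783/1024 rounds to 8 tenths, which is where the 9.8 Ki figure comes from.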

>>>> I can work with this.
>>>>
>>>> Thanks,
>>>>
>>>> Rob
>>>
>>> Attached is a patch that should allow for 0..9999, 10 k..999 k, 1.0 M..999 M SI units
>>> 0..9999, 9.8 Ki..999 Ki, 1.0 Mi..999 Mi... IEC binary units, note the 9999 -> 9.8 Ki transition
>>> I have tested this with LE32 BE32 LE64 while I have BE64 sparc I do not have a BE64 userspace
>>> and my other BE64 system is still on order.
>>
>> If this behaves differently on big or little endian, your compiler is at
>> fault. And long long should be 64 bit on 32 bit or 64 bit systems, due
>> to LP64. (There's no spec requiring long long _not_ be 128 bit, which is
>> a bit creepy, but nobody's actually done that yet that I'm aware of. I
>> should probably use uint64_t but the name is horrid and PRI_U64 stuff in
>> printf is just awkward, and it's a typedef not a real type the way
>> "int", "long", and "long long" are...)

I have developed paranoia over BE/LE & 32/64 over the years; subtle assumptions about
size or byte ordering can creep in and break things. One I can remember was in the ext2 code:
they had a bitmap in LE order but accessed it using longs rather than bytes, so it had to have
the byteswap even though the code using bytes was just as simple and completely agnostic
about word size and BE/LE.
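
As an illustration of the byte-wise approach (a sketch only, not the actual ext2 code; the function names are made up), indexing the bitmap by bytes gives the same result on any word size or byte order:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers. Bit nr lives in byte nr/8, at bit position nr%8
 * counted from the least significant bit. Only single bytes are touched,
 * so no byteswap is needed on big-endian machines. */
int bitmap_test_bytes(const uint8_t *map, size_t nr)
{
    return (map[nr / 8] >> (nr % 8)) & 1;
}

void bitmap_set_bytes(uint8_t *map, size_t nr)
{
    map[nr / 8] |= (uint8_t)(1u << (nr % 8));
}
```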

I could argue that long should be 128 bit on 64 bit computers, but LP64 was a hack to work
around poorly written software; long long /should/ be 256 bits :) not merely 128 bit.

Yes, uint64_t is a bit of a mess, but if the compiler puts some other size in there I would
feel fully justified in bitching about it. int, long and long long are compiler dependent and can
be whatever they desire and are per-arch, so I try to use uint64_t where I want a particular size.

For example, int was the size to store pointers in, as it was the machine word; K & R explicitly stated to store pointers in int.
Now it is long, or better yet void *.
I did find a couple of uint128_t references on my system.

>>> You can also set a flags to drop the space between number and prefix or use the ubuntu 0..1023 style
>>> also you can request the limited range 0..999, 1.0 k-999 k style in either SI or IEC
>>
>> Yes, but why would we want to?

Strict conformance to the standard? Avoiding the 9999->9.8Ki transition.

>>> This is pure integer, I could open code the printf also as it can only have 4 digits maximum at the moment.
>>> If you want I could make it autosizing rather than just one decimal between 0.1..9.9
>>> Also if any of the symbols are defined to 0 the capability will drop out.
>>> Perhaps I should make it default to IEC "Ki" style? getting it right vs bug compatibility.
>>>
>>> I made a testing command e.g. toybox_human_readable_test to allow me to test it.
>>
>> I had toys/examples/test_human_readable.c which I thought I'd checked in
>> a couple weeks ago but apparently forgot to "git add".

I was thinking maybe it needs a better name; outputting info for humans would be nice
to be able to do from the shell, so it could actually be used in production.

>> (If you git add a file, git diff shows no differences, mercurial diff
>> shows it diffed against /dev/null. I'm STILL getting used to the weird
>> little behavioral divergences.)
>>
>>> I hope this is interesting.
>>
>> It's very interesting and I'm keeping it around in case it's needed. I'm
>> just trying to figure out if the extra flags are something any command
>> is actually going to use. (And that's an Elliott question more than a me
>> question, I never use -h and it's not in posix or LSB.)

Odd, it has been in common usage for years, but I guess it was just whatever
people felt a human would like to see rather than one of the standards.

>> Rob
> --
> Elliott Hughes - http://who/enh - http://jessies.org/~enh/
> Android native code/tools questions? Mail me/drop by/add me as a reviewer.
