[Toybox] [PATCH] Add support for 1024 as well as 1000 to human_readable.

James McMechan james_mcmechan at hotmail.com
Fri Aug 28 19:47:52 PDT 2015


> Date: Mon, 24 Aug 2015 20:47:03 -0500
> From: rob at landley.net
> To: enh at google.com
> CC: toybox at lists.landley.net
> Subject: Re: [Toybox] [PATCH] Add support for 1024 as well as 1000 to human_readable.
>
> On 08/24/2015 03:10 PM, enh wrote:
>> On Sun, Aug 23, 2015 at 6:20 PM, Rob Landley <rob at landley.net> wrote:
>>> /me wists for a specification. Oh well. I hate when I have to guess at
>>> what the right behavior _is_...

Checking back with my copy of "Engineering Fundamentals and Problem Solving" (A. Eide et al., 1979), Ch. 5:
engineering units are 0.1 to 999 followed by a space, prefix, and SI unit.

I am of the opinion that gratuitous loss of precision should be avoided.
Since a one-character prefix and a decimal point take two character spaces, the natural
breakpoint would be 10000, e.g. 9998, 9999, 10 k for SI decimal notation.
The IEC two-character binary prefixes (Ki/Mi/Gi) take three character spaces with the '.'.
This would however yield a breakpoint at 100 000, or 10 000 if we use a thousands separator,
which seems to me a bit large.

>> yeah, i was actually trying to avoid ending up with all the heuristics
>> the BSD implementation has.
>>
>> the BSD man page says:
>>
>> If the formatted number (including suffix) would be too long to fit into
>> buf, then divide number by 1024 until it will.
>
> That's just "test against 999, divide by 1024". Easy enough.
>
>> The len argument must be at least 4 plus the length of suffix, in order
>> to ensure a useful result is generated into buf.
>
> That constraint's already implicit. I should make sure it's explicit.
>
>> so it certainly seems they follow the "no more than three digits/two
>> digits plus '.'" rule.
>
> I can work with this.
>
> Thanks,
>
> Rob

Attached is a patch that should allow for:
  SI units:         0..9999, 10 k..999 k, 1.0 M..999 M, ...
  IEC binary units: 0..9999, 9.8 Ki..999 Ki, 1.0 Mi..999 Mi, ...
Note the 9999 -> 9.8 Ki transition.
I have tested this on LE32, BE32, and LE64. I have a BE64 sparc, but no BE64 userspace for it,
and my other BE64 system is still on order.
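
As a rough illustration of the IEC ranges listed above (not the attached patch itself), the
default behaviour can be sketched as a standalone function. The name sketch_human_iec and its
snprintf-style signature are assumptions made for this example, and the rounding only looks at
the remainder of the most recent division:

```c
#include <stdint.h>
#include <stdio.h>

// Hypothetical helper for illustration only; not the patch's human_readable().
// Prints 0..9999 unscaled, then divides by 1024 until the rounded value fits
// in three digits, showing one decimal place below 10.
static int sketch_human_iec(char *buf, size_t len, uint64_t num)
{
  static const char prefix[] = "KMGTPE"; // a uint64_t never gets past Ei
  uint64_t whole = num, rem = 0;
  int unit = -1;

  if (num <= 9999)
    return snprintf(buf, len, "%llu B", (unsigned long long)whole);

  // Test the rounded value so that e.g. 999.5 Ki moves on to 1.0 Mi.
  while (whole + (rem >= 512) > 999) {
    rem = whole % 1024;
    whole /= 1024;
    unit++;
  }

  if (whole < 10) {
    unsigned tenths = (unsigned)((rem * 10 + 512) / 1024);

    if (tenths == 10) { // carry, e.g. 0.98 rounds to 1.0
      whole++;
      tenths = 0;
    }
    if (whole < 10)
      return snprintf(buf, len, "%llu.%u %ciB", (unsigned long long)whole,
                      tenths, prefix[unit]);
    return snprintf(buf, len, "%llu %ciB", (unsigned long long)whole,
                    prefix[unit]);
  }

  return snprintf(buf, len, "%llu %ciB",
                  (unsigned long long)(whole + (rem >= 512)), prefix[unit]);
}
```

With this sketch 9999 formats as "9999 B" and 10000 as "9.8 KiB". Unlike the patch, it always
writes the 'i' and takes no flags.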

You can also set flags to drop the space between number and prefix, or to use the Ubuntu 0..1023 style.
You can also request the limited range 0..999, 1.0 k..999 k style in either SI or IEC.
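
To make the flag interactions concrete, here is a small sketch of how the divisor and the two
range limits fall out of the flags. The HR_* values are the ones from the lib.h hunk of the
patch; struct hr_limits and hr_pick_limits are hypothetical names used only for this example:

```c
// Flag values as defined in the patch's lib.h hunk.
#define HR_SI           1
#define HR_FULL_SI      2
#define HR_SHRINK_RANGE 4
#define HR_EXPAND_1023  8
#define HR_NO_SPACE     16

// Hypothetical struct and helper, for illustration only.
struct hr_limits {
  int divisor;  // 1000 for SI, 1024 for binary
  int unscaled; // largest value printed without a prefix
  int scaled;   // largest value printed once a prefix is in use
};

static struct hr_limits hr_pick_limits(int flags)
{
  struct hr_limits l;

  l.divisor = (flags & HR_SI) ? 1000 : 1024;
  l.scaled = 999;
  l.unscaled = (flags & HR_SHRINK_RANGE) ? l.scaled : 9999;
  // 0..1023 style; with SI units this ends up the same as HR_SHRINK_RANGE
  if (flags & HR_EXPAND_1023) l.scaled = l.unscaled = l.divisor - 1;

  return l;
}
```

HR_FULL_SI and HR_NO_SPACE do not appear in the helper because they only change how the
suffix is written, not where the ranges break.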

This is pure integer math. I could open-code the printf too, since it can only have 4 digits maximum at the moment.
If you want, I could make it autosizing rather than just one decimal between 0.1..9.9.
Also, if any of the symbols are defined to 0, the capability will drop out.
Perhaps I should make it default to IEC "Ki" style? Getting it right vs. bug compatibility.
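
For anyone who wants to see the integer-only rounding in isolation, a single scaling step can
be sketched as below. The function name sketch_scale_step is an assumption for this example,
and it assumes 100 * num does not overflow:

```c
#include <stdint.h>

// Hypothetical helper for illustration: one scaling step in hundredths.
// Returns num/divisor as a fixed point value (x100), rounded for display:
// to the nearest tenth below 10, to the nearest whole number otherwise.
static uint64_t sketch_scale_step(uint64_t num, unsigned divisor)
{
  uint64_t fixed = (100 * num) / divisor; // assumes 100 * num does not overflow

  if (fixed > 999) fixed += 50; // add 0.5, the units digit is the last shown
  else fixed += 5;              // add 0.05, the tenths digit is the last shown

  return fixed;
}
```

The whole part is then fixed / 100 and the tenths digit is (fixed / 10) % 10. For 10000 with
a divisor of 1024 this gives 981, i.e. 9.8. For 999500 with a divisor of 1000 it gives 100000,
i.e. 1000, which no longer fits in three digits, so the rounded value has to be tested as well
as the true number before deciding to stop scaling.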

I made a testing command e.g. toybox_human_readable_test to allow me to test it.

I hope this is interesting.

Jim McMechan
-------------- next part --------------
diff -Nuwdr toybox.old/lib/lib.c toybox/lib/lib.c
--- toybox.old/lib/lib.c	2015-02-25 18:42:24.000000000 -0800
+++ toybox/lib/lib.c	2015-08-28 18:31:30.612911576 -0700
@@ -862,25 +862,72 @@
   closedir(dp);
 }
 
-// display first few digits of number with power of two units, except we're
-// actually just counting decimal digits and showing mil/bil/trillions.
-int human_readable(char *buf, unsigned long long num)
+// display numbers like 972 KiB or 1.2 MiB for humans to read.
+int human_readable(char *buf, uint64_t num, int flags)
 {
-  int end, len;
+  // either 1000 for SI units or 1024 for 2^10 ComSci units
+  int divisor = (flags & HR_SI) ? 1000 : 1024;
 
-  len = sprintf(buf, "%lld", num);
-  end = ((len-1)%3)+1;
-  len /= 3;
+  // if we care about k vs K and yes it would require uint128 before Z,Y or overflow #
+  const char *units = (flags & HR_SI) ? "\0kMGTPEZY#" : "\0KMGTPEZY#";
 
-  if (len && end == 1) {
-    buf[2] = buf[1];
-    buf[1] = '.';
-    end = 3;
+  // 0-9999 as themselves otherwise lose 2 digits of precision for 1000-9999
+  int final_scale = 999;
+  int scale = (flags & HR_SHRINK_RANGE) ? final_scale : 9999;
+
+  // if we want the range 0..1023[KMGTPEZY] instead. note requires more digits
+  // and acts as HR_SHRINK_RANGE if not using ComSci units
+  if (flags & HR_EXPAND_1023) final_scale = scale = divisor - 1;
+
+  // fixed is num in fixed point (x100); after scaling and rounding it ranges 50-999950
+  // overflowed values are discarded because they will require scaling anyway
+  int fixed = (100 * num) + 50;
+  
+  // if we require scaling for true number or because of rounding on fixed point number
+  while ((num > scale) || (fixed > ((100 * scale)+99))) {
+    // now since we are scaled we can only use the limited digits + suffix
+    scale = final_scale;
+
+    // now compute new fixed value before changing num
+    fixed = ((100 * num) / divisor);
+
+    // scale the true number for testing 
+    num /= divisor;
+
+    // rounding if > 9.99 add 0.5 ( * 100 since this is fixed point ) e.g. XX.5 values
+    // otherwise add 0.05 ( * 100 since this is fixed point ) e.g. X.X5 values
+    if (fixed > 999 ) fixed += 50;
+    else fixed += 5;
+
+    if (units[1]) units++; // stop before we run out of string
   }
-  buf[end++] = ' ';
-  if (len) buf[end++] = " KMGTPE"[len];
-  buf[end++] = 'B';
+
+  int end;
+  end = sprintf(buf, "%d", fixed/100);
+
+  // if we have scaled then check for decimal point and add suffix
+  if (*units) {
+    // if the fixed value is less than 10 we can have a decimal point
+    // otherwise just stick on the space and units after 10-999
+    if (fixed < 1000) {
+      buf[end++] = '.';
+      buf[end++] = '0'+ ((fixed / 10) % 10);
+    }
+
+    // engineering practice is to always have a space before units
+    if (!(flags & HR_NO_SPACE)) buf[end++] = ' ' ;
+
+    // now add the unit prefix
+    buf[end++] = *units;
+
+    // IEC binary units want KiB, MiB, etc. but often we don't
+    if ((flags & (HR_FULL_SI | HR_SI)) == HR_FULL_SI ) buf[end++] = 'i';
+  } else if (!(flags & HR_NO_SPACE)) buf[end++] = ' '; // no units add space before "B"
+  
+  buf[end++] = 'B'; // is it always going to be bytes?
   buf[end++] = 0;
 
   return end;
 }
+
+
diff -Nuwdr toybox.old/lib/lib.h toybox/lib/lib.h
--- toybox.old/lib/lib.h	2015-02-25 18:42:24.000000000 -0800
+++ toybox/lib/lib.h	2015-08-28 17:18:44.608481699 -0700
@@ -171,7 +171,22 @@
 void base64_init(char *p);
 int terminal_size(unsigned *x, unsigned *y);
 int yesno(char *prompt, int def);
-int human_readable(char *buf, unsigned long long num);
+int human_readable(char *buf, uint64_t num, int flags);
+
+// use units of 1000 instead of 1024
+#define HR_SI 1
+
+// stick the i in KiB
+#define HR_FULL_SI 2
+
+// use prefixes as soon as possible even losing precision
+#define HR_SHRINK_RANGE 4
+
+// use 1023M instead of 1.0G let it use extra digits for 1024 based numbers
+#define HR_EXPAND_1023 8
+
+// don't put the space before the units e.g. 10kB not 10 kB
+#define HR_NO_SPACE 16
 
 // net.c
 int xsocket(int domain, int type, int protocol);
diff -Nuwdr toybox.old/toys/pending/dd.c toybox/toys/pending/dd.c
--- toybox.old/toys/pending/dd.c	2015-02-25 18:42:24.000000000 -0800
+++ toybox/toys/pending/dd.c	2015-08-28 14:02:15.837959177 -0700
@@ -133,9 +133,9 @@
   //out to STDERR
   fprintf(stderr,"%llu+%llu records in\n%llu+%llu records out\n", st.in_full, st.in_part,
       st.out_full, st.out_part);
-  human_readable(toybuf, st.bytes);
+  human_readable(toybuf, st.bytes, 0);
   fprintf(stderr, "%llu bytes (%s) copied,",st.bytes, toybuf);
-  human_readable(toybuf, st.bytes/seconds);
+  human_readable(toybuf, st.bytes/seconds, 0);
   fprintf(stderr, "%f seconds, %s/s\n", seconds, toybuf);
 }
 
diff -Nuwdr toybox.old/toys/pending/human_readable_test.c toybox/toys/pending/human_readable_test.c
--- toybox.old/toys/pending/human_readable_test.c	1969-12-31 16:00:00.000000000 -0800
+++ toybox/toys/pending/human_readable_test.c	2015-08-28 17:46:28.778278657 -0700
@@ -0,0 +1,47 @@
+/* human_readable_test.c - test stub for human_readable
+ *
+ * Copyright 2015 James McMechan
+
+USE_HUMAN_READABLE_TEST(NEWTOY(human_readable_test, "SisEC", TOYFLAG_USR|TOYFLAG_SBIN))
+
+config HUMAN_READABLE_TEST
+  bool "human readable test"
+  default y
+  help
+    usage: human_readable_test [-SisEC] [VALUES]...
+
+    Shows values in human readable form for testing the human_readable function
+
+    -S use SI units 1 000 = k, 1 000 000 = M... instead of IEC units 1024 = K, 1024 * 1024 = M
+    -i display the IEC /i/ in prefix if used
+    -s remove space before prefix units
+    -E expand IEC range 0..1023
+    -C collapse range to only 0..999
+*/
+
+#define FOR_human_readable_test
+#include "toys.h"
+
+void human_readable_test_main(void)
+{
+  int i;
+  int flags = 0;
+
+  if (toys.optflags & FLAG_S) flags |= HR_SI;
+
+  if (toys.optflags & FLAG_i) flags |= HR_FULL_SI;
+
+  if (toys.optflags & FLAG_s) flags |= HR_NO_SPACE;
+
+  if (toys.optflags & FLAG_E) flags |= HR_EXPAND_1023;
+
+  if (toys.optflags & FLAG_C) flags |= HR_SHRINK_RANGE;
+
+  for (i=0; i < toys.optc; i++) {
+    uint64_t num;
+    char buf[64];
+    num = strtoull(toys.optargs[i],0,0);
+    human_readable(buf,num,flags);
+    printf("%s\n",buf);
+  }
+}
diff -Nuwdr toybox.old/toys/posix/du.c toybox/toys/posix/du.c
--- toybox.old/toys/posix/du.c	2015-02-25 18:42:24.000000000 -0800
+++ toybox/toys/posix/du.c	2015-08-28 14:01:00.904083828 -0700
@@ -55,7 +55,7 @@
   if (TT.maxdepth && TT.depth > TT.maxdepth) return;
 
   if (toys.optflags & FLAG_h) {
-    human_readable(toybuf, size);
+    human_readable(toybuf, size, 0);
     printf("%s", toybuf);
   } else {
     int bits = 10;

