[Toybox] ftell/fseek
Rob Landley
rob at landley.net
Sun Feb 19 07:42:18 PST 2023
On 2/18/23 22:51, enh via Toybox wrote:
> on your blog you said:
> """
> Oh goddess fsetpos() is a stupid API, isn't it? The classic ftell()
> returns long which is signed 32 bits on 32 bit systems, and files are
> bigger than that these days, but instead of doing some sort of
> lftell() which returns long long (and an lfseek that accepts it) they
> invented a new gratuitous fpos_t type which they pretend isn't just a
> typedef for "long long", and then created two new libc functions with
> completely unrelated names: int fgetpos(FILE *fp, fpos_t *pos) and int
> fsetpos(FILE *fp, const fpos_t *pos), both of which are FUCKING
> STUPID.
> """
>
> check out fseeko()/ftello() ... they're basically the functions you
> were asking for,
Yes they are. Thanks.
Can somebody find Michael Kerrisk and ask him why "man 3 ftell" has a) that
weird collection of functions collated into one man page, and b) a "see also"
for fseeko() but not ftello()?
> albeit using off_t rather than "raw" long long.
If we're not doing pointers that have to match types exactly then I don't have
to care about their weird typecast wrappers as long as the type I'm using is
guaranteed big enough and the compiler doesn't gratuitously warn.
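For illustration, a minimal sketch of what "just use long long" could look like on top of fseeko()/ftello(). The helper names tell_ll() and seek_ll() are made up for this example (they are not toybox or libc functions), and the _FILE_OFFSET_BITS define assumes a glibc-style libc where that macro selects a 64-bit off_t on 32-bit targets:

```c
#define _FILE_OFFSET_BITS 64  /* assumption: glibc-style switch for 64-bit off_t */
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/* Compile-time check that long long really is big enough to hold an off_t. */
static_assert(sizeof(long long) >= sizeof(off_t), "long long too small for off_t");

/* Hypothetical helper: current file position as long long, -1 on error. */
long long tell_ll(FILE *fp)
{
  return (long long)ftello(fp);
}

/* Hypothetical helper: seek to an absolute position given as long long.
 * Returns 0 on success, -1 on error (same as fseeko). */
int seek_ll(FILE *fp, long long pos)
{
  return fseeko(fp, (off_t)pos, SEEK_SET);
}
```

Since the values are passed and returned by value rather than through pointers, the casts are the only place off_t shows up, and callers only ever see long long.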
Yes I'm aware of the drive for 128 and 256 bit integers ala
https://thephd.dev/intmax_t-hell-c++-c but back in 2006 I calculated that if the
Moore's Law S-curve DIDN'T bend down we'd need 128 bit registers to address
system RAM somewhere around 2053:
http://catb.org/~esr/writings/world-domination/world-domination-201.html#id286760
Although "I confirmed this was too far in the future to need to care about
almost 20 years ago" is not exactly a flex. Sigh, is that assumption still
valid? Quick dive...
On the storage front, flash manufacturing capacity growth was already taking
atomic limits in the teeth 5 years ago:
https://www.techtarget.com/searchstorage/opinion/The-end-of-Moores-law-for-SSD-performance
But estimated worldwide storage capacity is still expected to "almost double"
from 2020 to 2024:
https://horizontechnology.com/news/hdd-remains-dominant-storage-technology-1219/
Both of which mean the Moore's Law 18 month doubling time ain't exactly being
maintained there even with cloud farm dollars being thrown at more-than-RAID NAS
nonsense, but let's keep going...
Big Storage is still spinning rust... albeit the rust is mixed with platinum
these days and the substrate is glass:
https://www.anandtech.com/show/15484/the-road-to-80-tb-hdds-showa-denko-develops-hamr-platters-for-hard-drives
The bulk spinning sort-of-rust maker Showa Denko is aiming to have a 30 terabyte
drive by the end of this year, although the manufacturers are saying 2024 to get
it in the market:
https://horizontechnology.com/news/hard-drive-capacity-and-the-road-to-50tb/
If we take "30T in 2023" as the "now" capacity it means we're using about 45
bits of address space to byte address the whole disk, and using the old 18 month
doubling time to consume the remaining 19 bits would give us 28 years before the
_disk_ (not a file) was too big to index with long long.
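The arithmetic above can be checked with a rough sketch like the following. The function names are invented for this example, "30T" is taken as 30 * 10^12 bytes, and the count follows the email in treating the index as a full 64 bits (a signed long long has 63 value bits, which would cost one doubling period):

```c
#include <assert.h>
#include <stdint.h>

/* Bits needed to byte-address a device: smallest n with 2^n >= bytes. */
int address_bits(uint64_t bytes)
{
  int bits = 0;

  assert(bytes > 0);
  while (bits < 64 && (UINT64_C(1) << bits) < bytes) bits++;

  return bits;
}

/* Months until a 64-bit index is used up, assuming capacity doubles every
 * months_per_doubling months (18 for the classic Moore's Law figure). */
int months_until_full(uint64_t bytes, int months_per_doubling)
{
  return (64 - address_bits(bytes)) * months_per_doubling;
}
```

With bytes = 30000000000000 and an 18 month doubling time, address_bits() gives 45 and months_until_full() gives 19 * 18 = 342 months, which is the 28 years (and a bit) quoted above.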
That's still far enough in the future it's beyond my prediction horizon. Past
the end of oil, global warming, population shrinkage, most likely capitalism,
and MAYBE long enough for something to finally replace COBOL and deliver optical
chips. About the only thing I can confidently predict that far out is that
neither non-scam blockchain nor quantum computing will be a thing yet.
Yeah, I'm still comfortable sticking with "long long" for file indexes. We can
always change it later. :)
> (funnily enough, although this ought to be irrelevant to most people
> in 2023 thanks to LP64 making fseek()/ftell() equivalent to
> fseeko()/ftello(),
Only on 64 bit systems. Embedded is still a thing.
> between the low end still being LP32 and the
> Windows host being LLP64, i've still had cause to move code over even
> in the last few months!)
The problem is that for any process with a "sometimes runs in SRAM" mode (a
common embedded use case that avoids both the size AND power consumption of DRAM
refresh; you can comfortably embed 256k of SRAM on die in a single chip SOC,
DRAM not so much), the extra stack consumption of pushing 64 bit register
contents gets VERY uncomfortable, and the _hardware_ can't know it doesn't need
to do that unless there's some kind of x32 mode.
Alas, while https://lkml.org/lkml/2018/5/16/207 was submitted, the kernel clique
circled their heads up their asses and "git log arch/arm | grep -i ILP32" still
returns zero hits today. (Meanwhile cortex-m is 32 bit only...)
> fwiw, fgetpos()/fsetpos() was the C standards committee's fault ---
> they wanted to be able to support systems where file offsets weren't
> just integers. presumably the same systems that didn't have 8-bit
> bytes and weren't using two's complement :-P
That's insane.
There were only EVER two alternatives to 8-bit bytes I'm aware of. The Soviet
ternary logic nonsense never made it out of Russia and was abandoned in 1965 (4
years before Unix):
https://en.wikipedia.org/wiki/Setun
And the 6-bit era ended when the Jupiter project was discontinued in 1983
(decision not to sell a PDP-10 successor because the ecosystem had collapsed), 6
years before ANSI C came out. (If you're wondering why Richard Stallman pivoted
from ITS to GNU in 1983, that's why. His old platform collapsed under him so he
grabbed the biggest community he could find and declared himself its leader.)
https://en.wikipedia.org/wiki/Jupiter_project
The only 6-bit machine to sell in anything like volume was the 12-bit PDP-8 (50k
units total, mostly in the 1960s):
https://homepage.cs.uiowa.edu/~jones/pdp8/faqs/#PDP
DEC ended service and support for the installed base of 6-bit systems in 1990,
which covered both the PDP8 and PDP10.
The PDP-8 could address 6k of ram (4096 12-bit words) and its magnetic core
memory ran at 1/3 of a MHz. The original January 1984 Macintosh had 128k of ram
and ran at ~6MHz. The Compaq Deskpro 386 (September 1986, 3 years before C89
came out) ran at 16MHz and came with 1 megabyte of ram (expandable to 8).
Good grief, the Bell Labs Unix v1 was the ONLY 6-bit version of Unix, V2 was the
port to PDP-11 (in 1971) and they NEVER LOOKED BACK. Their 1974 paper announcing
Unix to the world contains the string "8-bit" but not the string "6-bit":
https://dsf.berkeley.edu/cs262/unix.pdf
This is one of those "a child of 5 could understand this, fetch me a child of 5"
things, isn't it?
Rob
P.S. The gas station near Fade's dorm still sells the "Khaos" tangerine flavored
monster energy drinks. I am officially too old to drink those, but they're
_really_ tasty.
P.P.S. MIT having a PDP-10 called "lsd.ai.mit.edu" does help to explain RMS.