<div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote gmail_quote_container"><div dir="ltr" class="gmail_attr">On Wed, Dec 11, 2024 at 6:27 AM Rob Landley <<a href="mailto:rob@landley.net">rob@landley.net</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On 12/10/24 12:37, enh wrote:<br>

> On Sun, Dec 8, 2024 at 12:51 AM Rob Landley <<a href="mailto:rob@landley.net" target="_blank">rob@landley.net</a>> wrote:<br>

> <br>

>> On 12/7/24 18:39, enh wrote:<br>

>>> On Sat, Dec 7, 2024, 18:25 Rob Landley <<a href="mailto:rob@landley.net" target="_blank">rob@landley.net</a>> wrote:<br>

>>><br>

>>>> On 12/6/24 13:57, enh wrote:<br>

>>>>> We're seeing ever more zstd-compressed files in the wild, so even<br>

>> though<br>

>>>>> toybox can't compress/decompress zstd without an external helper, it<br>

>>>>> still seems useful to integrate with any that happens to be on the<br>

>>>>> system.<br>

>>>><br>

>>>> No short option for zstd, even though every other explicit archive<br>

>>>> format has one?<br>

>>>><br>

>>><br>

>>> technically there are a couple of other compression options that are<br>

>>> longopt only,<br>

>><br>

>> In gnu/gnu.<br>

>><br>

>>> such as --lzma (but i haven't added those here because i've<br>

>>> yet to see them used).<br>

>>><br>

>>> this probably made sense when it was added in 2019, and it wasn't clear<br>

>> how<br>

>>> popular zstd was going to become. (especially in comparison to the other<br>

>>> options we don't have.)<br>

>>><br>

>>> though tbh, zstd seems more popular in non-tar contexts ... i had to ask<br>

>>> the internet what the long and short extensions were!<br>

>><br>

>> Imma hijack -Z. I'm aware in debian that's "compress" but we've never<br>

>> supported that format, which was patented in the 1980s causing it to be<br>

>> completely replaced by gzip except for some old legacy archives you can<br>

>> "compress -d file.Z | tar x" if you like.<br>

>><br>

> <br>

> yeah, sounds reasonable.<br>

> <br>

> coincidentally i saw <a href="https://www.phoronix.com/news/Linux-EFI-Zboot-Gzip-Zstd" rel="noreferrer" target="_blank">https://www.phoronix.com/news/Linux-EFI-Zboot-Gzip-Zstd</a><br>

> "Linux EFI Zboot Abandoning "Compression Library Museum", Focusing On Gzip<br>

> & Zstd" which made me laugh, given that that had been my reaction to the<br>

> other formats that gnu tar supports (and has single-letter options for!)<br>

> that toybox tar doesn't (and almost certainly shouldn't) like lzip and lzop.<br>

> presumably characters from a children's show in a language i don't speak?<br>

<br>

Way back when pkzip 2.0 came out there was arj and pak and zoo and <br>

several others, I was never entirely sure what the under-the-covers <br>

differences were (especially since the archive and the compression are <br>

two different formats). I also remember that zip itself supported a <br>

bunch of legacy formats (hence the Nancy Button: "Unzip, expand, <br>

explode, what pervert came up with this" in "the little calligraphic <br>

button catalogue on the prairie" circa 1984. I think that was the first <br>

one I got at that Dr. Who convention, "Don't crush that dwarf, hand me <br>

the calligraphic button catalogue" was later...)<br>

<br>

I blogged about there being a similar group of compression formats <br>

(supported in the linux kernel's zimage and initramfs expanders) and <br>

having no idea which would "win", and winding up with xz because txz was <br>

the format kernel tarballs were available in and I found a public domain <br>

expander program.<br>

<br>

I don't know what the difference between xz and zstd is, I've mostly <br>

avoided technology that comes from faceboot because zuckerberg and thiel <br>

somehow manage to be worse than gates and ballmer.<br>

<br>

>> (I just like there to BE a short option, and another obvious contender<br>

>> isn't presenting itself. Plus I haven't got an obvious way to test this<br>

>> anyway.)<br>

> <br>

> yeah, i just tested manually. it did occur to me that the test shell script<br>

> could check to see whether there's a zstd(1) binary on the path, and skip<br>

> any zstd tests if not?<br>

<br>

At some point I need to categorize the skips. Not sure how yet, there's <br>

a missing design idea.<br>

<br>

But "gnu/command never passed this", "musl never passed this", "busybox <br>

doesn't pass this", "bionic never passed this", "old glibc passes this <br>

but new one has version skew"...<br>

<br>

I want more granularity out of skipped but dunno what the annotation(s) <br>

should be. Maybe "skip strings" added to the end of the line as a <br>

parenthetical? With a VERBOSE=why added to VERBOSE=allfailnopassquietspam<br></blockquote><div><br></div><div>yeah, that's what the unit testing framework i use does ... it's surprisingly useful for the person _running_ the tests too, because they can decide whether or not they expect to see the skips. "i thought i did have an encrypted fs?" "i thought my kernel was new enough for that?" etc.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

As I said, missing design work...<br>
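<br>

A minimal sketch of the "skip with a reason" idea, in Python rather than the shell the real test suite uses. Everything here is hypothetical: the names skip_reason and run_tests, the verbose_why flag standing in for VERBOSE=why, and the PASS/FAIL/SKIP line format are all assumptions for illustration, not anything the existing test scripts define.<br>

```python
import shutil


def skip_reason(tool, which=shutil.which):
    """Return None if `tool` is on PATH, else a short human-readable reason."""
    if which(tool) is None:
        return f"no {tool}(1) on PATH"
    return None


def run_tests(tests, verbose_why=False, which=shutil.which):
    """Run tests given as (name, required_tool_or_None, callable_returning_bool).

    A test whose required tool is missing is reported as a skip, and the
    reason is appended as a parenthetical only when verbose_why is set.
    """
    results = []
    for name, tool, fn in tests:
        reason = skip_reason(tool, which) if tool else None
        if reason is not None:
            line = f"SKIP: {name}" + (f" ({reason})" if verbose_why else "")
        else:
            line = ("PASS: " if fn() else "FAIL: ") + name
        results.append(line)
    return results
```

Keeping the reason as a string attached to the skip means the person running the tests can tell "helper not installed" apart from "known libc difference" without reading the test source.<br>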

<br>

> (and there's really no excuse for me not adding a file(1) test beyond "we<br>

> don't have tests for _most_ of the recognized formats", though "this is<br>

> just a constant prefix match" is a slightly better excuse.)<br>

<br>

I'm always up for adding more tests, but I haven't been trying to do so <br>

piecemeal because it doesn't save work for an eventual "trying to be <br>

systemic" pass where you go line by line through the source and relevant <br>

standards and write a test for every decision.<br>

<br>

<br>

>> <a href="https://github.com/facebook/zstd/blob/dev/doc/zstd_compression_format.md" rel="noreferrer" target="_blank">https://github.com/facebook/zstd/blob/dev/doc/zstd_compression_format.md</a><br>

<br>

I note that I have yet to see zstd tarballs in the wild. Not one of the <br>

kernel formats, not one of the linux from scratch formats...</blockquote><div><br></div><div>this was the first time i'd seen them too. chromeos is looking at going from bz2 to zstd for their firmware update tarballs. one of the nice things about zstd is that it gives you even more control than gzip about where you want to be on the "time vs space" spectrum, so for folks who're compressing once on a build server and shipping to lots of people, you can turn it up to 11 (aka "ultra 22") to get great compression ratios _and_ still be very fast to decompress. (interestingly, for android otas zstd didn't seem to be very useful, but that's perhaps because they already try all of their menagerie of compression algorithms on every input and take whichever's best, leaving little room for zstd because for any given input there's already an algorithm that's near optimal?)</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

Implementing "zip" is higher on my priority list, which means finishing <br>

deflate compression side, which means answering the dictionary reset <br>

question. (Although if I don't care about producing binary equivalent <br>

tarballs, "every X bytes" is fine. Maybe every 250k? The problem with <br>

calculating a non-default huffman tree is you need to read the data <br>

before compressing it to count the symbol frequency, so what's the input <br>

buffer size...)<br>
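<br>

One way to picture the "every X bytes" option, sketched with Python's zlib module rather than a from-scratch deflate: feed the input in fixed-size chunks and request a full flush after each one, which ends the current block and resets the compressor's history at that boundary. The function name deflate_with_resets and the reset_every parameter are made up for illustration, the 250,000 default just echoes the "250k" guess above, and this is only one reading of what a dictionary reset would mean here.<br>

```python
import zlib


def deflate_with_resets(data, reset_every=250_000, level=9):
    """Raw-deflate `data`, resetting compressor state every `reset_every` bytes."""
    # wbits=-15 selects a raw deflate stream (no zlib or gzip header),
    # which is the form zip stores.
    comp = zlib.compressobj(level, zlib.DEFLATED, -15)
    out = []
    for start in range(0, len(data), reset_every):
        out.append(comp.compress(data[start:start + reset_every]))
        # Z_FULL_FLUSH emits everything pending and resets the compression
        # state, so later data cannot reference bytes before this point.
        out.append(comp.flush(zlib.Z_FULL_FLUSH))
    out.append(comp.flush(zlib.Z_FINISH))
    return b"".join(out)
```

With a fixed reset interval the compressor never needs to look at more than reset_every bytes of input at once, which bounds the input buffer at the cost of some compression ratio. The result is still an ordinary deflate stream, so zlib.decompress(blob, -15) reads it back.<br>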

<br>

Rob<br>

</blockquote></div></div>