[Toybox] [PATCH] Clean up xz a good amount

Oliver Webb aquahobbyist at proton.me
Tue Feb 27 19:27:51 PST 2024


On Tuesday, February 27th, 2024 at 20:51, Oliver Webb via Toybox <toybox at lists.landley.net> wrote:
> On Tuesday, February 27th, 2024 at 20:01, Rob Landley rob at landley.net wrote:
> 
> > On 2/27/24 19:23, Oliver Webb wrote:
> > 
> > > The below patch does a bunch cleanup on xzcat.c.
> > > It resolves the ifdefs that were in the code (But everything they were checking was defined so they didn't do anything)
> > > Adds Tests for errors (And while we are changing the test suite, converts testing -> testcmd)
> > > Rearange/Rewrite large comments to be smaller (and sometimes C99!).
> > > It puts main at the bottom and do_xzcat above it, the way code is usually arranged..
> > > Removes some function prototypes and the inline keyword (useless in modern C since it doesn't force anything)
> > > Removes "!= 0" (And turns "x == 0" to "!x")
> > > Some types replaced with their LP64 counterparts (uint8_t to char, etc)
> > > 
> > > Yes, I know that this is large for a patch, but a lot of that is shuffling code/comments. None of the logic has been
> > > changed. When all is said and done, this patch shrinks xzcat.c by over 10%
> > 
> > Applied (out of the list's spam filter, "message body too big"), but you've
> > introduced a glitch (missing flush maybe). When I tested it against the xz files
> > I had lying around via "xzcat $FILE | tar tv", one file failed with your patch
> > and succeeded before it. I copied it to my web server so you can take a look:
> > 
> > wget https://landley.net/musl-cross-make.tar.xz
> > 
> > (Yeah, it's 200 megs. Sorry about that. That was the first one that went "boing".)
> 
> 
> $ ./xzcat musl-cross-make.tar.xz | ./count -l > /dev/null
> 
> 229834752 bytes, 219Mb, 153Mb/s, 0m01s
> $ xzcat musl-cross-make.tar.xz | ./count -l > /dev/null
> 
> 229836800 bytes, 219Mb, 509Mb/s, 0m00s
> 
> Hmmm, there seems to be missing bytes at the end
> 
> $ ./xzcat musl-cross-make.tar.xz | xxd > fi
> 
> 229834752 bytes, 219Mb, 153Mb/s, 0m01s
> $ xzcat musl-cross-make.tar.xz | xxd > fi2
> 
> 229836800 bytes, 219Mb, 509Mb/s, 0m00s14364673,14364800d14364672
> $ diff fi fi2
> < 0db30000: 2022 2469 220a 2020 646f 6e65 0a66 690a "$i". done.fi.
> < 0db30010: 0000 0000 0000 0000 0000 0000 0000 0000 ................
> [zeros until end of file]
> 
> putting a fflush at the end of reading a file doesn't fix anything, it's losing
> exactly half a page so not flushing would also be my guess
> 
> > Also, I note that there is an "upstream" for this code, which may have changed
> > since I merged this (many moons ago), and any upstream changes are probably
> > worth a look:
> > 
> > https://git.tukaani.org/xz-embedded.git
> > 
> > The one in pending is an old version of that glued together into one file with a
> > config header, NEWTOY() and command_main() wrapper. (Yes, the upstream is public
> > domain! Extractor only, not compressor. I haven't found a public domain xz
> > compressor yet.)
> > 
> > I'm also unclear on the difference between xz and 7zip, and the linux kernel's
> > "general setup menu" also offers lzma, lzo, lz4, and zstd as initramfs
> > compression options. I just know kernel.org has tar.xz files, and
> > https://linuxfromscratch.org/lfs/view/12.0/chapter03/packages.html has .gz,
> > .bz2, and .xz tarballs, hence those are the three formats toybox can decompress.
> > 
> > It should be able to create gz and zip, which is also what
> > https://github.com/landley/toybox/tags offers to create. (Both of those are
> > deflate, I know how and wrote one in java once long ago. I just got sidetracked
> > on when to do dictionary resets to match the hashes of existing tarballs. The
> > answer is probably "nuts to your white mice" and just do it every 250k or
> > something, and oh well the hashes don't match but you get the data back and
> > that's the important thing. But first I wanted to grab a bunch of existing .gz
> > tarballs and CHECK when they do dictionary resets, because maybe it IS already
> > something simple like that.)
> > 
> > Rob
> > 
> > P.S. Alas, I don't make a habit of extensively reading incompatibly licensed
> > code before writing a 0BSD version, so this kind of question can linger a while
> > on the todo list when I could theoretically just go track it down in the source...

The symbols that differ:
name                                           old       new      delta
-----------------------------------------------------------------------
xz_dec_run                                    1901      1907          6
zhelp_data                                     100        98         -2
help_data                                       94        90         -4
do_xzcat                                       491       467        -24
xz_dec_bcj_reset                                41         0        -41
-----------------------------------------------------------------------
                                                                    -65 total
2 are help text, do_xzcat was due to replacing the case statement for error handling with a array
If I had to guess, xz_dec_run or xz_dec_bcj_reset

It was the case statement I stripped out in xz_dec_bcj_reset, it works with the attached patch

- Oliver Webb aquahobbyist at proton.me
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-fix-bug-in-xzcat.c.patch
Type: text/x-patch
Size: 1897 bytes
Desc: not available
URL: <http://lists.landley.net/pipermail/toybox-landley.net/attachments/20240228/a5c57925/attachment-0001.bin>


More information about the Toybox mailing list