[Toybox] Working on unxz-some questions

Rob Landley rob at landley.net
Sun Mar 3 19:45:27 PST 2013


On 03/01/2013 01:10:36 AM, Isaac Dunham wrote:
> I'm looking into adding an unxz based on xz-embedded, which is public  
> domain.

Cool!

I noticed this recently (due to the busybox thread about it) and was  
pondering the same myself. I downloaded the git repo but am not going  
to have time to look at it any time soon, so I'm happy somebody else  
is taking a look at it. :)

> However, I'm wondering about some things.
> Basically, I get the impression that some (most? all?) of the  
> compile-time options
> may not be reasonable.

Toybox's primary design goal is simplicity. Complexity is a limited  
resource that we spend on implementing features, increasing speed, and  
reducing size, but everything we do has to be worth the complexity cost.

> 1) xz allows several filters to improve compression of executables  
> (BCJ filters).
> Should all of these be turned on unconditionally, or should it be  
> user-selectable?
> The native BCJ filter for each arch is probably necessary for  
> compatibility reasons,
> but I'm wondering about alternative ones (eg, should we enable sparc  
> BCJ filters
> everywhere?)

On kernel.org there are tar.gz files, tar.bz2 files, and tar.xz files.  
Our decompressor has to handle all of that.

On the compression side, we've got a quick streaming compressor already  
(gzip) which gets the low hanging fruit of compression and is going to  
be faster than anything else (fits in L1 cache a lot of the time), so I  
believe the main advantage of xz is _better_ compression? (Correct me  
if I'm wrong here, I don't use it much...)

I agree that 8 gazillion knobs isn't really what toybox is good at.

> 2) I assume that CRC64 support should be unconditional. Upstream  
> recently added
> crc64, but it's optional there.

Compatibility with existing and future data files is the important  
thing.

> 3) Should unsupported integrity checks be ignored, cause an error, or  
> should
> this be a compile-time option?

On the compressor side or on the decompressor side?

On the decompressor side I'd probably just ignore them. We're going to  
have at least crc32, right? And then tar will internally have some  
basic "this is not a valid tar file" check...
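
For illustration, a minimal sketch of that "verify what we can, ignore
the rest" policy on the decompressor side. The check IDs are the ones
the .xz file format assigns (0x00 none, 0x01 CRC32, 0x04 CRC64, 0x0A
SHA-256). The names check_policy(), crc32_update() and the enums are
made up for this example and are not the xz-embedded API, and treating
CRC64 as built in is an assumption taken from question 2 above.

```c
#include <stddef.h>
#include <stdint.h>

/* Check IDs as assigned by the .xz file format. */
enum {
  CHECK_NONE = 0x00, CHECK_CRC32 = 0x01, CHECK_CRC64 = 0x04,
  CHECK_SHA256 = 0x0A
};

/* Hypothetical policy result, not an xz-embedded type. */
enum check_action { CHECK_VERIFY, CHECK_SKIP };

/* Verify the checks we are assumed to implement (CRC32, CRC64); for
   anything else skip verification but keep decompressing. */
enum check_action check_policy(unsigned id)
{
  return (id == CHECK_CRC32 || id == CHECK_CRC64) ? CHECK_VERIFY : CHECK_SKIP;
}

/* Bitwise CRC32 (reflected polynomial 0xEDB88320): small, not fast.
   Pass 0 as crc for the first chunk, the previous result afterwards. */
uint32_t crc32_update(uint32_t crc, const void *buf, size_t len)
{
  const unsigned char *p = buf;
  int i;

  crc = ~crc;
  while (len--) {
    crc ^= *p++;
    /* Shift out one bit at a time, xoring in the polynomial when the
       bit shifted out was set. */
    for (i = 0; i < 8; i++) crc = (crc >> 1) ^ (0xEDB88320 & -(crc & 1));
  }

  return ~crc;
}
```

A caller would look up check_policy() once per stream header and only
compare checksums when it says CHECK_VERIFY. A table-driven CRC is
faster, at the cost of 1k of table.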

> I'm assuming that even if we can't check, we should still decompress.

Doing the best we can to work with the input we're given, yes.

> Also, (assuming that at least one of the above should be  
> configurable) should the
> xz library part be configurable separately from the unxz command?   
> This is mainly
> relevant for if you plan to use it to decompress for tar et al.

Hmmm... that's the kind of thing we can clean up later (don't have to  
decide right now). Just do the xz command(s) and I'll wire it up to tar  
when I get around to doing tar. :)

(It's quite possible the right thing for tar is to just shell out to xz  
from the $PATH and pipe stuff through an external command, and if that  
command is internal then fork() and xexec() will do the right thing  
anyway. The reason this is the right thing is both simplicity of  
implementation and because SMP is pretty ubiquitous these days and two  
processes are SMP-friendly. If somebody wants to wire this into a  
u-boot with no scheduler, they can do it themselves.)
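
A rough sketch of what "pipe stuff through an external command" could
look like, using plain POSIX calls only. spawn_filter() and
reap_filter() are hypothetical names invented here, not existing toybox
functions, and this uses execvp() directly rather than toybox's
xexec(). It also ignores the corner case where fds 0 or 1 start out
closed.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] from $PATH with its stdin taken from
   in_fd. Returns a descriptor to read the filtered output from, or -1. */
int spawn_filter(int in_fd, char *const argv[], pid_t *pid)
{
  int out[2];

  if (pipe(out)) return -1;
  *pid = fork();
  if (*pid < 0) {
    close(out[0]);
    close(out[1]);
    return -1;
  }

  if (!*pid) {
    /* Child: compressed data arrives on stdin, output goes to the pipe. */
    dup2(in_fd, 0);
    dup2(out[1], 1);
    if (in_fd) close(in_fd);
    close(out[0]);
    close(out[1]);
    execvp(argv[0], argv);
    _exit(127);
  }

  /* Parent: keep only the read end, so EOF shows up when the child exits. */
  close(out[1]);

  return out[0];
}

/* Wait for the filter; returns its exit code, or -1 if it died abnormally. */
int reap_filter(pid_t pid)
{
  int status;

  if (waitpid(pid, &status, 0) != pid) return -1;

  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

tar would call something like spawn_filter(archive_fd, argv, &pid) with
argv naming the decompressor, read the tarball from the returned
descriptor, and call reap_filter() at the end to notice a failed
decompress. The two processes then run on separate CPUs for free.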

> Is there a way to conditionally compile code in lib/?

Not yet. In theory the gc-sections stuff is dropping out unused code,  
so it gets built but not included into the final binary.
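
To make the gc-sections point concrete, here is a tiny example. The
flags in the comment are the standard gcc/binutils ones for this; the
file name and function names are invented for the example, and the
exact flags the toybox build passes may differ.

```c
/* Assumed build line for this demo (add a main() that calls
   used_helper() first):
     cc -ffunction-sections -fdata-sections -Wl,--gc-sections demo.c
   -ffunction-sections puts each function in its own section, so the
   linker's --gc-sections pass can discard the ones nothing references. */

/* Referenced from main(), so the linker keeps it. */
int used_helper(int x)
{
  return x * 2;
}

/* Compiled every time, but never referenced, so its section is a
   candidate for being dropped from the final binary. */
int unused_helper(int x)
{
  return x + 1000;
}
```

Running nm on the resulting binary and looking for unused_helper is a
quick way to see whether it was actually dropped.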

In practice, I probably need to redo the build system because the gcc  
guys decided that their compiler was just too horrible to make  
build-at-once mode actually work, so they save the intermediate parse  
results into special ELF sections and then offload the actual code  
generation onto the linker, which is called link time optimization and  
is a horrible solution. So the "cc *.c" approach I've been doing  
doesn't take advantage of SMP and won't because the gcc developers are  
incompetent, and I need to work around them (or see if llvm is better).

So for right now, don't worry about it. Just add the file to lib and if  
the build gets uncomfortably slow I'll improve it later.

Rob

