[Toybox] [PATCH] cpio: support reading concatenated cpio files.

Sat Apr 17 02:43:56 PDT 2021

On Sat, Apr 17, 2021 at 2:56 PM Rob Landley <rob at landley.net> wrote:

> On 4/16/21 1:44 PM, Yi-yo Chiang wrote:
> > I'm not sure what Elliot's goal is? I assume he's trying to extract a
> > concatenated ramdisk, and I still see a problem in the current solution.
> >
> > The buffer-format
> > (https://www.kernel.org/doc/Documentation/early-userspace/buffer-format.txt)
> > says:
> >
> >   initramfs  := ("\0" | cpio_archive | cpio_gzip_archive)*
> >
> > In other words, both `cat a.cpio b.cpio >merged.cpio` and `(cat a.cpio &&
> > echo -n -e '\0\0\0' && cat b.cpio) >merged.cpio` are valid initramfs.
>
> It also implies that two compressed files can be concatenated and separated
> by arbitrary runs of nulls, or you can have a compressed file and a
> non-compressed file concatenated, or...
>

Correct. Upon further inspection, the rules are actually: "arbitrary NULs
may precede a GZIP(cpio_archive)", "arbitrary 4-aligned NULs may precede an
*uncompressed* cpio_archive", and "each cpio_file/cpio_trailer within a
cpio_archive has to be 4-aligned, padded with arbitrary NULs". initramfs.c
seems to try very hard to respect the alignment requirement, but I guess we
could just skip *ANY* extra NULs for simplicity?
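
A minimal sketch of the "just skip any extra NULs" idea, written in Python rather than toybox's C for brevity. The name `next_header_offset` is hypothetical, and the only format fact it relies on is that a newc cpio header starts with the ASCII magic "070701". It deliberately ignores the 4-byte alignment that initramfs.c enforces.

```python
NEWC_MAGIC = b"070701"  # ASCII magic that starts every newc cpio header


def next_header_offset(buf: bytes, pos: int = 0) -> int:
    """Return the offset of the next cpio header at or after pos.

    Skips any run of NUL bytes regardless of alignment. Returns -1 when only
    NULs (or nothing) remain, and raises ValueError when the next non-NUL
    data is not a newc header.
    """
    while pos < len(buf) and buf[pos] == 0:
        pos += 1
    if pos == len(buf):
        return -1
    if buf[pos:pos + len(NEWC_MAGIC)] != NEWC_MAGIC:
        raise ValueError(f"unexpected data at offset {pos}")
    return pos
```

Called in a loop after each TRAILER!!! entry, this would accept both the plain `cat` concatenation and the NUL-separated one.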

>
> Grrr. I need to test this. And possibly genericize the tar.c code to detect
> compression type and run it through a decompressor so cpio can do it too...
>

Sounds like another can of worms... :/
The buffer-format.txt seems to be a bit outdated, as Linux now supports a
lot of compression types besides gzip, all of which are configurable
(https://elixir.bootlin.com/linux/latest/source/lib/decompress.c#L52). So
the initramfs grammar implemented by initramfs.c is in reality:

  initramfs  := ("\0" | cpio_archive | compressed_cpio_archive)*
  compressed_cpio_archive := CONFIG_COMPRESSION_ALGORITHM(cpio_archive)
  CONFIG_COMPRESSION_ALGORITHM := GZIP | BZIP2 | LZMA | XZ | LZO | LZ4 | ZSTD

where the exact set of compression algorithms is decided by the kernel
config.
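
For illustration, autodetecting the type at the start of each archive could look like the Python sketch below. The magic-byte table is an assumption based on the formats' commonly documented leading bytes, not a copy of lib/decompress.c (which may compare fewer bytes), and `detect_compression` is a hypothetical name.

```python
# Leading magic bytes per format (assumed; see the lead-in above).
MAGIC = [
    (b"\x1f\x8b", "gzip"),
    (b"BZh", "bzip2"),
    (b"\x5d\x00", "lzma"),                  # typical properties byte + dict size
    (b"\xfd\x37\x7a\x58\x5a\x00", "xz"),
    (b"\x89LZO", "lzo"),                    # lzop container
    (b"\x02\x21\x4c\x18", "lz4"),           # legacy frame, 0x184C2102 little-endian
    (b"\x28\xb5\x2f\xfd", "zstd"),
    (b"070701", "cpio"),                    # uncompressed newc archive
]


def detect_compression(buf: bytes) -> str:
    """Return the format name whose magic prefixes buf, or "unknown"."""
    for magic, name in MAGIC:
        if buf.startswith(magic):
            return name
    return "unknown"
```

A reader would call this after skipping NULs and then either parse the header directly ("cpio") or hand the stream to the matching decompressor.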

>
> > btw gen_init_cpio.c also pads initramfs to 512-byte boundary
> > (https://github.com/torvalds/linux/blob/6fbd6cf85a3be127454a1ad58525a3adcf8612ab/usr/gen_init_cpio.c#L97)
>
> *blink* *blink* Why...? (cpio doesn't have a 512 stride in the file format?
> It has a 4-byte stride for padding strings with NUL bytes, but that's about
> it?)
>
> > If we're viewing buffer-format.txt as the "right" cpio spec, then I think
> > we should implement this too. We should skip arbitrary extra NUL-bytes
> > padded between cpio file frames
>
> Skipping arbitrary extra null bytes at the start is easy enough to do. I
> guess the hardwired trailing read was expecting the 512 padding...
>
> I'm gonna need to add a _lot_ more test suite entries for this command.
>
> Ok, skip arbitrary leading NUL bytes after each entry, pad last record to
> 512 byte alignment with NUL bytes, autodetect compression type at each
> record start, implement hardlinks and have TRAILER!!! flush hardlink
> context...
>
>
I'm not so sure about padding the last entry to a 512-byte boundary. 512
looks like a random value to me? (Or an implementation detail of GNU cpio
and gen_init_cpio). Nonetheless I think we should pad the last record to a
4-byte boundary, so that both

  cat a.cpio.gz b.cpio.gz >c.cpio.gz

and

  zcat a.cpio.gz b.cpio.gz >c.cpio

are valid initramfs/cpio?
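
The padding itself is small arithmetic. A Python sketch, with the hypothetical helper name `pad_to`:

```python
def pad_to(data: bytes, align: int = 4) -> bytes:
    """Append NUL bytes so that len(result) is a multiple of align."""
    # -len(data) % align is the distance to the next multiple of align
    # (0 when the length is already aligned).
    return data + b"\x00" * (-len(data) % align)
```

If every archive ends on a 4-byte boundary, the header of whatever follows it in the decompressed stream stays 4-aligned; passing `align=512` would reproduce the gen_init_cpio-style padding instead.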

> Rob

-- 

Yi-yo Chiang
Software Engineer
yochiang at google.com