<div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Sat, Apr 17, 2021 at 7:32 PM Rob Landley <<a href="mailto:rob@landley.net">rob@landley.net</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On 4/17/21 4:43 AM, Yi-yo Chiang wrote:<br>

> On Sat, Apr 17, 2021 at 2:56 PM Rob Landley <<a href="mailto:rob@landley.net" target="_blank">rob@landley.net</a><br>

> <mailto:<a href="mailto:rob@landley.net" target="_blank">rob@landley.net</a>>> wrote:<br>

> <br>

>     On 4/16/21 1:44 PM, Yi-yo Chiang wrote:<br>

>     > I'm not sure what Elliot's goal is? I assume he's trying to extract a<br>

>     > concatenated ramdisk, and I still see a problem in the current solution. <br>

>     ><br>

>     > The buffer-format<br>

>     ><br>

>     (<a href="https://www.kernel.org/doc/Documentation/early-userspace/buffer-format.txt" rel="noreferrer" target="_blank" class="cremed">https://www.kernel.org/doc/Documentation/early-userspace/buffer-format.txt</a>)<br>

>     says:<br>

>     ><br>

>     >   initramfs  := ("\0" | cpio_archive | cpio_gzip_archive)*<br>

>     ><br>

>     > In other words, both `cat a.cpio b.cpio >merged.cpio` and `(cat a.cpio && echo<br>

>     > -n -e '\0\0\0' && cat b.cpio) >merged.cpio` are valid initramfs.<br>

> <br>

>     It also implies that two compressed files can be concatenated and separated by<br>

>     arbitrary runs of nulls, or you can have a compressed file and a non-compressed<br>

>     file concatenated, or...<br>

> <br>

> <br>

> Correct. Upon further inspection, it's actually "arbitrary NULLs could prepend a<br>

> GZIP(cpio_archive)",<br>

<br>

I'm not currently handling that case, and I'm not sure where is the right place<br>

to handle it? (Should gzip handle it, or should cpio call out to gzip?)<br>

<br>

And then you have to care that the _compressor_ stops gracefully at the end of<br>

its compressed data and isn't reading/discarding extra from its input...<br>

<br></blockquote><div><br></div><div>I just read further into the kernel's initramfs.c and lib/decompress_*.c, and it seems that even the kernel doesn't handle this all that well.</div><div>For example, the gzip decompressor (inflate) stops gracefully at the end of the compressed data, but the lz4 decompressor doesn't: it reports an error when there is data past the end of the compressed data.</div><div>So even though "cat a.cpio.gz b.cpio.lz4 >c.ramdisk" and "cat a.cpio.lz4 b.cpio.gz >c.ramdisk" both follow the initramfs grammar, the kernel can only boot the former. I even found a bug describing the same issue (<a href="https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1840945">https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1840945</a>), but it seems to have dropped off everyone's radar.</div><div><br></div><div>Back to the original question, I think handling concatenated uncompressed cpio is good enough. I can't comment much on concatenated cpio archives with mixed compression, as I'm unfamiliar with all those different compression algorithms, but since even the kernel doesn't fully support this configuration, I guess there aren't many use cases for it out there.</div><div><br></div><div>I think it's safe to say that for the majority of use cases, it's sufficient to pipe the output of zcat or lz4cat into "cpio -i" to unpack the initramfs. That is, the ramdisk usually has no padding between compressed chunks: it has the form COMPRESS((cpio + alignment) * N), not (COMPRESS((cpio + alignment) * N) + alignment) * M.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
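</blockquote><div>To make the "concatenated uncompressed cpio" case concrete, here is a minimal sketch (Python for brevity, not toybox code) that walks a buffer of newc records and skips any run of NUL bytes it finds between records or archives. The names list_names, make_record and align4 are made up for illustration. It assumes every record starts on a 4-byte boundary, so aligning relative to the record start is equivalent to aligning relative to the buffer, and it only understands the "070701" magic.</div><div><br></div>

```python
MAGIC = b"070701"   # newc magic; the "070702" CRC variant is not handled here
HEADER_LEN = 110    # 6-byte magic + 13 fields of 8 hex digits each


def align4(n):
    """Round n up to the next multiple of 4."""
    return (n + 3) & ~3


def make_record(name, data=b""):
    """Build one newc record (hypothetical helper, only used to build test input)."""
    raw = name.encode() + b"\0"
    # ino, mode, uid, gid, nlink, mtime, filesize,
    # devmajor, devminor, rdevmajor, rdevminor, namesize, check
    fields = [0, 0o100644, 0, 0, 1, 0, len(data), 0, 0, 0, 0, len(raw), 0]
    rec = MAGIC + b"".join(b"%08X" % f for f in fields) + raw
    rec += b"\0" * (align4(len(rec)) - len(rec))    # pad header + name
    rec += data
    rec += b"\0" * (align4(len(rec)) - len(rec))    # pad file data
    return rec


def list_names(buf):
    """Return entry names from concatenated newc archives, skipping NUL padding."""
    names = []
    pos = 0
    while pos < len(buf):
        if buf[pos] == 0:               # padding between records or archives
            pos += 1
            continue
        header = buf[pos:pos + HEADER_LEN]
        if len(header) != HEADER_LEN or header[:6] != MAGIC:
            raise ValueError("bad cpio header at offset %d" % pos)
        fields = [int(header[6 + 8 * i:14 + 8 * i], 16) for i in range(13)]
        filesize, namesize = fields[6], fields[11]
        name_start = pos + HEADER_LEN
        name = buf[name_start:name_start + namesize - 1].decode()
        if name != "TRAILER!!!":        # end-of-archive marker, not a real file
            names.append(name)
        data_start = pos + align4(HEADER_LEN + namesize)
        pos = data_start + align4(filesize)
    return names
```

<div>Given two archives a and b built with make_record, list_names(a + b) and list_names(a + b"\0" * 12 + b) return the same list, which is the behaviour I would hope for from "cpio -i" on a merged archive.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">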

> "arbitrary 4-aligned NULLS prepend a *uncompressed*<br>

> cpio_archive"<br>

<br>

This case it should be handling now.<br>

<br>

> and "cpio_file/cpio_trailer within a cpio_archive have to be<br>

> 4-aligned with arbitrary NULLs". initramfs.c seems to try very hard to respect<br>

> the alignment requirement, but I guess we could just skip *ANY* extra NULLs for<br>

> simplicity?<br>

<br>

It was already 4-aligned. That's part of the file specification. Padding with<br>

_more_ than that was throwing it off, though. Should handle it now?<br>

<br>

(Let me know what other tests I should add to tests/cpio.test.)<br>

<br>

>     Grrr. I need to test this. And possibly genericize the tar.c code to detect<br>

>     compression type and run it through a decompressor so cpio can do it too...<br>

> <br>

> <br>

> Sounds like another can of worms... :/<br>

<br>

Indeed. Haven't started that yet because tar.c is already doing it and I want to<br>

factor out common code from that, ala:<br>

<br>

        // detect gzip and bzip signatures<br>

        if (SWAP_BE16(*(short *)hdr)==0x1f8b) toys.optflags |= FLAG_z;<br>

        else if (!memcmp(hdr, "BZh", 3)) toys.optflags |= FLAG_j;<br>

        else if (peek_be(hdr, 7) == 0xfd377a585a0000UL) toys.optflags |= FLAG_J;<br>

        else error_exit("Not tar");<br>

<br>

> The buffer-format.txt seems to be a bit outdated, as Linux now supports a lot of<br>

> compression types besides gzip, and all of which are configurable<br>

> (<a href="https://elixir.bootlin.com/linux/latest/source/lib/decompress.c#L52" rel="noreferrer" target="_blank" class="cremed">https://elixir.bootlin.com/linux/latest/source/lib/decompress.c#L52</a>). So the<br>

> initramfs grammar implemented by initramfs.c is in reality:<br>

> <br>

>   initramfs  := ("\0" | cpio_archive | compressed_cpio_archive)*<br>

>   compressed_cpio_archive := CONFIG_COMPRESSION_ALGORITHM(cpio_archive)<br>

>   CONFIG_COMPRESSION_ALGORITHM := GZIP | BZIP2 | LZMA | XZ | LZO | LZ4 | ZSTD<br>

> <br>

> where the exact set of compression algorithms is decided by the kernel config. <br>

<br>

Exactly. Toybox knows about gzip, bzip, and xz. (The only compressor I currently<br>

plan to natively support is gzip, but it has decompressors for the other two.<br>

The xz one is a bit stale and still in pending and needs serious cleanup, but<br>

was sourced from public domain code.)<br>

<br>

I can add more, but it hadn't previously come up?<br>

<br>

Also, I'm really fuzzy on the difference between xz/lzma/lzo/lz4/zstd.<br>

<br>

>     > btw gen_init_cpio.c also pads initramfs to 512-byte boundary<br>

>     ><br>

>     (<a href="https://github.com/torvalds/linux/blob/6fbd6cf85a3be127454a1ad58525a3adcf8612ab/usr/gen_init_cpio.c#L97" rel="noreferrer" target="_blank" class="cremed">https://github.com/torvalds/linux/blob/6fbd6cf85a3be127454a1ad58525a3adcf8612ab/usr/gen_init_cpio.c#L97</a>)<br>

> <br>

>     *blink* *blink* Why...? (cpio doesn't have a 512 stride in the file format? It<br>

>     has a 4-byte stride for padding strings with NUL bytes, but that's about it?)<br>

> <br>

>     > If we're viewing buffer-format.txt as the "right" cpio spec, then I think we<br>

>     > should implement this too. We should skip arbitrary extra NUL-bytes padded<br>

>     > between cpio file frames<br>

> <br>

>     Skipping arbitrary extra null bytes at the start is easy enough to do. I guess<br>

>     the hardwired trailing read was expecting the 512 padding...<br>

> <br>

>     I'm gonna need to add a _lot_ more test suite entries for this command.<br>

> <br>

>     Ok, skip arbitrary leading NUL bytes after each entry, pad last record to 512<br>

>     byte alignment with NUL bytes, autodetect compression type at each record start,<br>

>     implement hardlinks and have TRAILER!!! flush hardlink context...<br>

> <br>

> <br>

> I'm not so sure about padding the last entry to 512-byte boundary. 512 looks<br>

> like a random value to me? (Or an implementation detail of GNU cpio and<br>

> gen_init_cpio). Nonetheless I think we should pad the last record to 4-byte<br>

> boundary, so that both<br>

> <br>

>   cat a.cpio.gz b.cpio.gz >c.cpio.gz<br>

<br>

It's been padding it to a 4 byte boundary all along, that's what those trailing<br>

4 NULs on TRAILER!!! were for. (The first is the null terminator for the string,<br>

the other 3 are padding for alignment: 110+10+1+3=124 which is 31*4.)<br>

<br></blockquote><div><br></div><div>Ah, you're right, this was always the case. My mistake, I misread the code. So the only problematic case was extra padding between two cpio file frames, which the latest code now handles.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

> and<br>

>   <br>

>   zcat a.cpio.gz b.cpio.gz >c.cpio<br>

> <br>

> are valid initramfs/cpio? <br>

<br>

That's the headache part: should zcat understand that sort of concatenation? The<br>

gnu/dammit cpio implementation doesn't call out to compressors, you MUST do it<br>

in a pipeline. And even if zcat understood concatenation with NUL bytes, you can<br>

glue an .xz file to a .gz. Which tool hands off to which tool and when does<br>

control come BACK... (At least gz has little signatures at the start of blocks<br>

so runs of NUL bytes can be detected as invalid. Don't remember what bz2 does<br>

off the top of my head, and never learned what xz does...)</blockquote><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

<br>

I can teach my cpio to call out to decompressors, but this is new design that<br>

needs to be thought through. Does it automatically do it, is there a new flag?<br>

Is this decompression side only and the compression side still needs its output<br>

piped?</blockquote><div><br></div><div><div>I highly doubt zcat supports initramfs-style concatenated .gz files. AFAICT, to deal with "(cat a.cpio.gz && echo -n -e '\0\0' && cat b.cpio.gz) >initramfs.img" right now, we need tools such as binwalk and dd to slice initramfs.img into its individual components, and then pipe each slice into zcat, lz4cat, or whichever *cat applies. It does sound useful for cpio to have an option or flag (like tar) that auto-detects the compression method and calls the matching decompressor.</div></div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
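</blockquote><div>For what it's worth, here is a rough sketch of the "detect the magic, decompress, hand the leftover back" loop, written in Python only because its stdlib decompressor objects expose the unconsumed input as unused_data. The magic values mirror the gzip/bzip2/xz checks in the tar.c snippet quoted earlier; split_segments and DECOMPRESSORS are hypothetical names, and lz4/lzo/zstd are left out because the Python stdlib has no decoders for them. It works on an in-memory buffer, so it sidesteps rather than solves the problem of a piped decompressor reading past the end of its stream.</div><div><br></div>

```python
import bz2
import lzma
import zlib

# (leading magic bytes, factory returning a fresh decompressor object)
DECOMPRESSORS = [
    (b"\x1f\x8b", lambda: zlib.decompressobj(31)),  # wbits=31 selects gzip framing
    (b"BZh", bz2.BZ2Decompressor),
    (b"\xfd7zXZ\x00", lzma.LZMADecompressor),
]


def split_segments(image):
    """Return the decompressed payload of each segment of an initramfs-style image.

    Runs of NUL bytes between segments are skipped. Data that does not start
    with a known magic is assumed to be uncompressed and is returned as-is as
    the final segment (a raw segment followed by a compressed one is not
    handled by this sketch).
    """
    segments = []
    pos = 0
    while pos < len(image):
        if image[pos] == 0:             # padding between segments
            pos += 1
            continue
        rest = image[pos:]
        for magic, factory in DECOMPRESSORS:
            if rest.startswith(magic):
                d = factory()
                segments.append(d.decompress(rest))
                if not d.eof:
                    raise ValueError("truncated stream at offset %d" % pos)
                # unused_data is whatever followed the end of this stream
                pos = len(image) - len(d.unused_data)
                break
        else:
            segments.append(rest)       # assume uncompressed, e.g. raw cpio
            break
    return segments
```

<div>Each returned segment could then be fed to a cpio parser. A real implementation inside cpio would still have to decide what to do about formats it has no decoder for, and about error reporting when a segment is truncated.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">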

Ok, I just checked bzcat.c and each compressed block starts with a 48 bit<br>

signature (with two valid, both nonzero values), so runs of zeroes can also be<br>

detected as "not valid block". Unfortunately that's reading in 4k blocks so<br>

you'd have to pad with a LOT of zeroes for it not to eat the start of the next<br>

chunk. (I can reduce the IOBUF_SIZE in my implementation but if it's calling an<br>

old version or some OTHER implementation when it runs "bzcat" out of the<br>

$PATH...) Concatenating uncompressed archives should be safe, and concatenating<br>

gzip chunks can presumably be _made_ safe, but with arbitrary archivers how much<br>

NULL padding you need is undefined, and so is what error states they'll exit with...<br>

<br>

Also, if we're diverging from the gnu/dammit version this far, I've had a todo<br>

item to teach my cpio to both understand and generate the kernel's<br>

gen_init_cpio.sh text file format for a while now. And ALSO it would be nice if<br>

there was a more conventional "recurse and make an archive from this list of<br>

files on the command line" the way tar and zip work; that's probably a new cpio<br>

-X option letter...<br>

<br>

These were all post-1.0 todo items until this can of worms got reopened. :)<br>

<br>

Rob<br>

</blockquote></div><br clear="all"><div><br></div>-- <br><div dir="ltr" class="gmail_signature"><div dir="ltr"><table width="90%" border="0" cellspacing="0" cellpadding="0" style="margin:0px;padding:0px;font-family:"Times New Roman";max-width:348px"><tbody style="margin:0px;padding:0px"><tr style="margin:0px;padding:0px"><td style="padding:0px"><table border="0" cellspacing="0" cellpadding="0" style="margin:0px;padding:20px 0px 0px"><tbody style="margin:0px;padding:0px"><tr style="margin:0px;padding:0px"><td valign="top" style="padding:0px 20px 0px 0px;vertical-align:top;border-right:1px solid rgb(213,213,213)"><img src="https://i.imgur.com/eGpkLls.png" width="200" height="64"><br></td><td style="padding:0px 0px 0px 20px"><table border="0" cellspacing="0" cellpadding="0" style="margin:0px;padding:0px"><tbody style="margin:0px;padding:0px"><tr style="margin:0px;padding:0px"><td colspan="2" style="font-family:Arial,Helvetica,Verdana,sans-serif;padding:1px 0px 5px;font-size:13px;line-height:13px;color:rgb(56,58,53);font-weight:700">Yi-yo Chiang</td></tr><tr style="margin:0px;padding:0px"><td colspan="2" style="font-family:Arial,Helvetica,Verdana,sans-serif;padding:0px 0px 5px;font-size:11px;line-height:13px;color:rgb(56,58,53)">Software Engineer</td></tr><tr style="margin:0px;padding:0px"><td colspan="2" style="font-family:Arial,Helvetica,Verdana,sans-serif;padding:0px 0px 5px;font-size:11px;line-height:13px;color:rgb(56,58,53)"><a href="mailto:yochiang@google.com" target="_blank">yochiang@google.com</a></td></tr><tr style="margin:0px;padding:0px"><td colspan="2" style="font-family:Arial,Helvetica,Verdana,sans-serif;padding:0px 0px 3px;font-size:11px;line-height:13px;color:rgb(3,112,248)"></td></tr></tbody></table></td></tr></tbody></table></td></tr></tbody></table></div></div></div>