[Toybox] [PATCH] cpio: support reading concatenated cpio files.

Rob Landley rob at landley.net
Sat Apr 17 02:38:18 PDT 2021


On 4/16/21 3:48 PM, enh wrote:
> On Fri, Apr 16, 2021 at 11:44 AM Yi-yo Chiang <yochiang at google.com
> <mailto:yochiang at google.com>> wrote:
> 
>     I'm not sure what Elliot's goal is? I assume he's trying to extract a
>     concatenated ramdisk, and I still see a problem in the current solution. 
> 
> 
> oops, i was going to say "i added you to the bug", but it looks like i added you
> to the *other* cpio bug (which appears to be resolving itself as "user error,
> we'll fix our script"). i've added you
> to https://issuetracker.google.com/184732694 now, but non-googlers won't be able
> to follow that link.

Yup, it's a login screen.

> all i want to do is get my rug back^W^W^W^Wkeep someone's existing workflow working.
>  
>     The buffer-format
>     (https://www.kernel.org/doc/Documentation/early-userspace/buffer-format.txt)
>     says:
> 
>       initramfs  := ("\0" | cpio_archive | cpio_gzip_archive)*
> 
>     In other words, both `cat a.cpio b.cpio >merged.cpio` and `(cat a.cpio &&
>     echo -n -e '\0\0\0' && cat b.cpio) >merged.cpio` are valid initramfs.
> 
> weird. do we know _why_ that's supported?

Because gnu/cpio is crapping trailing nul bytes at the end of archives it creates:

$ echo toys.h | cpio -o | hd | tail -n 5
2 blocks
000002c0  00 00 00 00 00 00 00 00  01 00 00 00 00 00 00 00  |................|
000002d0  0b 00 00 00 00 00 54 52  41 49 4c 45 52 21 21 21  |......TRAILER!!!|
000002e0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000400

So if you concatenate them, you get runs of nul bytes.

>     btw gen_init_cpio.c also pads initramfs to 512-byte boundary
>     (https://github.com/torvalds/linux/blob/6fbd6cf85a3be127454a1ad58525a3adcf8612ab/usr/gen_init_cpio.c#L97)
> 
>     If we're viewing buffer-format.txt as the "right" cpio spec, then I think we
>     should implement this too. We should skip arbitrary extra NUL-bytes padded
>     between cpio file frames
>  
> not that it really matters (because i think we're all agreed that "do what the
> kernel does" makes more sense than "do what GNU/BSD think") but does GNU cpio
> cope with that? (see "strace" below for why i wonder.)
>  
> 
>     On Fri, Apr 16, 2021 at 2:27 PM Rob Landley <rob at landley.net
>     <mailto:rob at landley.net>> wrote:
> 
>         On 4/16/21 12:18 AM, Rob Landley wrote:
>         > On 4/15/21 6:57 PM, Rob Landley wrote:
>         >> On 4/15/21 12:06 PM, enh wrote:
>         >>> do you already have the "do the right thing" patch ready, or should
>         i send that
>         >>> today?
>         >>
>         >> I'm good, I'll try to get it in tonight. The hard part is the test
>         suite, not
>         >> the implementation.
>         >
>         > Correction: the hard part is I don't seem to have actually implemented
>         hardlink
>         > support yet. It just records/extracts them as separate files.
>         >
>         > Right. I need to test this against the kernel initramfs plumbing, in both
>         > directions...
> 
>              if (!strcmp("TRAILER!!!", name)) {
>         +      readall(afd, toybuf, 348);
> 
>         Where did you get 348 from? (What is this change doing?)
> 
>         I did what seems like a minimal change to continue past TRAILER!!! but
>         error on
>         an empty archive, but I apparently don't understand what your change
>         did? (I'm
>         assuming it fixed the use case you were seeing...?)
> 
> yeah, from strace i think GNU cpio avoids this misalignment by always reading
> 512-byte blocks (rather than toybox's "let me just read the header and then
> worry about anything else later").

Because you can't unget into a nonseekable filehandle.

> all i was trying to do was "get back in sync after a TRAILER!!! entry", and 348
> (by measurement) got me back to 512 after what toybox cpio had read, for the
> examples i had. (i had expected "is this guaranteed to be right for all possible
> valid cpio files?" to be the discussion we'd be having rather than the
> "shouldn't we just do what the kernel does?" :-) )
> 
> (if i haven't said so already, assume i know nothing about cpio and have never
> knowingly used it myself.)
> 
> interestingly, the current patch seems to break the AOSP build:

For some reason the github build test failed, although it ran fine here? And I
couldn't view the actual logs on my phone because Microsoft Github gratuitously
wanted me to log in for no obvious reason. (I never give my phone unnecessary
credentials.)

> 2021-04-16 18:56:56 - common.py - WARNING : Unable to get boot image build
> props: Failed to run command '['toybox', 'cpio', '-F',
> '/mnt/disks/build-disk/src/googleplex-android/sc-dev/out/soong/.temp/boot_xwvxvq26.img/uncompressed_ramdisk',
> '-i']' (exit code 1):
> cpio: bad header

Hmmm... the 4 trailing NUL bytes on the "trailer" entry _should_ just put it
back into alignment (strings are 4 byte aligned)? The old 6-digit magic field
puts the header 2 bytes short, and then the 10 byte "TRAILER!?!" entry WOULD
line up on 4 bytes except it should have a NUL terminator which puts it one
over, thus needing 3 bytes of padding, so 4 trailing NUL bytes?

> i'll chase up the person using `toybox cpio` and move them over to the hermetic
> prebuilt build toybox later. for now i'll try to reproduce this locally and see
> what the trouble is...

Is this a _new_ failure, or is this a failure of the concatenated files it
didn't handle before? (I.E. did my patch just not fix your issue since I don't
have access to that test case to confirm I'd fixed it, or is it now more broken
than it was?)

Rob


More information about the Toybox mailing list