[Toybox] [PATCH] cpio: support reading concatenated cpio files.

enh enh at google.com
Wed Apr 14 11:26:59 PDT 2021


On Wed, Apr 14, 2021 at 12:11 AM Rob Landley <rob at landley.net> wrote:

> On 4/13/21 6:47 PM, enh via Toybox wrote:
> > Apparently this is a thing people like to do, and they use shell while
> > loops to do it (because cpio only reads one input file at a time from
> > stdin).
> >
> > To make this work, we need to (a) exit with a failure status rather than
> > success if we hit EOF (we should never normally hit EOF because we
> > should read a TRAILER!!! first), and (b) skip to the end of the
> > TRAILER!!! record rather than just exiting immediately (so that the
> > _next_ cpio to run can start reading stdin at the start of a record,
> > rather than partway through the TRAILER!!! record that caused this cpio
> > to stop).
>
> Sigh, this came up in a discussion with the people trying to extend cpio
> format to add xattrs to initramfs a year or so back. Most recently probably
> circa:
>
> http://lkml.iu.edu/hypermail/linux/kernel/1905.2/01956.html
>
> What the TRAILER!!! entry _actually_ means is "flush hardlink context".
> Search for "handling of hardlinks" in:
>
> https://www.kernel.org/doc/Documentation/early-userspace/buffer-format.txt
>
> (I have a todo entry for this, but haven't cycled back around to it...)
>
> > (The error message change in x8u is for the usual "it's harder to debug
> > if two different failure cases in the code output the exact same error
> > message", in this case "bad header".)
>
> Could you read the linux doc thing and confirm that the behavior you want
> is still to stop at TRAILER instead of flushing hardlink context but
> otherwise continuing to extract like the kernel guys documented for
> initramfs? (Or am I misremembering? It's been a while...)
>

in the thread you linked to, they say "I wonder how existing GNU or BSD
cpio ... would deal with reading such a file". all i'm saying is "GNU cpio
exits on the next record boundary, and people have scripts that rely on
this".

the Linux docs say things like

  The cpio "TRAILER!!!" entry (cpio end-of-archive) is optional, but is
  not ignored; see "handling of hard links" below.

but that doesn't match what actual implementations of cpio do. (assuming
you don't interpret optional as meaning "you don't have to have one, but if
you don't, the tool will exit with an error complaining that you don't have
one" :-) )

i think the most interesting thing for me in the docs was:

  When a "TRAILER!!!" end-of-archive marker is seen, the tuple buffer is
  reset.  This permits archives which are generated independently to be
  concatenated.

because -- even if i haven't really understood _why_ people are
concatenating cpio files -- at least this shows that the main
consumers/producers agree that this is an expected use case.

i'm assuming the "exit when you see TRAILER!!! and let the next cpio
instance worry about the rest" behavior is just the least-effort
implementation of the hard-link flush stuff:

  To combine file data from different sources (without having to
  regenerate the (c_maj,c_min,c_ino) fields), therefore, either one of
  the following techniques can be used:

  a) Separate the different file data sources with a "TRAILER!!!"
     end-of-archive marker, or

exiting when you see TRAILER!!! implicitly loses any cpio state, and
reporting an error if you hit EOF without seeing TRAILER!!! lets you know
when to stop running a new cpio?
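
to make the "exit at TRAILER!!!, fail on EOF" contract concrete, here is a
minimal Python sketch (not the toybox patch itself). it assumes the "newc"
header layout (6-byte magic "070701" followed by 13 fields of 8 hex digits)
and archives concatenated with no block padding between them. every function
name below is hypothetical, and make_record/make_archive exist only to build
demo input. read_one_archive plays the role of a single cpio run, and
read_concatenated plays the role of the shell while loop.

```python
import io

MAGIC = b"070701"   # "newc" magic (assumed format for this sketch)
HEADER_LEN = 110    # 6 magic bytes + 13 fields of 8 hex digits


def pad4(n):
    """Bytes needed to round n up to a multiple of 4."""
    return -n % 4


def make_record(name, data=b""):
    """Build one newc record (hypothetical helper, demo input only)."""
    raw = name.encode() + b"\0"
    # ino, mode, uid, gid, nlink, mtime, filesize,
    # devmajor, devminor, rdevmajor, rdevminor, namesize, check
    fields = [0, 0o100644, 0, 0, 1, 0, len(data), 0, 0, 0, 0, len(raw), 0]
    out = MAGIC + b"".join(b"%08X" % f for f in fields) + raw
    out += b"\0" * pad4(len(out))            # header+name padded to 4 bytes
    return out + data + b"\0" * pad4(len(data))


def make_archive(names):
    """Hypothetical helper: members followed by a TRAILER!!! record."""
    body = b"".join(make_record(n, n.encode()) for n in names)
    return body + make_record("TRAILER!!!")


def read_exact(stream, n):
    buf = stream.read(n)
    if len(buf) != n:
        # (a) EOF before TRAILER!!! is a failure, not a quiet success
        raise EOFError("hit EOF before TRAILER!!!")
    return buf


def read_one_archive(stream):
    """Return member names up to TRAILER!!!, leaving the stream positioned
    at the start of the next record."""
    names = []
    while True:
        head = read_exact(stream, HEADER_LEN)
        if head[:6] != MAGIC:
            raise ValueError("bad magic")
        fields = [int(head[6 + 8 * i:14 + 8 * i], 16) for i in range(13)]
        filesize, namesize = fields[6], fields[11]
        raw = read_exact(stream, namesize + pad4(HEADER_LEN + namesize))
        name = raw[:namesize - 1].decode()   # namesize counts the NUL
        read_exact(stream, filesize + pad4(filesize))
        if name == "TRAILER!!!":
            # (b) the whole trailer record has been consumed at this point
            return names
        names.append(name)


def read_concatenated(stream):
    """Stand-in for the shell loop: rerun the reader until it fails."""
    archives = []
    while True:
        try:
            archives.append(read_one_archive(stream))
        except EOFError:
            return archives


blob = make_archive(["a", "bb"]) + make_archive(["c"])
print(read_concatenated(io.BytesIO(blob)))   # [['a', 'bb'], ['c']]
```

note that the loop only terminates because empty input counts as a failure,
and that (like the shell version) it cannot tell a clean end of input from a
truncated final archive.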

(i think the doc is trying to distinguish between a cpio file [where
TRAILER!!! marks the end] and an "initramfs buffer" which can contain
multiple concatenated cpio files [and hence more than one TRAILER!!!]. so
things processing initramfs buffers need to be cleverer than cpio when it
comes to TRAILER!!!, but cpio doesn't. [and in practice, isn't.])
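
for contrast, the "cleverer" initramfs-buffer behavior from the kernel doc
(reset the tuple buffer at each TRAILER!!! and keep going) might look
roughly like the Python sketch below. it is a deliberate simplification:
records are already-parsed (name, (maj, min, ino), nlink) tuples rather than
bytes, resolve_links is a hypothetical name, and the kernel's real hard-link
handling has more cases than this.

```python
def resolve_links(records):
    """Classify entries as new files or hard links, resetting the
    (maj, min, ino) table whenever a TRAILER!!! entry is seen.

    records: iterable of (name, (maj, min, ino), nlink) in archive order.
    """
    seen = {}      # (maj, min, ino) -> first name seen with that tuple
    result = []
    for name, key, nlink in records:
        if name == "TRAILER!!!":
            seen.clear()          # flush hard-link context, keep reading
            continue
        if nlink > 1 and key in seen:
            result.append((name, "link to " + seen[key]))
        else:
            if nlink > 1:
                seen[key] = name
            result.append((name, "new file"))
    return result


# two independently generated archives that happen to reuse inode 5
demo = [
    ("a", (0, 0, 5), 2),
    ("b", (0, 0, 5), 2),
    ("TRAILER!!!", (0, 0, 0), 1),
    ("c", (0, 0, 5), 2),
    ("d", (0, 0, 5), 2),
]
for entry in resolve_links(demo):
    print(entry)
```

without the reset at TRAILER!!!, "c" would be treated as a link to "a" even
though it came from a different archive, which is the collision the
separator is there to prevent.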

i think that answers your question, but perhaps in excessive detail, so
i'll re-quote you and try again:

> confirm that the behavior you want is
> still to stop at TRAILER instead of flushing hardlink context but otherwise
> continuing to extract

i agree that based on the Linux docs it would be more sensible to flush but
continue, but that's demonstrably not what GNU cpio does, so it doesn't
seem particularly helpful for us to do it. callers already have to have the
bash while loop nonsense, and implementing the better behavior in toybox
would still be "broken" from that perspective because they'd loop forever
--- toybox would at least have to consider the empty input as an error, at
which point we haven't really reduced the ugliness much? (i'm also scared
to suggest anything beyond "do what GNU does" because i don't personally
know anything about cpio, and have never used it except to generate minimal
repro cases for stuff that kernel folks bring up. i haven't looked at BSD,
but they seem to interpret TRAILER!!! as end of archive too:
https://www.freebsd.org/cgi/man.cgi?query=cpio&sektion=5 ... and eighthly,
carrying on past TRAILER!!! when no-one else does sounds like one of those
security issues Android had back in the "zip master key" days; even if the
format is stupid, it's safer when everyone interprets the format the same
way... who knows what crap people are accidentally/deliberately ignoring
past a TRAILER!!! that isn't actually at the end [because they _don't_ have
the bash while loop]? i'd prefer not to find out :-) )

hmm. my second attempt seems to have more words than my first. i'll stop
here.

(i noticed as well that everyone seems to actually deal in _compressed_
cpio files, so in an ideal world i suspect cpio should be as intelligent as
tar when it comes to such things --- but i think cpio'ing is too niche to
warrant doing anything better than GNU.)

> Rob