[Toybox] [cpio] Various questions (mtime, inode number)

Rob Landley rob at landley.net
Wed May 12 07:40:05 PDT 2021


On 4/22/21 8:06 AM, Yi-yo Chiang via Toybox wrote:
> Was playing with the new cpio command and spotted a few oddities. Some of which
> I'm not sure are bugs or WAI?

This got caught in gmail's spam filter, just fished it out. Is it still relevant?

> 1. cpio -i might not preserve mtime, because later entries might modify previous
> entries' mtime.
> 
> $ mkdir a && touch a/b
> $ touch -d @0 a a/b
> $ ls -al a
> total 8
> drwxr-xr-x 2 yochiang primarygroup 4096 Jan  1  1970 .
> drwxr-xr-x 4 yochiang primarygroup 4096 Apr 22 20:35 ..
> -rw-r--r-- 1 yochiang primarygroup    0 Jan  1  1970 b

If they're modifying it, then they're changing the mtime, yes. Tar saves
directory time modifications and applies them later to mitigate this:

https://github.com/landley/toybox/blob/0.8.4/toys/posix/tar.c#L417

> $ # both a/ and a/b have timestamp at epoch+0
> $ find a | toybox cpio -H newc -o >a.cpio
> $ mkdir stage && cd stage
> $ toybox cpio -i <../a.cpio
> $ ls -al a
> total 8
> drwxr-xr-x 2 yochiang primarygroup 4096 Apr 22 20:37 .
> drwxr-xr-x 3 yochiang primarygroup 4096 Apr 22 20:37 ..
> -rw-r--r-- 1 yochiang primarygroup    0 Jan  1  1970 b
> 
> The timestamp of a/b is correct, but a/ isn't. This is because a/ 's timestamp
> was updated when we create a/b.

Exactly.

> Not sure if this is a design choice to simplify code?

What does the kernel extractor do?

> Fixing this could mean we need a "fix-up" phase after all entries are extracted
> and fix up all the extracted file's st_mtime, which means we would memorize the
> list of all files we extract, which doesn't sound like a good idea in terms of
> memory consumption?

Just directories, and you can make simplifying assumptions about all the files
in a directory coming right after that directory so you have a single stack
you're going down and then you pop your way back up.

I did this for tar. I didn't bother for cpio because nobody'd asked.

> 2. Archives created by cpio command are non-deterministic due to unstable inode
> numbers.
> 
> $ # using the same a.cpio from previous example
> $ toybox cpio -idu <../a.cpio
> $ find a | toybox cpio -H newc -o | sha1sum
> d17aa2355dc17239b90cae724d74d6a56bef67c3  -
> $ rm -rf ./*
> $ toybox cpio -idu <../a.cpio
> $ find a | toybox cpio -H newc -o | sha1sum
> bf1428382bdb9240fedb38c46746a30d25ae4daa  -
> 
> Even though the source files are exactly the same, the produced archives have
> different contents. Upon close inspection the diff happens in the st_ino and
> st_mtime field.
> 
> How about we add an option, say "-s" for "stable" or "-P" for "Portability",
> that changes the output to have deterministic output by renumbering st_ino,
> st_mtime, st_dev and such?

Easy enough to do, but I haven't even implemented hardlink support yet. (This
stuff still isn't my day job, and I generally spin off todo items faster than I
get to them. I still haven't addressed the test suite pathing issue from
http://landley.net/notes-2021.html#30-04-2021 for example...)
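
A minimal sketch of what the inode renumbering could look like, assuming the
goal is to number each (st_dev, st_ino) pair in order of first appearance so
that files sharing an inode still share a number. No such option exists yet, and
struct inotable and stable_ino are hypothetical names; st_mtime and st_dev would
simply be written as fixed values:

```c
#include <stdlib.h>
#include <sys/types.h>

// Hypothetical table of (device, inode) pairs seen so far.
struct inotable {
  struct { dev_t dev; ino_t ino; } *seen;
  size_t count;
};

// Return a stable 1-based number for this (dev, ino) pair, assigned in
// order of first appearance. Returns 0 if memory runs out.
unsigned long stable_ino(struct inotable *t, dev_t dev, ino_t ino)
{
  size_t i;
  void *grown;

  // Linear search: fine for a sketch, quadratic on big archives.
  for (i = 0; i < t->count; i++)
    if (t->seen[i].dev == dev && t->seen[i].ino == ino) return i + 1;

  grown = realloc(t->seen, (t->count + 1) * sizeof(*t->seen));
  if (!grown) return 0;
  t->seen = grown;
  t->seen[t->count].dev = dev;
  t->seen[t->count].ino = ino;

  return ++t->count;
}
```

Since the numbers depend only on the order in which names arrive, the same
input list would produce the same archive bytes regardless of what the
filesystem handed out.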

I can add it to the todo heap, but I'm currently distracted elsewhere. I recently
noticed that bash job control keeps a persistent exited PID result list forever
(it's cleared by a call to "wait" with no arguments, but not by anything ELSE
I've noticed yet):

  $ exit 37 &
  [1] 16876
  $ for i in $(seq 1 100); do exit $i & done
  ...
  $ wait 16876 ; echo $?
  37
  $ wait 16900; echo $?
  bash: wait: pid 16900 is not a child of this shell
  127
  $ wait 16876 ; echo $?
  37

AND that "set -b" (notify of job termination immediately) exists, which means
the job control plumbing has to be SIGCHLD based and thus the tables being
updated have to be accessed in a signal safe manner DESPITE being dynamically
resizable, which means the job control plumbing I've implemented so far has to
be redesigned. (I actually noticed this yesterday but was busy with $DAYJOB
stuff and just got back to it, and haven't finished the redesign yet. I think I
need two different (volatile *) to make this work...)

Rob


