[Toybox] tar --null

Mon Jul 18 09:09:19 PDT 2022

On 7/15/22 21:19, enh wrote:
> On Fri, Jul 15, 2022 at 9:34 AM Rob Landley <rob at landley.net> wrote:
> 
>     On 7/14/22 18:53, enh wrote:
>     > On Wed, Jul 13, 2022 at 11:58 PM Rob Landley <rob at landley.net> wrote:
>     >
>     >     On 7/12/22 19:13, enh via Toybox wrote:
>     >     > so.. --transform works (though it confused people that it's not in the --help
...
>     Yeah but August 6 is 3 months from the previous release and I'd like to do that
>     on a more regular schedule (modulo maybe slipping a bit to sync up with kernel
>     releases for mkroot), meaning I want to finish this properly soonish. :)
> 
>     I have a half dozen open cans of worms right now... dd, sh, mkroot walkthrough,
>     diff, tar --transform, a redo of lib/passwd.c and everything depending on it,
>     and in file.c:
> 
>     + * TODO: XZ, JPEG size, dpkg.deb, rpm, mp3, odt, mp4, iso
>     + * MBR boot sector (partition X: startsector %d, %d sectors;)
>     + * word (.docx: Word 2007+), excel
> 
> you shouldn't do those yourself; you should make each of those a separate bug
> on github with a "help wanted" or "starter project" label, and then next time
> you have someone asking "hey, is there something i can look at?", you have stuff
> ready and waiting...
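To give a sense of scale for one of those file.c TODO items, here is a rough sketch of the MBR case in C. It assumes only the standard MBR layout (four 16-byte partition entries at offset 446, type byte at entry offset 4, little-endian start sector and sector count at offsets 8 and 12, 0x55 0xAA signature at offset 510). describe_mbr() and le32() are hypothetical helpers, not toybox's file.c, and the output format is only approximated from the TODO comment. FAT volume boot records carry the same 0x55 0xAA signature, so real detection would need more checks than this.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

// Read a little-endian 32-bit field (MBR fields are little-endian).
static uint32_t le32(const unsigned char *p)
{
  return p[0] | (p[1] << 8) | (p[2] << 16) | ((uint32_t)p[3] << 24);
}

// Hypothetical helper, not toybox's file.c: describe a 512-byte sector as an
// MBR if it ends in the 0x55 0xAA boot signature. Writes a description into
// out (len must be > 0) and returns 1, or returns 0 if there's no signature.
static int describe_mbr(const unsigned char *sec, char *out, size_t len)
{
  int i;

  if (sec[510] != 0x55 || sec[511] != 0xAA) return 0;
  snprintf(out, len, "DOS/MBR boot sector");

  // The partition table is four 16-byte entries starting at offset 446.
  for (i = 0; i < 4; i++) {
    const unsigned char *ent = sec + 446 + 16*i;
    size_t used = strlen(out);  // always < len, snprintf NUL-terminates

    if (!ent[4]) continue;  // partition type 0 marks an unused slot
    snprintf(out + used, len - used,
      "; partition %d: startsector %lu, %lu sectors", i + 1,
      (unsigned long)le32(ent + 8), (unsigned long)le32(ent + 12));
  }

  return 1;
}
```

Fed a sector whose only entry is a Linux partition starting at sector 2048 and 4096 sectors long, this produces "DOS/MBR boot sector; partition 1: startsector 2048, 4096 sectors".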

Good suggestion, but I'm never sure what actually _is_ easy. I shelved this
after doing about half of mp3 identification, which turns out to be a
surprisingly large rathole due to funky container formats. (And I don't trust
anything Microsoft's ever touched not to be Turing-complete to solve...)

> (not that you can 100% trust me not to do some of those when i've had a week
> when i didn't get to write even a line of code and i'm looking for something to
> do. but i'm trying to _stop_ doing all the easy little pieces myself at work for
> similar reasons!)
>  
>     Trying to close tabs for a release. :)

And of course I symmetrically added -a to nsenter and unshare before noticing
that Debian only has -a in nsenter and not unshare. I also don't know why
nsenter has -S and -G but unshare doesn't? It seems like "create new container"
and "insert process into existing container" are almost the same problem space...?

>     Stream forward until you hit a diff, and then accumulate lines from each file
>     one at a time scanning BACKWARDS in the other file to find matching lines (where
>     does new last line of file 2 match in the list-since-difference of file1), and
>     when you find 2*U lines of match (twice the -U context) you've ended the hunk. Flush what you've seen
>     (keeping the usual three lines of starting context) and move forward again as
>     matched. This usually leaves unconsumed lines in the other file (sometimes ALL
>     of what we've loaded from one file is unconsumed, that happens when you add or
>     remove a single line in isolation for example) but you just need to feed those
>     back in as "new" lines to the search algorithm...
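The resync step described above can be sketched in C roughly as follows. This is a hypothetical illustration, not toybox's diff: find_resync() and run_back() are made-up names, and it works on in-memory arrays starting at the first difference rather than streaming from the files. need stands for 2*U. Lines are loaded alternately, and each new line is scanned backward against the other file's loaded lines, which is the N^2 part.

```c
#include <string.h>

// Count how many lines agree walking backward from a[i] and b[j].
static int run_back(char **a, int i, char **b, int j)
{
  int n = 0;

  while (i - n >= 0 && j - n >= 0 && !strcmp(a[i - n], b[j - n])) n++;

  return n;
}

// Hypothetical sketch, not toybox's diff: a[] and b[] hold the lines of each
// file starting at the first difference. Each newly loaded line is compared
// against every already-loaded line of the other file, newest first. Once
// need (2*U) consecutive lines agree, the hunk is over: *ea and *eb get the
// number of changed lines on each side, and forward scanning resumes at
// a[*ea] / b[*eb]. Returns 0 if no resync point exists in the given lines.
static int find_resync(char **a, int na, char **b, int nb, int need,
                       int *ea, int *eb)
{
  int k, i;

  for (k = 0; k < na || k < nb; k++) {
    // New line a[k] against b[0..k-1], scanning backward.
    if (k < na)
      for (i = (k - 1 < nb - 1 ? k - 1 : nb - 1); i >= 0; i--)
        if (run_back(a, k, b, i) >= need) {
          *ea = k - need + 1;
          *eb = i - need + 1;
          return 1;
        }

    // New line b[k] against a[0..k], scanning backward.
    if (k < nb)
      for (i = (k < na - 1 ? k : na - 1); i >= 0; i--)
        if (run_back(a, i, b, k) >= need) {
          *ea = i - need + 1;
          *eb = k - need + 1;
          return 1;
        }
  }

  return 0;
}
```

Deleting one line in isolation shows the "all of one side is unconsumed" case: with a = {X, c1, c2, c3}, b = {c1, c2, c3} and need = 2, the result is *ea = 1, *eb = 0, meaning every loaded line of b goes back into the forward scan.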
> 
>     Yeah it's an N^2 search algorithm but what's the biggest hunk you've ever seen,
>     200 lines? 1000? Modern hardware doing N^2 search over 1000 lines isn't going to
>     break stride. The INPUT FILE size doesn't matter, except as a theoretical bound
>     on the upper size of the hunk if you diff two completely unrelated files, but
>     optimizing for that case seems silly?
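Rough arithmetic for the 1000-line case, assuming N lines loaded from each side and each newly loaded line compared against at most N lines of the other file (each comparison may walk back up to 2*U more lines, a small constant factor):

```latex
\text{comparisons} \le 2N \cdot N = 2N^{2}, \qquad N = 1000 \;\Rightarrow\; 2N^{2} = 2 \times 10^{6}
```

A couple of million short string compares is a trivial amount of work for current CPUs.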
> 
> aye, though (like you) i assume that's the kind of pathological case they
> were thinking of.

A) so where's the test case?

B) McIlroy's paper was published in 1976, which is theoretically 30 iterations
of Moore's Law ago, implying we can literally handle a billion times as much
corner case processing as they could.
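The "billion times" figure holds if you assume the often-quoted 18-month doubling period. With the two-year period Moore himself settled on in 1975, it comes out closer to eight million. Either way, it's a lot of headroom:

```latex
\begin{aligned}
\frac{2022 - 1976}{1.5\ \text{yr}} &\approx 30.7 \;\Rightarrow\; 2^{30} = 1\,073\,741\,824 \approx 10^{9} \\
\frac{2022 - 1976}{2\ \text{yr}} &= 23 \;\Rightarrow\; 2^{23} = 8\,388\,608 \approx 10^{7}
\end{aligned}
```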

> (because although it never happens "for real", it happens
> interactively, and that's probably when people are most sensitive to speed.) i
> don't remember seeing a single hunk more than tens of lines (except the other
> pathological case of "new file"). 

If people can send me a test case, I'm happy to fix it?

In theory the improved search the paper described is just a subset of the N^2
search that abandons attempts faster to find a non-optimal solution quickly.
They're just doing it over the whole file instead of a current potential hunk...

>     >     > but in the meantime
>     >     > the kernel build script now uses --null with -T:
>     >     > https://cs.android.com/android/kernel/superproject/+/common-android-mainline:build/kernel/build.sh;l=964
>     >     > <https://cs.android.com/android/kernel/superproject/+/common-android-mainline:build/kernel/build.sh;l=964>

I have no idea why your email system does this.
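For reference on what's being asked for: in GNU tar, --null makes -T read a list of names terminated by NUL bytes instead of newlines, so names containing newlines survive (such lists typically come from something like find -print0). Below is a minimal C sketch of the reading side using POSIX getdelim(). next_name() is a hypothetical helper, not toybox's tar code.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

// Hypothetical helper: read the next name from a -T file list. delim is '\n'
// by default or '\0' for --null. Returns a malloc()ed name with the delimiter
// stripped, or NULL at end of file.
static char *next_name(FILE *fp, int delim)
{
  char *name = NULL;
  size_t size = 0;
  ssize_t len = getdelim(&name, &size, delim, fp);

  if (len < 0) {
    free(name);
    return NULL;
  }
  // The last name in the file may lack a trailing delimiter.
  if (len && name[len - 1] == delim) name[len - 1] = 0;

  return name;
}
```

The only difference between the two modes is the delimiter passed to getdelim(). A name with an embedded newline is exactly the case that newline-delimited -T can't express.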

>     But now that I've gone "well here's the 80/20 solution to handling mode shifts",
>     I'm tempted to code that up instead. Lemme see if I get to it this weekend, if
>     not I owe you this applied before Monday.
> 
> sgtm. i've been trying to stop committing things on fridays, so monday's the
> earliest i'd be giving the kernel folks a new prebuilt anyway :-)

Didn't get it done over the weekend. Reeducating myself on args plumbing corner
cases instead...

Rob