[Toybox] tar --null

Rob Landley rob at landley.net
Tue Jul 19 06:41:44 PDT 2022


On 7/18/22 18:55, enh wrote:
> On Mon, Jul 18, 2022 at 9:02 AM Rob Landley <rob at landley.net
>     >     and in file.c:
>     >
>     >     + * TODO: XZ, JPEG size, dpkg.deb, rpm, mp3, odt, mp4, iso
>     >     + * MBR boot sector (partition X: startsector %d, %d sectors;)
>     >     + * word (.docx: Word 2007+), excel
>     >
>     > you shouldn't do those yourself --- you should make each of those a separate bug
>     > on github with a "help wanted" or "starter project" label, and then next time
>     > you have someone asking "hey, is there something i can look at?", you have stuff
>     > ready and waiting...
> 
>     Good suggestion, but I'm never sure what actually _is_ easy. I shelved this
>     after doing about half of mp3 identification, which turns out to be a
>     surprisingly large rathole due to funky container formats. (And I don't trust
>     anything microsoft's ever touched not to be turing-complete to solve...)
>
> heh, i know exactly what you mean because (a) i have this problem all the time
> at work, where people don't finish their "starter project" for years and (b)
> your specific jpeg size example was one _i_ punted when i originally submitted
> jpeg support because it turned out to be non-obvious.

The problem with "leaving easy stuff for other people to do" is they don't do
it. I submitted a series of updated patches to make the kernel's
CONFIG_DEVTMPFS_MOUNT work for initramfs and not just the fallback root= mount,
and nobody else ever picked that up and put it in.

https://lkml.org/lkml/2017/9/13/651

That seems quite easy, no? Here it is again 3 years later...

https://lkml.iu.edu/hypermail/linux/kernel/2005.1/09399.html

There's a type of salesmanship in getting Huck Finn's friends to paint his fence
which is a completely different skillset from doing the work yourself. It's a
quite useful skill I do not have.

> i still think this is the "least worst" option though, and that's actually one
> reason why i suggested a separate bug for each: it lets people thrash about a
> bit until they find one that _is_ easy (for them).

People have been trying to get me to do more with bug trackers for a larger
number of years than I like to think about.

There's a whole lot of years of unmedicated ADHD tangled up in there. I use them
when somebody ELSE manages them and regularly reviews what's been sitting there
composting. My self-managed workflow is to make todo lists that work like new
year's resolutions, then chase the shiny thing on a tangent from a tangent from
a tangent until it's time to panic about externally imposed deadlines and Close
All The Tabs.

>     > (not that you can 100% trust me not to do some of those when i've had a week
>     > when i didn't get to write even a line of code and i'm looking for something to
>     > do. but i'm trying to _stop_ doing all the easy little pieces myself at work for
>     > similar reasons!)
>     >  
>     >     Trying to close tabs for a release. :)
> 
>     And of course I symmetrically added -a to nsenter and unshare before noticing
>     that debian only has -a in nsenter and not unshare. I also don't know why
>     nsenter has -S and -G but unshare doesn't? It seems like "create new container"
>     and "insert process into existing container" are almost the same problem
>     space...?
> 
> (that seems reasonable to me unless proven otherwise.)

Yeah, but I should sync up with Denys periodically about whether busybox wants
any of the new stuff. It's on the todo list...
 
>     >     Yeah it's an N^2 search algorithm but what's the biggest hunk you've ever seen,
>     >     200 lines? 1000? Modern hardware doing N^2 search over 1000 lines isn't going to
>     >     break stride. The INPUT FILE size doesn't matter, except as a theoretical bound
>     >     on the upper size of the hunk if you diff two completely unrelated files, but
>     >     optimizing for that case seems silly?
>     >
>     > aye, though -- like you -- i assume that's the kind of pathological case they
>     > were thinking of.
> 
>     A) so where's the test case?
> 
> tests?! did you ever look at any of the bell labs boys' stuff? :-)

Yes, quite a lot actually. (Computer history hobby!) There's a lot of
survivorship bias in there with what got published and retained 50 years later.
(All those old 1960s cars lasted so much longer than the stuff we have today. I
know because every time I see a surviving 1960s car it's lasted until now.)

But I'm also wondering about where the line's moved between a shared PDP-11's
definition of "computationally hard" and modern hardware.

And ALSO:

$ seq 1 100000 > one; seq 1 4 100000 > two; time diff -u one two > /dev/null
real	0m0.051s
user	0m0.039s
sys	0m0.012s
$ seq 1 100000 > one; seq 1 4 100000 > two; time toybox diff -u one two > /dev/null
real	0m0.320s
user	0m0.148s
sys	0m0.172s

toys/pending/diff.c runs at 1/6 the speed of debian's. I'm not sure whatever
optimization it THINKS it's doing is buying us anything?
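
For a sense of what a plain quadratic comparison costs, here is a minimal sketch of the textbook longest-common-subsequence dynamic program over two arrays of lines. It is an assumption-laden baseline, not the algorithm in toys/pending/diff.c or in Debian's diff, and the name lcs_len() is hypothetical. The number of lines a unified diff would mark as deleted is n minus the returned length, and the number added is m minus it.

```c
#include <stdlib.h>
#include <string.h>

// Length of the longest common subsequence of line arrays a[0..n) and
// b[0..m). Textbook O(n*m) time dynamic program using two rolling rows,
// so O(m) extra space. Returns 0 if allocation fails.
static size_t lcs_len(char **a, size_t n, char **b, size_t m)
{
  size_t *prev = calloc(m+1, sizeof(size_t));
  size_t *cur = calloc(m+1, sizeof(size_t));
  size_t *tmp, i, j, result;

  if (!prev || !cur) {
    free(prev);
    free(cur);
    return 0;
  }

  for (i = 1; i <= n; i++) {
    for (j = 1; j <= m; j++) {
      // Matching lines extend the diagonal, otherwise keep the better
      // of "skip a line of a" and "skip a line of b".
      if (!strcmp(a[i-1], b[j-1])) cur[j] = prev[j-1]+1;
      else cur[j] = prev[j] > cur[j-1] ? prev[j] : cur[j-1];
    }
    // Column 0 is never written, so it stays 0 in both rows.
    tmp = prev;
    prev = cur;
    cur = tmp;
  }

  result = prev[m];
  free(prev);
  free(cur);

  return result;
}
```

With 100000 lines against 25000, as in the seq test above, the inner loop body runs 2.5 billion times, which is the sort of input where the choice of algorithm starts to show.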

>      <https://cs.android.com/android/kernel/superproject/+/common-android-mainline:build/kernel/build.sh;l=964
>     <https://cs.android.com/android/kernel/superproject/+/common-android-mainline:build/kernel/build.sh;l=964>>>>
> 
>     I have no idea why your email system does this.
> 
> and sadly, it's not clever enough for me to say "plain text for mailing lists,
> html for everything else". (or even just "plain text for any thread i _start_,
> but respect whatever's already in use for any thread i _reply_ to".)

Still beats what gmail's doing...
 
>     >     But now that I've gone "well here's the 80/20 solution to handling mode shifts",
>     >     I'm tempted to code that up instead. Lemme see if I get to it this weekend, if
>     >     not I owe you this applied before monday.
>     >
>     > sgtm. i've been trying to stop committing things on fridays, so monday's the
>     > earliest i'd be giving the kernel folks a new prebuilt anyway :-)
> 
>     Didn't get it done over the weekend. Reeducating myself on args plumbing corner
>     cases instead...
>  
> ack. i tried to take an update but hit another -Werror=format-security issue

Sigh:

  char *reset = 0;

  if (stuff) {
    reset = "\e[0m";
  }
  if (reset) printf(reset);

The problem is if I'm testing with gcc's false positive generator and forget to
test with llvm's false positive generator, it still may not catch all the same
false positives.
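
A minimal sketch of the usual way to keep -Werror=format-security quiet for the pattern above: pass the string as data rather than as the format argument. The helper names pick_reset() and put_raw() are hypothetical, plain fputs() stands in for whatever output helper the real code uses, and "\033" replaces "\e" only because "\e" is a compiler extension.

```c
#include <stdio.h>
#include <string.h>

// Hypothetical stand-in for the snippet above: returns the escape
// sequence to emit, or NULL when nothing should be printed.
static const char *pick_reset(int stuff)
{
  return stuff ? "\033[0m" : 0;
}

// Write s verbatim. Since s is never used as a format string, the
// compiler has nothing to warn about and a stray '%' in s is just a
// character. Returns bytes written, 0 for NULL, or -1 on error.
static int put_raw(FILE *fp, const char *s)
{
  if (!s) return 0;

  return fputs(s, fp) < 0 ? -1 : (int)strlen(s);
}
```

Calling put_raw(stdout, pick_reset(stuff)) does the same job as the printf() above, and printf("%s", reset) would also satisfy the warning.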

My objection to ASAN is I'm not yet convinced it ISN'T a false positive
generator, although I should give it a closer look. (My first encounter with it
being commit 472599b99bec is a contributing factor here.)

> with one of your diff.c changes. i've sent a patch (and a separate patch to add
> that -Werror= to the default toybox configure, since that's one we always have
> to fix in the end anyway; may as well catch them fresh?).

I agree I should hit the false positives before you hit the false positives.

I need something like a ./testy.sh script that builds with the NDK (ASAN
enabled) and runs the test suite... which involves getting the test suite to
pass when built with the NDK. Working on it, I'll try to go faster and see if I
can reshuffle the priorities a bit. I have been accused of trying to boil the
ocean on more than one occasion...

As for fixing diff: sadly my cleanups so far have broken it in more than one way
(there's the object lifetime thing and the logic to figure out what to actually
compare when given different kinds of source/target pairs, although it wasn't
entirely right before) and I stopped with yet another large cleanup
half-finished in a directory going A) I need more tests, B) I'm gonna try to
just write a SIMPLE one I understand and see how bad it is.

Digging through this diff code has been a learning experience, but you guys are
already using this, meaning you need to go from something that works to something
else that works...

> i'll try again tomorrow... (i want to try to use `timeout -i` too!)

I switched the printf() to xputsn(), and fixed up the off-by-one error causing
the segfault. (Adding a quote increments the start, only decrement on return
when we added that quote, otherwise it's both wrong and an unaligned pointer
that's not to the start of an allocation.)

That fixes the immediate issues, but I still do not currently consider diff.c to
be load bearing. (Then again it wasn't really before, and probably isn't worse
for your use cases so...)

Rob


