[Toybox] patch: add built-in versions of sha-2 family hash functions

Mon Jun 7 09:14:11 PDT 2021

On Fri, Jun 4, 2021 at 10:54 PM Rob Landley <rob at landley.net> wrote:

> On 6/4/21 8:22 AM, Dan Brown wrote:
> > Hi Rob - thanks. I get the impression that I would not be able to help on
> > the things you're working on - they sound like they involve knowledge that
> > is too deep for me to learn for a side project.
>
> That's just what _I'm_ working on. The project has plenty of surface area.
> Let's see...
>
> If somebody wanted to write an awk, I'd thank that person profusely.
> Android uses Kernighan's original awk:
>
>   https://en.wikipedia.org/wiki/AWK#Versions_and_implementations
>
> But it uses a bespoke BSD license variant
> (https://github.com/onetrueawk/awk/blob/master/LICENSE) and toybox uses a
> public domain equivalent license, so I can't take code directly from that.
> Plus I'm told it doesn't support utf8:
>
>   https://github.com/landley/toybox/issues/67#issuecomment-587347070
>
> I have the very start of an ar.c that doesn't do anything yet (attached).
> That's used not just by builds but inside of Debian's package management
> and a few other places.
>
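For anyone tempted by ar.c: the common format is small. An archive starts with the 8-byte magic "!<arch>\n"; each member then has a 60-byte header of fixed-width, space-padded ASCII fields (name, mtime, uid, gid, octal mode, decimal size, and a two-byte terminator, a backquote followed by a newline), followed by the data padded to an even length. A minimal C sketch of walking the members of an archive already in memory; ar_next() and struct ar_hdr_raw are hypothetical names, and it ignores the GNU and BSD extensions for long names and symbol tables.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* One member header: fixed-width, space-padded ASCII fields, 60 bytes. */
struct ar_hdr_raw {
  char name[16], mtime[12], uid[6], gid[6], mode[8], size[10], fmag[2];
};

/* Step to the next member of the archive in buf[0..len-1]. Call with
 * *off = 0 the first time. Returns 1 and fills in name, size and data for
 * a member, 0 at the end of the archive, -1 if the data is malformed. */
int ar_next(const char *buf, size_t len, size_t *off, char name[17],
            size_t *size, const char **data)
{
  const struct ar_hdr_raw *h;
  char num[11], *end;
  int i;

  if (!*off) {
    if (len < 8 || memcmp(buf, "!<arch>\n", 8)) return -1;
    *off = 8;
  }
  if (*off == len) return 0;
  if (len - *off < sizeof(*h)) return -1;
  h = (const struct ar_hdr_raw *)(buf + *off);
  if (memcmp(h->fmag, "`\n", 2)) return -1;

  /* The size field is decimal, padded with trailing spaces. */
  memcpy(num, h->size, sizeof(h->size));
  num[10] = 0;
  *size = strtoul(num, &end, 10);
  if (end == num || *size > len - *off - sizeof(*h)) return -1;

  /* Drop the space padding and any trailing '/' (GNU ar ends names with one). */
  memcpy(name, h->name, 16);
  for (i = 16; i && (name[i-1] == ' ' || name[i-1] == '/'); i--);
  name[i] = 0;

  *data = buf + *off + sizeof(*h);
  /* Members start at even offsets, so odd-sized data is followed by a pad
   * byte; tolerate it being missing after the last member. */
  *off += sizeof(*h) + *size + (*size & 1);
  if (*off > len) *off = len;

  return 1;
}
```

A caller would start with off = 0 and loop while ar_next() returns 1. A .deb, for example, is such an archive holding debian-binary, control.tar.* and data.tar.* members.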
> There's a bunch of stuff in toys/pending/*.c. I got about halfway through
> mkdosfs.c before getting pulled away and never getting back to it (attached
> that too; turns out you MUST support all the fat12/16/32 variants because it
> uses the image size to determine filesystem type. I was vaguely thinking I
> could make a genvfatfs.c that worked like an archiver
> (a la genvfatfs dir | gzip | ssh remote.sys "cat > img.gz"), and maybe even
> implement the whole mtools suite).
>
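On why size matters there: per Microsoft's FAT specification (fatgen103), the FAT variant is determined solely by the number of data clusters, which mkdosfs derives from the image size and cluster size. A volume with fewer than 4085 clusters is FAT12, fewer than 65525 is FAT16, and anything larger is FAT32. A minimal C sketch of that rule; the helper names and the simplified layout parameters are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: data clusters left after the reserved sectors, the
 * FAT copies and the fixed-size root directory (FAT12/16 only; pass 0 for
 * FAT32) are subtracted from the image. Sizes are in sectors. */
uint32_t fat_data_clusters(uint32_t total_sectors, uint32_t reserved,
                           uint32_t num_fats, uint32_t fat_sectors,
                           uint32_t root_dir_sectors,
                           uint32_t sectors_per_cluster)
{
  uint32_t overhead = reserved + num_fats * fat_sectors + root_dir_sectors;

  if (!sectors_per_cluster || total_sectors <= overhead) return 0;
  return (total_sectors - overhead) / sectors_per_cluster;
}

/* The FAT variant follows from the cluster count alone. */
int fat_type(uint32_t clusters)
{
  if (clusters < 4085) return 12;
  if (clusters < 65525) return 16;
  return 32;
}
```

For a 1.44MB floppy (2880 sectors, 1 reserved sector, two 9-sector FATs, 14 root directory sectors, one sector per cluster) this gives 2847 clusters, hence FAT12. A real formatter also has to resolve the circularity that the FAT's own size depends on the cluster count, which this sketch ignores.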
> The mke2fs.c in pending is also half-finished and I haven't touched that in
> YEARS (I was working on it for busybox and copied what I had over into
> toybox, but it was at the end of the todo heap; adding ext3 was just a
> magic file in a reserved inode number, but since then ext4 happened and I
> haven't had a chance to look at that at all)...
>
> For diff.c I really wanted to do a "patience diff" algorithm implementation,
> but got a conventional one submitted instead and never had a chance to wrap
> my head around the pro/con of the version they did. I bookmarked an
> algorithm comparison article:
>
>   https://blog.jcoglan.com/2017/09/19/the-patience-diff-algorithm/
>
> And have various old bookmarks about patience diff:
>
>   https://bramcohen.livejournal.com/37690.html
>   https://bramcohen.livejournal.com/73318.html
>   https://en.wikipedia.org/wiki/Patience_sorting
>
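For anyone picking this up, the step that sets patience diff apart (as described in the Bram Cohen posts above) is choosing anchor lines: lines that occur exactly once in each file, keeping the largest subset that appears in the same order in both, which is a longest increasing subsequence that patience sorting finds in O(n log n). A rough C sketch of just that step; patience_anchors() is a hypothetical name, the recursion into the gaps between anchors and the fallback to an ordinary diff are left out, and the uniqueness check is quadratic for brevity.

```c
#include <stdlib.h>
#include <string.h>

/* How many times does line s occur in arr[0..n-1]? Quadratic overall; a
 * real implementation would hash the lines instead. */
static int occurrences(char **arr, int n, const char *s)
{
  int i, hits = 0;

  for (i = 0; i < n; i++) if (!strcmp(arr[i], s)) hits++;

  return hits;
}

/* Store in ai[]/bi[] the index pairs of lines that occur exactly once in
 * each file, keeping the longest subset that is in the same order in both.
 * ai and bi need room for na entries. Returns the number of anchors, or -1
 * if memory runs out. */
int patience_anchors(char **a, int na, char **b, int nb, int *ai, int *bi)
{
  int *ca = malloc((na + 1) * sizeof(int)), *cb = malloc((na + 1) * sizeof(int));
  int *top = malloc((na + 1) * sizeof(int)), *prev = malloc((na + 1) * sizeof(int));
  int i, j, k, nc = 0, piles = 0;

  if (!ca || !cb || !top || !prev) piles = -1;
  else {
    /* Candidates: lines unique in both files, in a's order. */
    for (i = 0; i < na; i++) {
      if (occurrences(a, na, a[i]) != 1 || occurrences(b, nb, a[i]) != 1)
        continue;
      j = 0;
      while (strcmp(b[j], a[i])) j++;
      ca[nc] = i;
      cb[nc++] = j;
    }

    /* Patience sorting on cb[]: each value goes on the leftmost pile whose
     * top is not smaller, remembering the top of the pile to its left. */
    for (k = 0; k < nc; k++) {
      int lo = 0, hi = piles;

      while (lo < hi) {
        int mid = (lo + hi) / 2;

        if (cb[top[mid]] < cb[k]) lo = mid + 1;
        else hi = mid;
      }
      prev[k] = lo ? top[lo - 1] : -1;
      top[lo] = k;
      if (lo == piles) piles++;
    }

    /* The longest increasing run ends on the last pile: walk it back. */
    for (k = piles ? top[piles - 1] : -1, i = piles - 1; i >= 0; i--) {
      ai[i] = ca[k];
      bi[i] = cb[k];
      k = prev[k];
    }
  }
  free(ca);
  free(cb);
  free(top);
  free(prev);

  return piles;
}
```

With a = {a, b, c, d} and b = {a, c, b, d}, every line is unique and the anchors come out as (0,0), (2,1), (3,3), one of the two equally long orderings. A full patience diff would then recurse on the stretches between anchors.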
> I need to write a less.c (probably using watch.c as a model, and definitely
> implementing -R support so colors work). But then I need to do similar code
> for the shell's command line editing...
>
> If somebody who actually _uses_ modules could review modprobe, that would be
> nice. My main blocker there is the embedded systems I put together all have
> static kernels.
>
> I've always been slightly unclear on what getty.c _does_ and why it's
> separate from login.c. (Is it related to stty?)
>

in case i never sent the response to the list, TL;DR "yeah"...

[i'm told that] if you're using real serial ports, you still need the baud
rate setting features. if you're using real serial cables in an
electrically noisy environment, you have another local getty patch that i
honestly haven't understood well enough to even try to work out whether it
makes any sense to upstream :-(

aiui they were sometimes seeing XOFF sent to init, causing boot to hang.
(although i understand that XOFF/XON is useful in theory, i've been
disabling it since the early 1990s because i haven't deliberately used
it since the early 1980s when computers were still slow enough for human
reaction times to be somewhat meaningful there.)
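Both of those come down to a few termios(3) calls. A minimal C sketch, with hypothetical helper names (this is not toybox's getty): clearing IXON means an incoming ^S (XOFF) is delivered as an ordinary input byte instead of pausing output, so a stray XOFF can't stall whatever is writing to the console, and cfsetispeed()/cfsetospeed() set the baud rate.

```c
#include <termios.h>

/* Adjust a termios struct for a serial console at the given speed (a B*
 * constant such as B115200), with software flow control turned off. */
int serial_settings(struct termios *t, speed_t speed)
{
  /* IXON: treat incoming ^S/^Q as XOFF/XON and pause/resume output.
   * IXOFF: have the driver send them when its input queue fills up.
   * Clearing both means a stray ^S is just an input byte. */
  t->c_iflag &= ~(IXON | IXOFF);
  if (cfsetispeed(t, speed) || cfsetospeed(t, speed)) return -1;

  return 0;
}

/* Apply those settings to an open tty; returns -1 with errno set on error. */
int serial_setup(int fd, speed_t speed)
{
  struct termios t;

  if (tcgetattr(fd, &t) || serial_settings(&t, speed)) return -1;

  return tcsetattr(fd, TCSANOW, &t);
}
```

A getty would call serial_setup(fd, B115200), or whatever speed it was given, on the opened line before prompting.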

> I want to redo route.c to use the netlink API to add multi-table support
> (ala https://android.googlesource.com/platform/external/toybox/+/48e1f81151f6).
>
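The multi-table part is something the old SIOCADDRT ioctl can't express (its struct rtentry has no table field). With rtnetlink the table is part of the RTM_NEWROUTE message: rtmsg's rtm_table holds only 8 bits, so larger table ids go in an RTA_TABLE attribute. A rough sketch, assuming Linux's <linux/rtnetlink.h> and IPv4, of building such a request; the helper names are hypothetical, it only builds the message, and it is not taken from the Android change linked above.

```c
#include <sys/socket.h>
#include <arpa/inet.h>
#include <linux/rtnetlink.h>
#include <stdint.h>
#include <string.h>

/* Request buffer: netlink header, route message, then room for attributes. */
struct route_req {
  struct nlmsghdr nh;
  struct rtmsg rt;
  char attrs[128];
};

/* Append one aligned attribute after the current end of the message.
 * Returns -1 if it wouldn't fit in max bytes. */
int add_rtattr(struct nlmsghdr *nh, size_t max, unsigned short type,
               const void *data, size_t len)
{
  size_t alen = RTA_LENGTH(len);
  struct rtattr *rta;

  if (NLMSG_ALIGN(nh->nlmsg_len) + RTA_ALIGN(alen) > max) return -1;
  rta = (struct rtattr *)((char *)nh + NLMSG_ALIGN(nh->nlmsg_len));
  rta->rta_type = type;
  rta->rta_len = alen;
  memcpy(RTA_DATA(rta), data, len);
  nh->nlmsg_len = NLMSG_ALIGN(nh->nlmsg_len) + RTA_ALIGN(alen);

  return 0;
}

/* Build "add dst/prefix via gw in table N" for IPv4. dst and gw are in
 * network byte order (as from inet_pton() or htonl()). */
int build_route_add(struct route_req *req, uint32_t dst, int prefix,
                    uint32_t gw, uint32_t table)
{
  memset(req, 0, sizeof(*req));
  req->nh.nlmsg_len = NLMSG_LENGTH(sizeof(struct rtmsg));
  req->nh.nlmsg_type = RTM_NEWROUTE;
  req->nh.nlmsg_flags = NLM_F_REQUEST | NLM_F_CREATE | NLM_F_EXCL | NLM_F_ACK;
  req->rt.rtm_family = AF_INET;
  req->rt.rtm_dst_len = prefix;
  req->rt.rtm_protocol = RTPROT_BOOT;
  req->rt.rtm_scope = RT_SCOPE_UNIVERSE;
  req->rt.rtm_type = RTN_UNICAST;
  /* rtm_table is only 8 bits; larger ids only fit in RTA_TABLE. */
  req->rt.rtm_table = table < 256 ? table : RT_TABLE_UNSPEC;

  if (add_rtattr(&req->nh, sizeof(*req), RTA_DST, &dst, 4)
      || add_rtattr(&req->nh, sizeof(*req), RTA_GATEWAY, &gw, 4)
      || add_rtattr(&req->nh, sizeof(*req), RTA_TABLE, &table, 4)) return -1;

  return 0;
}
```

Sending it would mean socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE), a send() of req.nh.nlmsg_len bytes (which needs CAP_NET_ADMIN), and reading back the NLMSG_ERROR acknowledgement that NLM_F_ACK asks for.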
> > After reading through your state-of-the-toybox message, the commands "tr"
> > and "stdbuf" are interesting to me. If you think it is worth a shot for me
> > to give it a go, let me know which one you'd recommend.
>
> It'd be great if somebody could tackle stdbuf, I have no idea how that one
> works. (The buffering is in libc, how does one program tell the next
> executable to use different libc default parameters? Export an environment
> variable?

yes.

> Is doing that portable between glibc/musl/bionic?

no, glibc-only.

(i thought we'd already argued that this means you'll have the
glibc-provided binary if you're using glibc, which means this doesn't make
sense for toybox?)

> "strings" on the stdbuf binary is showing _STDBUF_ and LD_PRELOAD and
> libstdbuf.so and that just sounds ugly...)
>
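Judging by those strings, the mechanism is what it sounds like: stdbuf exports its settings in environment variables and sets LD_PRELOAD so a small shared library gets loaded into the next program, where a constructor calls setvbuf(3) before main() runs. That only affects dynamically linked programs that use stdio. A minimal C sketch of such a library; the STDBUF_STDOUT variable, its value syntax and the function names are hypothetical (GNU's real variables start with _STDBUF_), and __attribute__((constructor)) is a GCC/Clang extension.

```c
#include <stdio.h>
#include <stdlib.h>

/* Parse a mode string in this sketch's made-up syntax: "L" for line
 * buffered, "0" for unbuffered, otherwise a buffer size in bytes.
 * Returns 0 on success, -1 if val is missing or makes no sense. */
int parse_buf_mode(const char *val, int *mode, size_t *size)
{
  char *end;
  unsigned long n;

  if (!val || !*val) return -1;
  if (val[0] == 'L' && !val[1]) {
    *mode = _IOLBF;
    *size = 0;
    return 0;
  }
  if (*val < '0' || *val > '9') return -1;
  n = strtoul(val, &end, 10);
  if (*end) return -1;
  *mode = n ? _IOFBF : _IONBF;
  *size = n;

  return 0;
}

/* Runs when the dynamic linker loads this library (via LD_PRELOAD), before
 * the program's main(), so normally before anything is written to stdout. */
__attribute__((constructor)) static void preload_stdbuf(void)
{
  int mode;
  size_t size;
  char *buf = NULL;

  if (parse_buf_mode(getenv("STDBUF_STDOUT"), &mode, &size)) return;
  /* Supply the buffer ourselves so the requested size is used; stdout keeps
   * using it until exit, so it is never freed. If malloc() fails, passing
   * NULL lets libc pick a buffer instead. */
  if (mode == _IOFBF) buf = malloc(size);
  setvbuf(stdout, buf, mode, size);
}
```

Built as, for example, cc -shared -fPIC -o libtrybuf.so trybuf.c and used as STDBUF_STDOUT=L LD_PRELOAD=./libtrybuf.so cmd | ... (the library and file names are made up).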
> My main todo item on tr is I'd like to have utf8/unicode support, but
> haven't figured out how to do so yet at a conceptual level because it's
> matching a source string with a destination string by position, and things
> like [:alnum:] expand to different numbers of entries in ascii vs unicode.
> We had a thread on that here:
>
>   http://lists.landley.net/pipermail/toybox-landley.net/2020-December/012158.html
>
> Rich Felker said he had a simple way to do it, but we've never sat down to
> have him explain it to me.
>
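To make the position problem concrete: a byte-oriented tr expands each set into a list and maps the Nth entry of the first to the Nth entry of the second, so the result depends on exactly how many entries, and in what order, a class like [:alnum:] produces. A minimal C sketch, assuming the C locale (where [:alnum:] is exactly the 62 ASCII digits and letters) and GNU tr's behavior of reusing the last character when the second set is shorter; expand_alnum() and build_map() are hypothetical names, not toybox's code.

```c
#include <ctype.h>
#include <stddef.h>

/* Expand [:alnum:] the way a byte-based tr would: every byte value for
 * which isalnum() is true, in ascending order. In the C locale that is
 * 0-9, A-Z, a-z; in another locale the count (and so every later
 * position) can change. */
int expand_alnum(unsigned char *out)
{
  int c, n = 0;

  for (c = 0; c < 256; c++) if (isalnum(c)) out[n++] = c;

  return n;
}

/* Map from[i] to to[i]; if to is shorter, keep reusing its last byte.
 * Bytes not in from map to themselves. */
void build_map(unsigned char map[256], const unsigned char *from,
               size_t nfrom, const unsigned char *to, size_t nto)
{
  size_t i;

  for (i = 0; i < 256; i++) map[i] = i;
  for (i = 0; i < nfrom && nto; i++) map[from[i]] = to[i < nto ? i : nto - 1];
}
```

With a Unicode-aware class the first set would be far longer, so how a given second set lines up against it changes, which is the conceptual problem described above.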

(i'd be curious to hear that, because every implementation i'm aware of --
and the Unicode standard -- literally end up with a huge table *and*
hard-coded special cases in the code. the closest i've come to "clever"
with this was to hoist the hard-coded special cases out and have a separate
"easy case" copy of the loop. but that's only a run-time "simplification",
and makes the implementation strictly larger.)
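The split described there might look something like this for upper-casing: an "easy case" loop that only handles ASCII and never looks at a table, with the locale-specific special cases and the big table (here simply delegated to the C library's towupper()) left to a second, general loop. A hypothetical C sketch, not any particular implementation; the turkic flag stands in for a real locale check, and 'i' mapping to U+0130 is the Turkish/Azeri rule from Unicode's SpecialCasing.txt.

```c
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/* Slow path for one character: locale-specific special cases first, then
 * whatever table the C library's towupper() consults for the locale. */
wint_t upper_slow(wint_t c, int turkic)
{
  if (turkic && c == L'i') return 0x130; /* LATIN CAPITAL LETTER I WITH DOT ABOVE */

  return towupper(c);
}

/* Upper-case s in place. The first loop is the "easy case" copy: pure
 * ASCII with no special cases, so it never touches a table. It stops at
 * the first character it can't handle and the general loop takes over. */
void upper_str(wchar_t *s, size_t len, int turkic)
{
  size_t i = 0;

  if (!turkic)
    for (; i < len && s[i] < 0x80; i++)
      if (s[i] >= L'a' && s[i] <= L'z') s[i] -= L'a' - L'A';

  for (; i < len; i++) s[i] = upper_slow(s[i], turkic);
}
```

As the paragraph above says, this only speeds up the common case; the general loop still has to exist, so the code is strictly larger than a single loop would be.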

but more than that, i'd still like to hear an argument that trying to be
clever here makes any _sense_ :-)

i'd ask for a single real-world example where someone's actually using
this, but since BSD and GNU and Plan 9 trs don't, that doesn't exist.

(and this ignores the question of "sure, but aren't we going to harm more
ASCII-only 'kernel build' users by accidentally taking their locale into
account than we are going to help imaginary Turkish AT&T lawyers still
using the Unix command line for writing their patent applications in 2021",
to which i'm pretty sure the answer is "yes, the only net result of
implementing this would be that we'd need to tell a bunch of people to set
their locale to "C" for their builds...".)

fwiw, getting back to something you said earlier, i think *this* is where
one true awk "doesn't support utf-8" --- "convert Turkish input to
upper/lowercase" _ought_ to be something that awk can do that tr can't
(because tr is all about characters/bytes, but awk is all about strings),
but one true awk can't do it either. perl and python can. realistically i
think anyone who falls into the "no, i really do want to deal with all the
weirdness of human scripts [in the Cyrillic/Hangeul sense of the word]"
category (a) should use and (b) is already using python anyway. even the
kernel and toybox use perl or C where awk would do. "it's POSIX", sure, but
"no-one who wasn't doing this kind of thing in the 1990s has ever used it,
and those of us who were don't want to write things that only we can
maintain".

your non-POSIX cut(1) extension covers 80% of the in-the-wild use of awk
anyway :-) if you still talk to any of the busybox folks, we should suggest
they copy that --- it would be nice for it to be a de facto standard so we
can get it into POSIX sometime around the 2040s... (and have made lives
better for the folks who don't care about standards and just want to "get
things done" in the intervening decades!)
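For reference, the awk usage in question is presumably mostly the awk '{print $2}' idiom: split on runs of blanks, ignoring leading ones, and pick one field. A minimal C sketch of that default splitting; awk_field() is a hypothetical name and this is not toybox's cut code.

```c
#include <stddef.h>
#include <string.h>

/* Copy whitespace-separated field n (1-based, like awk's $n) of line into
 * out, awk-style: leading blanks are skipped and runs of spaces, tabs and
 * newlines count as one separator. Returns the field length, or -1 if
 * there's no such field or it doesn't fit in out. */
int awk_field(const char *line, int n, char *out, size_t outsz)
{
  const char *start;
  size_t len;

  if (n < 1) return -1;
  for (;;) {
    line += strspn(line, " \t\n");
    if (!*line) return -1;
    start = line;
    line += strcspn(line, " \t\n");
    if (!--n) break;
  }
  len = line - start;
  if (len >= outsz) return -1;
  memcpy(out, start, len);
  out[len] = 0;

  return (int)len;
}
```

For example, awk_field("  root   1234 ?  S", 2, buf, sizeof buf) yields 1234, the same thing awk's default field splitting gives for $2.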

> > Dan
>
> Rob