[Toybox] awk (was: strlower() bug)

Rob Landley rob at landley.net
Sun Jun 16 06:40:46 PDT 2024


On 6/15/24 17:22, Ray Gardner wrote:
> On Wed, Jun 12, 2024 at 2:57 PM Rob Landley <rob at landley.net> wrote:
>>
>> On 6/11/24 16:56, Ray Gardner wrote:
>> > Elliott, thanks for the positive feedback on the docs, but I really
>> > wish you and Rob would try the program. I waited a while to see what
>> > Rob would have to say. He doesn't seem the sort to be at a loss for
>> > words, but ... nothing. Any idea why he's had nothing to say about an
>> > awk for toybox?
>>
>> Why are you asking Elliott?
> 
> He responded to my post; you didn't. I know you've had a lot on your plate
> lately with your move, selling the house, working on toysh. But you
> responded to most posts here since mine on 5/14.

I had the window open, but hadn't yet done the reading. :)

> After all you've written about awk, I was puzzled by your non-response,
> and inferred wrongly that it was intentional, so I thought asking you why
> you didn't respond to a post you intended to ignore would be ... not well
> received. I thought Elliott might have some insight there.

A reasonable amount of follow-up is fine. I _do_ drop the ball a lot.

>> Remember the "poke me a week later if I forget"?
> 
> No. But I dug into the archive, and find that you said that to Oliver in a
> post about toysh in March. But never about awk, or to me.
> (http://lists.landley.net/pipermail/toybox-landley.net/2024-March/030146.html)

Sigh, I keep thinking it's in the FAQ. (I should update the FAQ, but have like 8
half-finished updates to it already...)

>> I consider myself poked, somewhat passive-aggressively. :)
> 
> No passive aggression intended.

My fault for not documenting the expected procedure better.

> But you've been looking for an awk for at least 8 years, so I really
> thought you'd welcome one that's complete and written for toybox, with
> some tests and documentation.

I am very interested, yes.

I downloaded your repo, copied toybox/awk.c to toys/pending/awk, and built it.
It compiled. I grabbed awk.test and ran that, and it passed.

Didn't QUITE pass test_host:

awk: cmd. line:1: warning: escape sequence `\u' treated as plain `u'
FAIL: awk \u
echo -ne '' | "/usr/bin/awk" 'BEGIN{print "\u20\u255"}' < /dev/null
--- expected	2024-06-16 08:36:12.147722288 -0500
+++ actual	2024-06-16 08:36:12.155722288 -0500
@@ -1 +1 @@
- ɕ
+u20u255

But eh, passed all the others (with VERBOSE=all), close enough. (Adding
"utf8locale" to the test file didn't fix it, dunno what it's trying to do...)

*shrug* I'm happy to check it into pending as is, if you don't mind discarding
commit history. (Um, URL to the github commit I got it from maybe? The trees
haven't got the same base so there isn't an obvious "pull" option here...)

  $ git log toybox/awk.c toybox_awk_test/awk.test | grep '^commit ' | wc -l
  30

It's a _bit_ granular but toysh is way worse, and for that matter:

  $ git log --follow toys/*/sed.c | grep '^commit ' | wc -l
  110

Hmmm, maybe I can do something <strike>clever</strike> fiddly with fishing out
git format-patch entries, trimming them a bit and adjusting the paths, and "git
am" in the other tree...
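
A rough sketch of that format-patch/am idea, under stated assumptions: the function name replay_history and its argument order are made up for illustration, and the -p/--directory options that "git am" hands to "git apply" stand in for trimming the paths by hand. It replays the commits touching one file into another tree under a new directory.

```shell
#!/bin/sh
# Hypothetical helper, not part of toybox or of the awk repo:
#   replay_history SRC_REPO SRC_FILE STRIP DEST_REPO DEST_DIR
replay_history() {
  src_repo=$1 src_file=$2 strip=$3 dest_repo=$4 dest_dir=$5

  # --root starts at the first commit; the pathspec after "--" limits both
  # the commits selected and the diffs emitted to that one file.
  git -C "$src_repo" format-patch --stdout --root HEAD -- "$src_file" |
    # -pN strips N leading path components (a/toybox/awk.c becomes awk.c
    # with -p2), then --directory prepends the new location.
    git -C "$dest_repo" am -p"$strip" --directory="$dest_dir"
}

# Example call, using the paths mentioned in this message:
#   replay_history ~/src/awk-repo toybox/awk.c 2 ~/src/toybox toys/pending
```

Each pass maps everything to a single destination directory, so the test file would need its own pass with a different destination. The author and date of each commit survive; the commit hashes do not.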

>> I have the tab open, the reason I haven't looked at it yet is A) it's 4500
>> lines, B) in a thing I have WAY insufficient existing domain expertise in (but
>> multiple bookmarked tutorials and an entire book on somewhere).
> 
> It's really 3523 lines of non-blank non-comment code, measured with:
> 
> toybox awk -f cnt_sloc.awk awk.c
> 
> where cnt_sloc.awk is:
> /^[ \t]*\/\*/ , /\*\/[ \t]*$/ { next } # Skip /* ... */ comments
> /^ *$/ || /^ *\/\// { next } # Skip empty and //comment-only lines
> { sloc++ }
> END { print sloc }

Hmmm...

$ ./awk -f <(echo '/^[ \t]*\/\*/ , /\*\/[ \t]*$/ { next }'$'\n''/^ *$/ || /^ *\/\// { next }'$'\n''{ sloc++ }'$'\n''END { print sloc }') toys/*/awk.c
3523
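
To see what those rules do and don't count, here is a small sketch that runs the same patterns over a made-up sample file. The count_sloc wrapper name and the sample contents are illustrative only, and the "+ 0" in the END rule is an addition so that an input with no code lines prints 0 instead of an empty line.

```shell
#!/bin/sh
# Hypothetical wrapper around the cnt_sloc.awk rules quoted above.
count_sloc() {
  awk '
    /^[ \t]*\/\*/ , /\*\/[ \t]*$/ { next }  # skip /* ... */ comment blocks
    /^ *$/ || /^ *\/\// { next }            # skip blank and //-only lines
    { sloc++ }
    END { print sloc + 0 }                  # + 0 is an addition, see lead-in
  ' "$@"
}

sample=$(mktemp)
cat > "$sample" << 'EOF'
/* multi-line
 * comment */
#include <stdio.h>

// comment-only line
int main(void)
{
  return 0; /* trailing comment, still a code line */
}
EOF
count_sloc "$sample"   # counts the #include, "int main(void)", "{", return, "}"
rm -f "$sample"
```

The range pattern only starts on a line that begins with a comment opener, so a code line with a trailing comment is counted, while a comment block that begins with a tab-indented or unindented opener is skipped whole.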

> BTW regarding not getting an SSD at Target: there's a MicroCenter in the
> Minneapolis metro area; might be worth the drive. The one where I am is good.

I took the green line to the A line to Best Buy, which still had a few of the
right kind of ssd locked in a misc old parts filing cabinet. New(er) laptop is
up and running, with non-EOL os version installed on it. (Hence the list of
things the new environment/compiler broke.)

Part of my slow/quiet here is the old machine is still the "master" for email
and blog pushing, so I write notes-to-self and then have to copy them over when
that's the one I took out with me for the day. I'm slowly getting used to
firefox (dunno if it really scales yet, "pkill -f renderer" at chrome hasn't got
an obvious equivalent that leaves the tab open and reloadable). I also have to
decide if it's gonna be Thunderbird again or something else, which is presumably
bundled with the https://landley.net/notes-2024.html#23-04-2024 migration. I
have a small backlog of git commits on the new machine too, but that tool's
explicitly designed to handle that sort of thing and even my strange use of it
falls within acceptable parameters. (It's one more branch, basically...)

Yesterday I was redoing the Linux From Scratch build as an actual proper
mkroot/packages/lfs script current with 12.1. Haven't gotten that far yet but I
think I've worked out the design I want. Modulo "building the chroot with
musl-cross-make toolchain != building it with their cross compiler", but that's
a fiddly design straddle I blogged a bit about already, and need to collate and
send. (Which is not just SORT, it's gather together and merge adjacent units. I
may not die on this hill, but I'm tanking a bit of damage to defend it until
somebody tells me what OTHER word I should be using for that. "I need to buffer
my various blog entries" ain't it. Probably "edit together", in this specific
case...)

I've also been rereading a lot of the toyshell redirection and variable
expansion code, both to fix the initramfs-without-console redirection filehandle
lifetime bug where 'echo > /dev/console' produces no output if
open(/dev/console) returned fd 0 and then it tried to redirect it to fd 0 and it
already was (which happens on initramfs statically linked into vmlinux, but NOT
on externally loaded qemu -initrd file.cpio.gz ones because
linux/init/noinitramfs.c:default_rootfs() triggers for external initramfs, thanks
linux-kernel devs), and to implement "trap" properly (largeish design change:
do_source() shouldn't recurse, it should add an input source to a stack and the
parsing loop in sh_main() should pull lines from the tip of the stack and
pop/close sources at EOF; that way trap is just a
do_source(fmemopen("string"))). And today there's
https://github.com/landley/toybox/pull/506 which _may_ be a fix for the issue
but at first glance sounds like it's just reverting
https://landley.net/notes-2023.html#21-11-2023 (I have the tab open as to-read
but haven't spent focus on it yet) and the issue _I_ had there was the parsing
decisions had to be pushed down into the consumer which means my assumption that
parentheses are always balanced by the time we get to variable expansion is no
longer true, so I either need to audit my variable expansion logic OR do a
validator function (probably a simple loop around the tokenizer).

It's because:

  $ bash -c $'x(){ cat << EOF\n${potato\nEOF\n};echo one;x'
  one
  environment: ${potato
  : bad substitution

I.E. the unbalanced ${ isn't detected during line parsing (triggering a
continuation prompt), it's detected at runtime within the HERE document
processing. The function is PARSED just fine, it complains when RUN.
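
One way to see the parse/run split separately, as a sketch: feed the same script to "bash -n", which reads commands without executing them, and then run it for real. The helper names parses_ok and run_output are made up for illustration, and this assumes a bash on the $PATH that behaves like the one in the transcript above.

```shell
#!/bin/sh
# Same script as the bash -c example, built with printf so it doesn't
# depend on $'...' quoting in the calling shell.
script=$(printf 'x(){ cat << EOF\n${potato\nEOF\n};echo one;x')

# Hypothetical helpers: parse without running, and run with stderr merged in.
parses_ok() { bash -n -c "$1"; }
run_output() { bash -c "$1" 2>&1; }

# The parser accepts the unbalanced ${ inside the HERE document body...
parses_ok "$script" && echo "parse ok"

# ...and the complaint only shows up once x is actually called, after "one".
run_output "$script" || echo "run failed"
```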

But today gets spent at my sister's place with the niecephews, because my father
is in town and it's his day. (Pre-printed on the calendar and everything.) There
will probably be cooked meat.

Rob

