[Toybox] [PATCH] grep: add --line-buffered and fix regular buffering.
Rob Landley
rob at landley.net
Mon Feb 25 12:01:22 PST 2019
On 2/25/19 11:29 AM, enh wrote:
> To be honest I was calling xflush() from so many places because it was an easy
> way to get the if (ferror) perror_exit("write"); Possibly what I want is an
> xferror() that xflush() can call...
>
>
> where would you want to call xferror that xflush wouldn't be appropriate?
Anywhere you want to stop a command after "head" ate the first screen of data
and closed the pipe, or similar. (cmp <(blah) <(blah), yes | anything, etc.)
(It's basically another facet of the sigpipe problem.)
> (fread
> is the main place one needs ferror normally, and toybox has remarkably few calls
> to fread, neither of which seem to actually need to care.)
The only _input_ advantage of FILE * is really for getline(). Everything else
it's just as easy to read input blocks and chop 'em up myself.
(The problem with getline() is you don't know where the terminators are ahead of
time so you never know how _much_ to read, so you either read a byte at a time
which is _painfully_ slow or you have leftover data you need to keep around
because zcat | thingy isn't seekable input.)
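To make the "read blocks and chop 'em up myself" option concrete, here's a rough sketch of a line reader that keeps its leftover bytes around between calls. The names (struct linebuf, next_line) are made up for illustration, not anything in toybox, and a line longer than the buffer simply gets handed back in pieces:

```c
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

// Hypothetical helper (not a toybox function): a file descriptor plus
// whatever bytes were read past the end of the last line returned.
struct linebuf {
  int fd;
  size_t len;      // bytes of leftover data currently in buf
  char buf[4096];
};

// Return a malloc()ed line with the newline stripped, or NULL at EOF.
// A read error is treated the same as EOF in this sketch.
char *next_line(struct linebuf *lb)
{
  for (;;) {
    char *nl = memchr(lb->buf, '\n', lb->len), *line;
    size_t used = nl ? (size_t)(nl - lb->buf) : lb->len;
    ssize_t got;

    // No newline yet and room left: read another block, not a byte.
    if (!nl && lb->len < sizeof(lb->buf)) {
      got = read(lb->fd, lb->buf + lb->len, sizeof(lb->buf) - lb->len);
      if (got > 0) {
        lb->len += got;
        continue;
      }
      if (!lb->len) return NULL;  // nothing pending, we're done
    }

    // Copy out the line, then slide the leftover bytes to the front.
    line = malloc(used + 1);
    if (!line) return NULL;
    memcpy(line, lb->buf, used);
    line[used] = 0;
    if (nl) used++;               // consume the newline too
    lb->len -= used;
    memmove(lb->buf, lb->buf + used, lb->len);

    return line;
  }
}
```

The catch is exactly the one described above: the leftover bytes live in the struct, so nothing else can read from that fd and expect to pick up where the last line ended.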
Write collation's generally fun, and you can sing "nagle" to the dreidl song,
but then we spend all our time arguing about flushing. :)
P.S. I'm _so_ glad dprintf() made it into posix-2008. Pity there isn't a
dscanf() but that gets back to getline on unbuffered filehandles being hard.
> Sigh, we should probably make the helpful ones explicit and remove the rest.
>
> sgtm.
>
> looking at all the callers, xputs isn't used much at all (21 calls in toys/) and
> most of them seem dubious. xprintf is a lot more popular (338), but -- though
> it's harder to tell because there are so many -- nothing particularly
> convincingly in need of a flush stood out.
Before you go _too_ far down that path...
What I propose is having xprintf() and xputc() and friends still do the "check
for error and xexit()", but _not_ do the flush. Call xflush() explicitly when
you need a flush.
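Roughly what that split might look like, as a sketch only. The sketch_ names are placeholders so nobody mistakes them for the real lib/xwrap.c functions, which may end up looking different:

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

// Hypothetical stand-in for the proposed xferror(): die if the stream
// has already recorded a failed write.
static void sketch_xferror(FILE *fp)
{
  if (ferror(fp)) {
    perror("write");
    exit(1);
  }
}

// Print and check for error, but leave the data in stdio's buffer.
void sketch_xprintf(char *fmt, ...)
{
  va_list va;

  va_start(va, fmt);
  vprintf(fmt, va);
  va_end(va);
  sketch_xferror(stdout);
}

// Explicit flush point: push buffered output out now, then check.
// A failed fflush() sets the stream's error indicator.
void sketch_xflush(void)
{
  fflush(stdout);
  sketch_xferror(stdout);
}
```

Note that with a buffered stdout the error check in sketch_xprintf() can only see failures from data stdio has actually tried to write, so a closed pipe gets noticed at the next buffer spill or explicit flush, not at the printf that queued the data.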
> i'm assuming this will be like FLAG, where we'll do it as we're touching things
> for other reasons?
>
> I'll add a cleanup pass to the todo heap...
>
> (you should probably keep that list checked in, or even stored as issues in the
> github bug tracker.)
It does not make any sense to anyone other than me, and is not nearly as
organized as you think. Attached is my "top of the heap" file. Which is only one
of the many "not checked in" files at the top of my working toybox tree:
$ git status | grep -v / | grep -v '[.]sw' | wc
77 116 1165
Which is not one of the entries in the "todo" subdirectory there:
$ ls todo
19.patch howto.txt ootree.patch todo2.txt
bc.lib iconv.txt patches todo3.txt
blah ifconfig.txt patch.patch todo4.txt
blah2 lib.patch pending.patch todo.android
config2help.patch lsofdiff.patch projects.txt todo.small
date.patch ltrace.sh pscomments.patch todo.txt
dest mdev2.patch ps.txt tofix.txt
explicit.patch mdev.patch release.txt torelease.txt
explorer.patch mdev.txt sub toysh.c
file2.diff meep.patch sub2 wc2.patch
file.diff needed.txt sub3 wc.patch
getconf2.c netcat.patch temp.patch zcat.txt
getconf.c net.diff test.txt
getconf.sh nettest test_xargs.diff
gzip.txt newsh.c this is a longish name
Which is not one of the 34 changed files "git diff" shows with notes to self like:
--- a/toys/other/losetup.c
+++ b/toys/other/losetup.c
@@ -4,7 +4,7 @@
*
* No standard. (Sigh.)
-USE_LOSETUP(NEWTOY(losetup, ">2S(sizelimit)#s(show)ro#j:fdca[!afj]", TOYFLAG_SB
+USE_LOSETUP(NEWTOY(losetup, ">2S(sizelimit)#s(show)ro#j:fdcaA[!afj]", TOYFLAG_S
config LOSETUP
bool "losetup"
@@ -29,6 +29,7 @@ config LOSETUP
-o Start association at OFFSET into FILE
-r Read only
-S Limit SIZE of loopback association (alias --sizelimit)
+ -A Auto-detach device when unmounted
*/
#define FOR_losetup
Which is a reminder to me to use
https://git.kernel.org/pub/scm/linux/kernel/git/history/history.git/commit/?id=96c5865559ce
and then probably have mount.c take advantage of that (except "object lifetime
rules" is my go-to thing to harp on in software designs and changing them like
that requires reinspection of assumptions)...
Moving up one level from there, the "toybox" directory that my actual toybox git
repos (plural) are in has 82 files/directories in it. Some of which are test
files, like thr.c which has:
#include <pthread.h>
#include <stdio.h>

// Burn CPU in each thread so per-thread usage shows up in top -H.
void *spin(void *data)
{
  unsigned i;

  for (i = 0; i<4000000000; i++);

  return 0;
}

int main(int argc, char *argv[])
{
  pthread_t tt[4];
  void *res;
  int i;

  // Start four spinning threads, then wait for all of them.
  for (i=0; i<4; i++) pthread_create(tt+i, 0, spin, 0);
  for (i=0; i<4; i++) pthread_join(tt[i], &res);

  return 0;
}
Which we were talking about a week or so back, about making top -H get
per-thread CPU right. (The test file reminds me of the todo item.)
And of course there's browser tabs:
https://en.wikipedia.org/wiki/Design_of_the_FAT_file_system#File_Allocation_Table
https://rsync.samba.org/how-rsync-works.html
https://en.wikipedia.org/wiki/Karatsuba_algorithm
http://cgit.openembedded.org/meta-openembedded/tree/meta-oe/recipes-core/toybox
(Sooo many browser tabs.)
And buckets of old emails with yet more todo items left in them, ala:
http://lists.landley.net/pipermail/toybox-landley.net/2019-February/010196.html
And open console tabs with things in them, most recently this grep output:
toys/posix/nl.c: xprintf("%s", line);
toys/posix/ps.c: printf("%s", TT.pgrep.d ? TT.pgrep.d : "\n");
toys/posix/strings.c: printf("%s", string);
toys/posix/ulimit.c: printf("%s", toybuf);
Which reminds me "oh right, do a cleanup pass on the tree for the %s stuff we
were talking about last email"...
And another tab has open a 250 line "podcast.txt" file (not from any of the
above directories) I've written trying to outline a walkthrough of the toybox
code reminding me of concepts I need to remember to try to explain (here, I'll
attach that too, of course it's _also_ unfinished and incoherent and means
nothing to anyone but me)...
It's not as simple as "check it in". As with things like "test suite" and
"documentation", there's a huge amount of cycles needed just to _curate_ it and
process this compost heap into usable work product...
As I said, this mess tends to be a symptom of "not enough time to clear backlog"
so even little things accumulate. Heck, I've got a dozen or so half-composed
email reply windows open just like this one...
Rob
-------------- next part --------------
yocto toybox patch
yocto toybox build
CELF proposals?
rename xparsemillitime->xparsems see if there are more users?
find multiple {} and environ_bytes()? (find -exec +)
Review Hulbert DM, get reference(s), write up proposal.
samsung, sony, who is Lipi Lee?
JCI? (Ha!)
jci: bash 3.2.57, fdisk, resize2fs, rsyslog, screen, oprofile, parted, wget
sntp test
server on 10.0.2.15
TRAP reset time from 10.0.2.2 (host)
rdate.c
make test_date swiss cheese. (See also recent commit.)
post about date testing
date --date=1:2:3 vs date --date=98-7-6 zero or current?
toys/other/mkdosfs.c
nbd_server.c
rfc 3164: syslog (pending/syslogd.c)
- rfc 5424 (new 3164), 5425 (tls), 5426 (udp), RELP, systemd journal,
buffer output if receiver not ready
https://www.rsyslog.com/doc/relp.html (and wikipedia page)
https://en.wikipedia.org/wiki/Rsyslog
anything else xrecvwait()?
top -H not grouping threads right, thr.c
go through top man page, any more options? (-f ?)
htop bars, colors?
what's eating all the CPU time? Faster?
bc.c cleanup
ps ax (vs ps -x)
patch fuzz
route.c redo rtnetlink, android commit, github pull
echo -e '\033[?7h'
./qemu-i686.sh -nic user,hostfwd=:127.0.0.1:12345-:22
iotop swapin field
netcat only one codepath (see commit, proposed commit)
dhcp/zcip = same file, automatic fallback option
yocto upstreamery:
http://cgit.openembedded.org/meta-openembedded/tree/meta-oe/recipes-core/toybox
Eduardas Meile <eduardas.m at fods.com>
require toybox.inc
SRCREV = "123456789abcdef"
buildroot toybox patch
RFC writeup for class E and multicast
grub qemu install, partition toggle (hda partition/format loopback script?)
sntp -M server
that code review
Submit ping range patch to kernel again
https://twitter.com/b0rk/status/1094297731546386437
* Note: ping_group_range should never have existed. To disable it, do:
* echo 0 $(((1<<31)-1)) > /proc/sys/net/ipv4/ping_group_range
* (Android does this by default in its init script.)
Submit initramfs devtmpfs mount patch to kernel again
patch fuzz factor, git rename/copy/delete support
mkfs.vfat, genvfatfs, mtools
hexdump
* -n# -s# -C -e -f FILE -v
# mkroot build not noticing 118 commits in --version? (git describe --tags)
# - because git wasn't in $PATH
watch -> less -> edit
Text editor:
nano
joe
vi
emacs
nut: dhcpd, tftpd
deflate, zip
rsync
screen
htop
sntp server, receiver
android/core
toolbox: grep, getevent, r, getprop.cpp
cpio
logcat
run-as - sudo
BroadcastSender.c BroadcastReceier.c
collate --color isatty() logic. (always/never/auto)
For release:
test.c promote (PR 47 100 102)
http://lists.landley.net/pipermail/toybox-landley.net/2018-September/009666.html
watch.c promote
commit scripts/genconfig.sh (prlimit fix)
lowhang:
sntpd
httpd, wget
tftp, tftpd
route
strace sudo flex gzip htop mtd ntp ar nfsmount smbmount
nbd-server tcpdump gzip zip arp arping ftpput ftpd
tar expr
sh-history vi microemacs joe screen
deflate RFC printout
arp rfc printout
dhcp printout (rfc 3927)
# follow argv[0] symlinks until one's recognized
- todo: toybox ./name shouldn't follow symlink, only top level? Hmmm...
cleanup environ_bytes()
make xparsetime() return ms like millitime()
status.html:
uncategorized: crc32 fmt uuidgen dhcp6 ipaddr iplink iproute iprule iptunnel toysh -sh -toysh traceroute6
Add toybox to buildroot
- with gazillion CONFIG things (search for busybox)
fix ulimit
rm infinite descent
migrate sed and patch to loopfile_replacelines?
loopfile_lines() with -i behavior?
nfsmount:
lkml.iu.edu/hypermail/linux/kernel/1606.1/01115.html
netcat logger
ratelimit
git clone https://lore.kernel.org/lkml/0
find x = x "may be unused" assignments:
grep '[^a-zA-Z0-9]\(..*\) = \1[,; ]' toys/*/*.c
-Wmaybe-uninitialized not supported by llvm
sudo netstat -ltnp
mount | column -t
column -t -s:
multitail?
cd - # OLDPWD not set
sudo !! # run last command as root
> what i miss more is not having a good way to check for expected
> errors, especially given that we're making little/no effort to match
> error messages. would be good to have some kind of "i expect command c
> to fail with exit code x and stderr output that matches regex r"
> utility.
conference writeup: if I had a million dollars
Going deeper on that topic, here's David Wheeler's 2009 dissertation:
Countering "trusting trust" through diverse double compiling.
https://dwheeler.com/trusting-trust/
That's _why_ reproducing builds from source is so important, and how it's just the _start_ of proper analysis.
Rob
P.S. If we had an unlimited budget I'd hire a couple recent graduates from a women's technical college to glue qemu's tcg to tinycc so we had a third compiler that could target superh, and then set them to making it reproduce https://bellard.org/tcc/tccboot.html with a current kernel+toybox+musl. And turn it into a multicall binary so ld/strip/nm/objdump aliases worked and it could replace binutils as well as cc...
-------------- next part --------------
Toybox!
Simple build and use
make defconfig && make && make PREFIX=/chroot install
./toybox, ./toybox ls -l, ln -s toybox ls && ./ls -l
mkdir /mybin && cp toybox /mybin &&
for i in $(/mybin/toybox); do ln -s toybox /mybin/$i; done
then export $PATH
CROSS_COMPILE=prefix LDFLAGS=--static
make sed, make change
make help install_flat
./toybox --help
./toybox --help command, ./toybox help command, ./toybox command --help
help vs man (shell builtins vs system $PATH)
Why ./toybox has no paths: shell scripts!
for i in $(./toybox); do ln -s toybox $i; done
for i in $(./toybox --list); do ln -s /bin/toybox $i; done
for i in $(./toybox --list); do cp -s toybox $i; done
Writing a new command
- The simplest command is "false.c", copy it to a new name
- Looking at skeleton.c in examples
Toybox tricks
Building it:
configure; make; make install
- we use "make menuconfig", defconfig is maximum sane, also "make sed"
- see "make help"
Modifying it:
Adding a new command:
Add a command file under toys/dirname, it picks it up automatically.
sed 's/false/boom/;s/FALSE/BOOM/' toys/*/false.c > toys/pending/boom.c
make distclean; make defconfig; make boom
Starting from toys/examples/hello.c or skeleton.c for more plumbing.
Adding a new category (toys/mycompany) just needs a README, first line
used as kconfig description (rest ignored). Note: flat namespace.
The smallest/simplest command is false.c:
- Starts with multiline /* comment */
- Lines with a leading asterisk are comment text, no function.
USE_NAME(NEWTOY(cmdname, options, flags))
kconfig entry for CMDNAME
#include "toys.h"
void cmdname_main(void)
hello.c adds standards URLs and GLOBALS()
#define FOR_which before toys.h to get FLAG_ macros and TT
In the comment at top, conventions are:
one line "at a glance" summary of what command's for
Copyright (who to blame, how old is it, saves you a git log)
Relevant standard(s)
"Deviations" section at the end if we depart from those standards.
kconfig
Starts with "config" (at left edge) and continues until */ line, so
must be at end of starting comment block.
Same as kernel, buildroot, u-boot, etc. (Circa Linux 2.6.12 anyway.)
"make defconfig/menuconfig" reads Config.in, produces .config.
- see miniconfig
Then scripts/*.sh reads .config to produce generated/config.h.
Top level Config.in #includes generated/Config.in collecting each
command's kconfig stanzas
generated/config.h created #defining CFG_XXX and USE_XXX() macros
instead of #define CONFIG_BLAH we #define CFG_BLAH to 1 or 0.
if (BLAH) becomes if (1 or 0) then function-sections+gc-sections
USE_BLAH(optional contents)
No #ifdefs in code! (Well ALMOST never, uname.)
defconfig is "maximum sane config", convention is commands in toys/pending
and toys/example should "default n"
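A hand-written sketch of the CFG_/USE_ idea, assuming a made-up FOO that's enabled and a made-up BAR that's disabled. The real generated/config.h is produced by the build scripts and may not look exactly like this:

```c
// Stand-in for generated/config.h (hypothetical symbols FOO and BAR).
#define CFG_FOO 1
#define USE_FOO(...) __VA_ARGS__
#define CFG_BAR 0
#define USE_BAR(...)

// Adjacent string literals concatenate, so a disabled USE_ macro just
// contributes nothing to the string.
const char *features(void)
{
  return "base" USE_FOO(" foo") USE_BAR(" bar");
}

int feature_count(void)
{
  int count = 1;

  // Both branches get parsed and type checked on every build, then the
  // compiler drops the one whose condition is a constant 0.
  if (CFG_FOO) count++;
  if (CFG_BAR) count++;

  return count;
}
```

The point of if (CFG_BAR) over #ifdef is that disabled code still has to compile, so it doesn't bitrot silently.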
NEWTOY() line, defines a command
NEWTOY(name, options, flags)
Nope, no trailing semicolon. (It's a macro, not a function.) Must start
at left edge (no leading space) to get picked up by build scripts.
- name: command name. Must have a name_main() and uppercased kconfig
"config NAME" entry.
- options: either NULL/0 or "string" in extended optargs format.
- command line options are their own section, we'll come back to this
- flags: TOYFLAG macros defined/described in lib/toyflags.h
There's a USE() macro around it to conditionally enable it. "This
command is enabled when this config option is enabled" is implemented
by that USE() macro.
generated/newtoys.h, more or less grep NEWTOY toys/*/*.c | sort
alphabetical list of commands (for binary search)
Except that if multiplexer enabled, "toybox" entry is first to
avoid one level of search each time you run a command.
USE() macros around each one, so only enabled commands visible
#included twice in main.c and once in lib/help.c
#define NEWTOY macro differently to initialize different arrays
- toy_list[], NEED_OPTIONS, help_data;
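The "#define NEWTOY differently each time" trick, sketched in a single file. Assumptions: a list macro stands in for re-#including generated/newtoys.h, and struct sketch_toy, sketch_list and sketch_find() are illustrative names, not the real toy_list plumbing:

```c
#include <string.h>

// Stand-in for generated/newtoys.h: an alphabetical command list.
#define SKETCH_NEWTOYS \
  NEWTOY(cat, "u", 0) \
  NEWTOY(echo, "n", 0) \
  NEWTOY(yes, 0, 0)

struct sketch_toy {
  char *name, *options;
  int flags;
  void (*toy_main)(void);
};

// Pass 1: expand the list into function prototypes.
#define NEWTOY(name, opts, flags) void name##_main(void);
SKETCH_NEWTOYS
#undef NEWTOY

// Pass 2: expand the same list into table entries.
#define NEWTOY(name, opts, flags) {#name, opts, flags, name##_main},
struct sketch_toy sketch_list[] = { SKETCH_NEWTOYS };
#undef NEWTOY

void cat_main(void) {}
void echo_main(void) {}
void yes_main(void) {}

// The list is sorted, so lookup can be a binary search.
struct sketch_toy *sketch_find(char *name)
{
  int lo = 0, hi = sizeof(sketch_list)/sizeof(*sketch_list) - 1;

  while (lo <= hi) {
    int mid = (lo + hi)/2, cmp = strcmp(name, sketch_list[mid].name);

    if (!cmp) return sketch_list + mid;
    if (cmp < 0) hi = mid - 1;
    else lo = mid + 1;
  }

  return 0;
}
```

One list, several expansions: add a command in one place and every table built from the list picks it up.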
GLOBALS()
generated/globals.h
structs packed into a union, less memory
default to zero (just like normal globals, ELF spec guarantees that).
union this, ala this.commandname.value
TT #defined to yours (#define FOR_command #include "toys.h")
TT.blah accesses GLOBALS() entry
Start of GLOBALS() can be filled out by option string. Convention is
these variables have the same name as the option, with a blank line
between option arguments and remaining args.
union at start when multiple commands share same TT
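A sketch of the union-of-structs idea with two made-up commands (alpha and beta). The struct contents are invented for illustration; the real union comes from generated/globals.h:

```c
// Hypothetical per-command globals.
struct alpha_data {
  char *a;
  long count;
};

struct beta_data {
  long n;
  char *name;
};

// Only one command runs per process, so their globals can share
// storage. Static storage starts out zeroed like any other global.
union sketch_globals {
  struct alpha_data alpha;
  struct beta_data beta;
} this;

// Each command file points TT at its own member.
#define TT this.alpha

long alpha_bump(void)
{
  return ++TT.count;
}

#undef TT
#define TT this.beta

long beta_get(void)
{
  return TT.n;
}
```

The memory cost is the size of the largest command's globals rather than the sum of all of them.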
option string.
Conceptually similar to getopt() but doesn't use libc function
greatly extended syntax.
lib/args.c parses it
- writes to toys.optflags, toys.optargs[], and start of GLOBALS()
option string has prefix, (longopts) shortopts, types of argdata to save
- seen flags saved in toys.optflags
- argument data saved in GLOBALS() (right to left)
treated as array of long[], LP64 says we can save pointers in that too
inconvenient for 32 bit (range limited ints).
FLOAT is sizeof(long) on this target.
- order of structure members guaranteed by C99
- convention is argument name same as option letter
So "a:" means "cmd -a BLAH" assigns "BLAH" to TT.a
int a,b,c collated on same line. (order guaranteed by C99 either way)
- leftover arguments appended to toys.optargs[]
USE_XXX() treated specially by flag parsing logic:
same bit positions (FORCE_FLAGS)
same global[] slots (avoids #ifdef)
FLAG macros
toys.h #includes generated/flags.h created from .config by scripts/*.sh
- bit position from toys.optflags, counting from right
if (toys.optflags&FLAG_x) blah();
#define FOR_command before #include "toys.h" to select flags
Gearshift to new flag set:
#define CLEANUP_old #define FOR_new #include "generated/flags.h"
when USE("x") disabled, FLAG_x becomes 0, if (variable&0) constant
propagates to 0 and the code drops out at compile time.
BUT if that's not what you want (multiple commands in same file)
#define FORCE_FLAGS to keep all flag macros enabled despite .config
optflags bits in order: "abcdefgh" with -adef sets "10011100" = 0x9c
Some code bypasses flag macros, see gzip_main() and uname_main()
FLAG(x) becomes (toys.optflags & FLAG_x)
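The "counting from right" bit assignment can be checked with a small sketch. Assumptions: the option string holds only plain single-letter flags (no longopts, argument types or other punctuation, which lib/args.c handles and this doesn't), and both function names are made up:

```c
#include <string.h>

// Hypothetical helper: the rightmost option in the string is bit 0.
unsigned long sketch_flag_bit(char *optstr, char opt)
{
  char *at = strchr(optstr, opt);

  if (!opt || !at) return 0;

  return 1UL << (strlen(optstr) - 1 - (at - optstr));
}

// OR together the bits for every option letter in "seen".
unsigned long sketch_optflags(char *optstr, char *seen)
{
  unsigned long flags = 0;

  while (*seen) flags |= sketch_flag_bit(optstr, *seen++);

  return flags;
}
```

With "abcdefgh" and -adef this gives a=0x80, d=0x10, e=0x08, f=0x04, which ORs to the 0x9c in the example above.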
populating GLOBALS:
GLOBALS() also populated right to left (becomes top to bottom)
array of long[], array of pointer same size, so slot[3] same either way.
FLOAT is whatever floating point type same size as long.
long is 32 bit on 32 bit platforms, so "truncate -s 8g" problem. :(
other globals outside GLOBALS()
toybuf, libbuf - scratch space
struct toy_list toy_list[];
- array of enabled NEWTOY() entries: name, main(), optstr, flags
struct toy_context toys; - global variables common to most commands
which - pointer to our toy_list entry. toys.which->name is this command.
argv - original unfiltered command line
optargs - command line arguments leftover after argument parsing
optc - count of optargs
optflags - one bit per flag seen on command line (& with FLAG macros)
exitval - error code returned when we xexit() or return from main
- defaults to 0, error_msg() sets it to 1 if it's still 0
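The exitval rule ("set it to 1 only if it's still 0") sketched with stand-in names, so a command that already chose a specific error code keeps it. sketch_toys and sketch_error_msg() are illustrative, not the real struct toy_context or error_msg():

```c
#include <stdarg.h>
#include <stdio.h>

// Stand-in for the exitval field of the global context.
struct {
  int exitval;
} sketch_toys;

// Report a problem on stderr and make sure we eventually exit nonzero,
// without clobbering an exit code somebody already set.
void sketch_error_msg(char *fmt, ...)
{
  va_list va;

  va_start(va, fmt);
  vfprintf(stderr, fmt, va);
  va_end(va);
  fputc('\n', stderr);

  if (!sketch_toys.exitval) sketch_toys.exitval = 1;
}
```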
ARRAY_LEN()
nommu support: vfork, malloc, stack
main.c
lib/lib.c and lib/xwrap.c
lib/portability.[ch]
Build plumbing:
configure, make, install
configure: default values for environment variables
Config.in
kconfig: old kernel kconfig plumbing from 2.6.12, with Makefile targets
scripts/genconfig.sh
Not autoconf, configure and scripts/genconfig.sh
No makefiles, shell script.
We more or less do "cc main.c lib/*.c toys/*/*.c $CFLAGS"
- but generated/* created from toys/*/*.c header info
- filter to only include enabled toys/*/*.c commands.
(Yes we build lib/*.c wildcard.)
- ffunction-sections -gc-sections trick
- script that outputs that in generated/build.sh
- figure out which shared libraries (-lm -lz etc) we need
- generate generated/* files
- probe toolchain features: "does it have this function/header/symbol"
- lots of 'cc -E -dM - < /dev/null' symbols there by default
- the generated/ stuff:
Kconfig (old simple version)
make install, install_flat, install_airlock (scripts/install.sh)
test suite:
make tests, make test_sed
special handling of USE_BLAH() macros in optstring during build
scripts/mkflags.c
No external dependencies (optional only)
https via pipe to external program
No curses, ANSI escapes.
internal deflate implementation
internal hash functions
stacktop, vfork: !stacktop checked in multiple places
0BSD.
Zero Clause BSD, SPDX 0BSD.
OpenBSD suggested template license with half a sentence removed.
Corporate friendly public domain equivalent license.
The BSD rubber stamp (4 clause, 3 clause, 2 clause... zero clause. Derived
from OpenBSD and asked Kirk McKusick.) Big warranty disclaimer security
blanket for legal departments, all that pointless legal boilerplate for
lawyers to roll around in. Entirely ablative.
Public domain adjacent licenses. Combinable/relicensable? Well, sort of.
Busybox ping.c: put GPL at start and BSD at end to minimize obviousness of
conflict. Stuttering problem. How do you enforce that mess?
But trademark? Patent? Simple: this is a copyright license. You want a
trademark license, add one. If you want a patent license, add one. Call
the files COPYING, TRADEMARK, PATENT. Don't try to have one license cover
three categories of IP law. (Or trade secret, or contract.)
FAQ:
https://yarchive.net/comp/linux/pivot_root.html
http://lkml.iu.edu/hypermail/linux/kernel/1310.0/02823.html