[Toybox] [PATCH] grep: add --line-buffered and fix regular buffering.

Rob Landley rob at landley.net
Mon Feb 25 12:01:22 PST 2019


On 2/25/19 11:29 AM, enh wrote:
>     To be honest I was calling xflush() from so many places because it was an easy
>     way to get the if (ferror) perror_exit("write"); Possibly what I want is an
>     xferror() that xflush() can call...
> 
> 
> where would you want to call xferror that xflush wouldn't be appropriate?

Anywhere you want to stop a command after "head" ate the first screen of data
and closed the pipe, or similar. (cmp <(blah) <(blah), yes | anything, etc.)

(It's basically another facet of the sigpipe problem.)

> (fread
> is the main place one needs ferror normally, and toybox has remarkably few calls
> to fread, neither of which seem to actually need to care.)

The only _input_ advantage of FILE * is really for getline(). Everything else
it's just as easy to read input blocks and chop 'em up myself.

(The problem with getline() is you don't know where the terminators are ahead of
time so you never know how _much_ to read, so you either read a byte at a time
which is _painfully_ slow or you have leftover data you need to keep around
because zcat | thingy isn't seekable input.)
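The "leftover data" half of that tradeoff can be sketched roughly as follows. This is only an illustration under stated assumptions: struct linebuf and next_line() are made-up names, not anything in toybox, and long lines are simply returned in pieces rather than handled properly.

```c
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

// Hypothetical line reader for a non-seekable fd: reads blocks and keeps
// whatever follows the newline around for the next call.
struct linebuf {
  int fd;
  size_t len;        // bytes of leftover data currently held in buf
  char buf[4096];
};

// Returns a malloc()ed line without the trailing newline, or NULL at EOF or
// on error. Lines longer than the buffer come back in pieces.
char *next_line(struct linebuf *lb)
{
  for (;;) {
    char *nl = memchr(lb->buf, '\n', lb->len), *line;
    size_t used;
    ssize_t got;

    // No newline yet: read more unless the buffer is full or input ended.
    if (!nl && lb->len < sizeof(lb->buf)) {
      got = read(lb->fd, lb->buf+lb->len, sizeof(lb->buf)-lb->len);
      if (got > 0) {
        lb->len += got;
        continue;
      }
      if (got < 0 || !lb->len) return 0;
    }

    // Copy out the line (or the partial tail), then slide the leftover down.
    used = nl ? (size_t)(nl-lb->buf) : lb->len;
    if (!(line = malloc(used+1))) return 0;
    memcpy(line, lb->buf, used);
    line[used] = 0;
    if (nl) used++;
    lb->len -= used;
    memmove(lb->buf, lb->buf+used, lb->len);

    return line;
  }
}
```

The cost is that the leftover bytes live in the struct, so nothing else can read from the same fd without going through it.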

Write collation's generally fun, and you can sing "nagle" to the dreidl song,
but then we spend all our time arguing about flushing. :)

P.S. I'm _so_ glad dprintf() made it into posix-2008. Pity there isn't a
dscanf() but that gets back to getline on unbuffered filehandles being hard.

>     Sigh, we should probably make the helpful ones explicit and remove the rest.
> 
> sgtm.
> 
> looking at all the callers, xputs isn't used much at all (21 calls in toys/) and
> most of them seem dubious. xprintf is a lot more popular (338), but -- though
> it's harder to tell because there are so many -- nothing particularly
> convincingly in need of a flush stood out.

Before you go _too_ far down that path...

What I propose is having xprintf() and xputc() and friends still do the "check
for error and xexit()", but _not_ do the flush. Call xflush() explicitly when
you need a flush.
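A minimal sketch of that split, assuming nothing about the real lib/xwrap.c: every sketch_ name below is hypothetical, and the FILE * argument is only there to keep the example self-contained.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

// Hypothetical stand-ins for the helpers discussed above. The sketch_ names
// are illustrative only, not the functions in toybox's lib/xwrap.c.

// Return nonzero if the stream has recorded an error.
int sketch_failed(FILE *fp)
{
  return ferror(fp) != 0;
}

// Check for error and exit, without forcing a flush.
void sketch_xferror(FILE *fp)
{
  if (sketch_failed(fp)) {
    perror("write");
    exit(1);
  }
}

// printf() wrapper: format the output, then only check the error flag.
void sketch_xprintf(FILE *fp, const char *fmt, ...)
{
  va_list va;

  va_start(va, fmt);
  vfprintf(fp, fmt, va);
  va_end(va);
  sketch_xferror(fp);
}

// Explicit flush for the places that need the output to appear now.
void sketch_xflush(FILE *fp)
{
  if (fflush(fp)) {
    perror("write");
    exit(1);
  }
  sketch_xferror(fp);
}
```

One consequence worth noting: stdio only sets the error flag when buffered data is actually written out, so without the flush a closed pipe is noticed on a later call rather than on the call that produced the data.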

> i'm assuming this will be like FLAG, where we'll do it as we're touching things
> for other reasons?
>  
>     I'll add a cleanup pass to the todo heap...
> 
> (you should probably keep that list checked in, or even stored as issues in the
> github bug tracker.)

It does not make any sense to anyone other than me, and is not nearly as
organized as you think. Attached is my "top of the heap" file. Which is only one
of the many "not checked in" files at the top of my working toybox tree:

  $ git status | grep -v / | grep -v '[.]sw' | wc
       77     116    1165

Which is not one of the entries in the "todo" subdirectory there:

$ ls todo
19.patch           howto.txt       ootree.patch            todo2.txt
bc.lib             iconv.txt       patches                 todo3.txt
blah               ifconfig.txt    patch.patch             todo4.txt
blah2              lib.patch       pending.patch           todo.android
config2help.patch  lsofdiff.patch  projects.txt            todo.small
date.patch         ltrace.sh       pscomments.patch        todo.txt
dest               mdev2.patch     ps.txt                  tofix.txt
explicit.patch     mdev.patch      release.txt             torelease.txt
explorer.patch     mdev.txt        sub                     toysh.c
file2.diff         meep.patch      sub2                    wc2.patch
file.diff          needed.txt      sub3                    wc.patch
getconf2.c         netcat.patch    temp.patch              zcat.txt
getconf.c          net.diff        test.txt
getconf.sh         nettest         test_xargs.diff
gzip.txt           newsh.c         this is a longish name

Which is not one of the 34 changed files "git diff" shows with notes to self like:

--- a/toys/other/losetup.c
+++ b/toys/other/losetup.c
@@ -4,7 +4,7 @@
  *
  * No standard. (Sigh.)

-USE_LOSETUP(NEWTOY(losetup, ">2S(sizelimit)#s(show)ro#j:fdca[!afj]", TOYFLAG_SB
+USE_LOSETUP(NEWTOY(losetup, ">2S(sizelimit)#s(show)ro#j:fdcaA[!afj]", TOYFLAG_S

 config LOSETUP
   bool "losetup"
@@ -29,6 +29,7 @@ config LOSETUP
     -o Start association at OFFSET into FILE
     -r Read only
     -S Limit SIZE of loopback association (alias --sizelimit)
+    -A Auto-detach device when unmounted
 */

 #define FOR_losetup

Which is a reminder to me to use
https://git.kernel.org/pub/scm/linux/kernel/git/history/history.git/commit/?id=96c5865559ce
and then probably have mount.c take advantage of that (except "object lifetime
rules" is my go-to thing to harp on in software designs and changing them like
that requires reinspection of assumptions)...

Moving up one level from there, the "toybox" directory that my actual toybox git
repos (plural) are in has 82 files/directories in it. Some of which are test
files, like thr.c which has:

#include <pthread.h>
#include <stdio.h>

void *spin(void *data)
{
  unsigned i;

  for (i = 0; i<4000000000; i++);

  return 0;
}

int main(int argc, char *argv[])
{
  pthread_t tt[4];
  void *res;
  int i;

  for (i=0; i<4; i++) pthread_create(tt+i, 0, spin, 0);
  for (i=0; i<4; i++) pthread_join(tt[i], &res);

  return 0;
}

Which we were talking about a week or so back, about making top -H get
per-thread CPU right. (The test file reminds me of the todo item.)

And of course there's browser tabs:

https://en.wikipedia.org/wiki/Design_of_the_FAT_file_system#File_Allocation_Table
https://rsync.samba.org/how-rsync-works.html
https://en.wikipedia.org/wiki/Karatsuba_algorithm
http://cgit.openembedded.org/meta-openembedded/tree/meta-oe/recipes-core/toybox

(Sooo many browser tabs.)

And buckets of old emails with yet more todo items left in them, ala:
http://lists.landley.net/pipermail/toybox-landley.net/2019-February/010196.html

And open console tabs with things in them, most recently this grep output:
toys/posix/nl.c:    xprintf("%s", line);
toys/posix/ps.c:    printf("%s", TT.pgrep.d ? TT.pgrep.d : "\n");
toys/posix/strings.c:          printf("%s", string);
toys/posix/ulimit.c:          printf("%s", toybuf);

Which reminds me "oh right, do a cleanup pass on the tree for the %s stuff we
were talking about last email"...

And another tab has open a 250 line "podcast.txt" file (not from any of the
above directories) I've written trying to outline a walkthrough of the toybox
code reminding me of concepts I need to remember to try to explain (here, I'll
attach that too, of course it's _also_ unfinished and incoherent and means
nothing to anyone but me)...

It's not as simple as "check it in". As with things like "test suite" and
"documentation", there's a huge amount of cycles needed just to _curate_ it and
process this compost heap into usable work product...

As I said, this mess tends to be a symptom of "not enough time to clear backlog"
so even little things accumulate. Heck, I've got a dozen or so half-composed
email reply windows open just like this one...

Rob
-------------- next part --------------
yocto toybox patch
  yocto toybox build
CELF proposals?

rename xparsemillitime->xparsems see if there are more users?
find multiple {} and environ_bytes()? (find -exec +)

Review Hulbert DM, get reference(s), write up proposal.
  samsung, sony, who is Lipi Lee?
  JCI? (Ha!)

jci: bash 3.2.57, fdisk, resize2fs, rsyslog, screen, oprofile, parted, wget

sntp test
  server on 10.0.2.15
  TRAP reset time from 10.0.2.2 (host)

rdate.c

make test_date swiss cheese. (See also recent commit.)
  post about date testing
  date --date=1:2:3 vs date --date=98-7-6 zero or current?
toys/other/mkdosfs.c
nbd_server.c
rfc 3164: syslog (pending/syslogd.c)
  - rfc 5424 (new 3164), 5425 (tls), 5426 (udp), RELP, systemd journal,
    buffer output if receiver not ready
    https://www.rsyslog.com/doc/relp.html (and wikipedia page)

  https://en.wikipedia.org/wiki/Rsyslog
anything else xrecvwait()?
top -H not grouping threads right, thr.c
  go through top man page, any more options? (-f ?)
  htop bars, colors?
  what's eating all the CPU time? Faster?
bc.c cleanup
ps ax (vs ps -x)
patch fuzz
route.c redo rtnetlink, android commit, github pull
echo -e '\033[?7h'
./qemu-i686.sh -nic user,hostfwd=:127.0.0.1:12345-:22
iotop swapin field
netcat only one codepath (see commit, proposed commit)
dhcp/zcip = same file, automatic fallback option

yocto upstreamery:
  http://cgit.openembedded.org/meta-openembedded/tree/meta-oe/recipes-core/toybox
    Eduardas Meile <eduardas.m at fods.com>
    require toybox.inc
    SRCREV = "123456789abcdef"

buildroot toybox patch
RFC writeup for class E and multicast

grub qemu install, partition toggle (hda partition/format loopback script?)

sntp -M server
that code review

Submit ping range patch to kernel again
  https://twitter.com/b0rk/status/1094297731546386437
 * Note: ping_group_range should never have existed. To disable it, do:
 *   echo 0 $(((1<<31)-1)) > /proc/sys/net/ipv4/ping_group_range
 * (Android does this by default in its init script.)

  Submit initramfs devtmpfs mount patch to kernel again

patch fuzz factor, git rename/copy/delete support

mkfs.vfat, genvfatfs, mtools

hexdump
 * -n# -s# -C -e -f FILE -v

# mkroot build not noticing 118 commits in --version? (git describe --tags)
#   - because git wasn't in $PATH

watch -> less -> edit

Text editor:
  nano
  joe
  vi
  emacs

nut: dhcpd, tftpd


deflate, zip

rsync

screen

htop

sntp server, receiver

android/core
  toolbox: grep, getevent, r, getprop.cpp
  cpio
  logcat
  run-as - sudo
  

BroadcastSender.c BroadcastReceier.c

collate --color isatty() logic. (always/never/auto)

For release:
  test.c promote (PR 47 100 102)
    http://lists.landley.net/pipermail/toybox-landley.net/2018-September/009666.html
  watch.c promote

commit scripts/genconfig.sh (prlimit fix)

lowhang:
  sntpd
  httpd, wget
  tftp, tftpd
  route
  strace sudo flex gzip htop mtd ntp ar nfsmount smbmount
  nbd-server tcpdump gzip zip arp arping ftpput ftpd
  tar expr

  sh-history vi microemacs joe screen

  deflate RFC printout
  arp rfc printout
  dhcp printout (rfc 3927)

# follow argv[0] symlinks until one's recognized
  - todo: toybox ./name shouldn't follow symlink, only top level? Hmmm...

cleanup environ_bytes()
make xparsetime() return ms like millitime()

status.html:
uncategorized: crc32 fmt uuidgen dhcp6 ipaddr iplink iproute iprule iptunnel toysh -sh -toysh traceroute6

Add toybox to buildroot
  - with gazillion CONFIG things (search for busybox)

fix ulimit

rm infinite descent

migrate sed and patch to loopfile_replacelines?
  loopfile_lines() with -i behavior?

nfsmount:
  lkml.iu.edu/hypermail/linux/kernel/1606.1/01115.html

netcat logger
ratelimit
git clone https://lore.kernel.org/lkml/0

find x = x "may be unused" assignments:
  grep '[^a-zA-Z0-9]\(..*\) = \1[,; ]' toys/*/*.c
  -Wmaybe-uninitialized not supported by llvm

sudo netstat -ltnp

mount | column -t
  column -t -s:
multitail?

cd - # OLDPWD not set

sudo !! # run last command as root

> what i miss more is not having a good way to check for expected
> errors, especially given that we're making little/no effort to match
> error messages. would be good to have some kind of "i expect command c
> to fail with exit code x and stderr output that matches regex r"
> utility.

conference writeup: if I had a million dollars

Going deeper on that topic, here's David Wheeler's 2009 dissertation:
Countering "trusting trust" through diverse double compiling.

https://dwheeler.com/trusting-trust/

That's _why_ reproducing builds from source is so important, and how it's just the _start_ of proper analysis.

Rob

P.S. If we had an unlimited budget I'd hire a couple recent graduates from a women's technical college to glue qemu's tcg to tinycc so we had a third compiler that could target superh, and then set them to making it reproduce https://bellard.org/tcc/tccboot.html with a current kernel+toybox+musl. And turn it into a multicall binary so ld/strip/nm/objdump aliases worked and it could replace binutils as well as cc...
-------------- next part --------------
Toybox!

Simple build and use
  make defconfig && make && make PREFIX=/chroot install
  ./toybox, ./toybox ls -l, ln -s toybox ls && ./ls -l
  mkdir /mybin && cp toybox /mybin &&
    for i in $(/mybin/toybox); do ln -s toybox /mybin/$i; done
    then export PATH=/mybin:$PATH
  CROSS_COMPILE=prefix LDFLAGS=--static
  make sed, make change
  make help install_flat

  ./toybox --help
    ./toybox --help command, ./toybox help command, ./toybox command --help
    help vs man (shell builtins vs system $PATH)

  Why ./toybox has no paths: shell scripts!
    for i in $(./toybox); do ln -s toybox $i; done
    for i in $(./toybox --list); do ln -s /bin/toybox $i; done
    for i in $(./toybox --list); do cp -s toybox $i; done

Writing a new command
  - The simplest command is "false.c", copy it to a new name
  - Looking at skeleton.c in examples

Toybox tricks

  Building it:
    configure; make; make install
      - we use "make menuconfig", defconfig is maximum sane, also "make sed"
      - see "make help"

  Modifying it:

    Adding a new command:
      Add a command file under toys/dirname, it picks it up automatically.

      sed 's/false/boom/;s/FALSE/BOOM/' toys/*/false.c > toys/pending/boom.c
      make distclean; make defconfig; make boom
      Starting from toys/example/hello.c or skeleton.c for more plumbing.

      Adding a new category (toys/mycompany) just needs a README, first line
      used as kconfig description (rest ignored). Note: flat namespace.

    The smallest/simplest command is false.c:

      - Starts with multiline /* comment */
      - Lines with a leading asterisk are comment text, no function.
      USE_NAME(NEWTOY(cmdname, options, flags))
      kconfig entry for CMDNAME
      #include "toys.h"
      void cmdname_main(void)

      hello.c adds standards URLs and GLOBALS()
        #define FOR_which before toys.h to get FLAG_ macros and TT

    In the comment at top, conventions are:

      one line "at a glance" summary of what command's for

      Copyright (who to blame, how old is it, saves you a git log)

      Relevant standard(s)

        "Deviations" section at the end if we depart from those standards.

    kconfig

      Starts with "config" (at left edge) and continues until */ line, so
      must be at end of starting comment block.

      Same as kernel, buildroot, u-boot, etc. (Circa Linux 2.6.12 anyway.)
        "make defconfig/menuconfig" reads Config.in, produces .config.
          - see miniconfig
        Then scripts/*.sh reads .config to produce generated/config.h.

        Top level Config.in #includes generated/Config.in collecting each
        command's kconfig stanzas

      generated/config.h created #defining CFG_XXX and USE_XXX() macros

        Instead of #define CONFIG_BLAH we #define CFG_BLAH to 1 or 0.
        if (CFG_BLAH) becomes if (1 or 0) then function-sections+gc-sections
          USE_BLAH(optional contents)
        No #ifdefs in code! (Well ALMOST never, uname.)
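A hand-written sketch of the CFG_/USE_ idea, not the real generated/config.h: ALPHA and BETA are made-up options, one enabled and one disabled, and the macro bodies are an assumption about how such a header could look.

```c
// Hand-written stand-in for what a generated config header might contain
// for two hypothetical options, one enabled and one disabled.
#define CFG_ALPHA 1
#define USE_ALPHA(...) __VA_ARGS__
#define CFG_BETA 0
#define USE_BETA(...)

// Each USE_ macro either keeps or drops its contents, so no #ifdef needed.
const char *enabled[] = {
  USE_ALPHA("alpha",)
  USE_BETA("beta",)
  0
};

int count_enabled(void)
{
  int i = 0;

  while (enabled[i]) i++;

  // Constant condition: the compiler can discard the dead branch.
  if (CFG_BETA) i += 100;

  return i;
}
```

Both branches still get parsed and type checked, which is the usual argument for if (CFG_X) over #ifdef.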

      defconfig is "maximum sane config", convention is commands in toys/pending
        and toys/example should "default n"

    NEWTOY() line, defines a command

      NEWTOY(name, options, flags)

      Nope, no trailing semicolon. (It's a macro, not a function.) Must start
      at left edge (no leading space) to get picked up by build scripts.

      - name: command name. Must have a name_main() and uppercased kconfig
        "config NAME" entry.

      - options: either NULL/0 or "string" in extended optargs format.
        - command line options are their own section, we'll come back to this

      - flags: TOYFLAG macros defined/described in lib/toyflags.h

      There's a USE() macro around it to conditionally enable it. "This
      command is enabled when this config option is enabled" is implemented
      by that USE() macro.

      generated/newtoys.h, more or less grep NEWTOY toys/*/*.c | sort
        alphabetical list of commands (for binary search)
          Except that if multiplexer enabled, "toybox" entry is first to
          avoid one level of search each time you run a command.
        USE() macros around each one, so only enabled commands visible
        #included twice in main.c and once in lib/help.c
          #define NEWTOY macro differently to initialize different arrays
          - toy_list[], NEED_OPTIONS, help_data;
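The "define NEWTOY differently each time" trick, sketched in one file. Assumptions: the SKETCH_NEWTOYS list macro stands in for the generated header (a real header would be #included repeatedly), the three commands and their option strings are placeholders, and struct sketch_toy is not the real struct toy_list.

```c
#include <string.h>

// Stand-in for the generated list: alphabetical, one NEWTOY() per command.
#define SKETCH_NEWTOYS \
  NEWTOY(cat, "u", 1) \
  NEWTOY(echo, "en", 1) \
  NEWTOY(yes, 0, 2)

struct sketch_toy {
  const char *name, *opts;
  void (*toy_main)(void);
  int flags;
};

int sketch_ran;

// Pass 1: stub command functions (real ones live in each command's file).
#define NEWTOY(name, opts, flags) \
  void name##_main(void) { sketch_ran += flags; }
SKETCH_NEWTOYS
#undef NEWTOY

// Pass 2: the same list initializes the lookup table.
#define NEWTOY(name, opts, flags) {#name, opts, name##_main, flags},
struct sketch_toy sketch_list[] = { SKETCH_NEWTOYS };
#undef NEWTOY

// Binary search works because the list is alphabetical.
struct sketch_toy *sketch_find(const char *name)
{
  int lo = 0, hi = sizeof(sketch_list)/sizeof(*sketch_list)-1;

  while (lo <= hi) {
    int mid = (lo+hi)/2, cmp = strcmp(name, sketch_list[mid].name);

    if (!cmp) return sketch_list+mid;
    if (cmp < 0) hi = mid-1;
    else lo = mid+1;
  }

  return 0;
}
```

Adding a command means adding one line to the list; every array built from it stays in sync.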
      
    GLOBALS()
      generated/globals.h
      structs packed into a union, less memory
      default to zero (just like normal globals, ELF spec guarantees that).
      union this, ala this.commandname.value
      TT #defined to yours (#define FOR_command #include "toys.h")
        TT.blah accesses GLOBALS() entry

      Start of GLOBALS() can be filled out by option string. Convention is
      these variables have the same name as the option, with a blank line
      between option arguments and remaining args.


      union at start when multiple commands share same TT
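A rough sketch of the union-of-structs layout described above. The struct names, members, and the sketch_this variable are hypothetical; the real definitions come from generated/globals.h and toys.h.

```c
// Hypothetical per-command globals: each command gets a struct, and all of
// them overlay the same storage because only one command runs at a time.
struct alpha_data {
  char *f;      // would be filled in from an option argument
  long n;

  int count;    // remaining command state
};

struct beta_data {
  long size;
};

union sketch_globals {
  struct alpha_data alpha;
  struct beta_data beta;
} sketch_this;   // file scope, so it starts out zeroed

// What "#define FOR_alpha" would select in this sketch.
#define TT sketch_this.alpha

long alpha_bump(void)
{
  TT.count++;

  return TT.n + TT.count;
}
```

The union is only as large as its biggest member, so adding commands with small state costs nothing extra.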

    option string.

      Conceptually similar to getopt() but doesn't use libc function
        greatly extended syntax.
      
      lib/args.c parses it
        - writes to toys.optflags, toys.optargs[], and start of GLOBALS()

      option string has prefix, (longopts) shortopts, types of argdata to save
      - seen flags saved in toys.optflags

      - argument data saved in GLOBALS() (right to left)
        treated as array of long[], LP64 says we can save pointers in that too
          inconvenient for 32 bit (range limited ints).
          FLOAT is sizeof(long) on this target.
        - order of structure members guaranteed by C99
        - convention is argument name same as option letter
          So "a:" means "cmd -a BLAH" assigns "BLAH" to TT.a
          int a,b,c collated on same line. (order guaranteed by c99 either way)

      - leftover arguments appended to toys.optargs[]

      USE_XXX() treated specially by flag parsing logic:
        same bit positions (FORCE_FLAGS)
        same global[] slots (avoids #ifdef)

    FLAG macros
      toys.h #includes generated/flags.h created from .config by scripts/*.sh
        - bit position from toys.optflags, counting from right

      if (toys.optflags&FLAG_x) blah();

      #define FOR_command before #include "toys.h" to select flags
      Gearshift to new flag set:
      #define CLEANUP_old #define FOR_new #include "generated/flags.h"

      when USE("x") disabled, FLAG_x becomes 0, if (variable&0) constant
      propagates to 0 and the code drops out at compile time.
      BUT if that's not what you want (multiple commands in same file)

      #define FORCE_FLAGS to keep all flag macros enabled despite .config

      optflags bits in order: "abcdefgh" with -adef sets "10011100" = 0x9c
      Some code bypasses flag macros, see gzip_main() and uname_main()

      FLAG(x) becomes (toys.optflags & FLAG_x)
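The bit numbering can be checked with a small sketch. It assumes a plain string of option letters with no prefix, longopts, or type punctuation, and flag_bit() and flag_mask() are made-up helpers rather than anything in lib/args.c.

```c
#include <string.h>

// Hypothetical helper: bit for option letter c, counting from the right end
// of the option string, so the last letter is bit 0.
unsigned long flag_bit(const char *opts, char c)
{
  const char *s = c ? strchr(opts, c) : 0;

  return s ? 1UL<<(strlen(opts)-1-(s-opts)) : 0;
}

// OR together the bits for every letter in seen.
unsigned long flag_mask(const char *opts, const char *seen)
{
  unsigned long mask = 0;

  while (*seen) mask |= flag_bit(opts, *seen++);

  return mask;
}
```

With "abcdefgh" and -adef this gives 128+16+8+4, matching the 0x9c in the notes.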

    populating GLOBALS:

      GLOBALS() also populated right to left (becomes top to bottom)
        array of long[], array of pointer same size, so slot[3] same either way.
        FLOAT is whatever floating point type same size as long.
        long is 32 bit on 32 bit platforms, so "truncate -s 8g" problem. :(

    other globals outside GLOBALS()

      toybuf, libbuf - scratch space

      struct toy_list toy_list[];
        - array of enabled NEWTOY() entries: name, main(), optstr, flags

    struct toy_context toys; - global variables common to most commands
      which - pointer to our toy_list entry. toys.which->name is this command.
      argv - original unfiltered command line
      optargs - command line arguments leftover after argument parsing
      optc - count of optargs
      optflags - one bit per flag seen on command line (& with FLAG macros)
      exitval - error code returned when we xexit() or return from main
        - defaults to 0, error_msg() sets it to 1 if it's still 0

    ARRAY_LEN()

    nommu support: vfork, malloc, stack

    main.c

    lib/lib.c and lib/xwrap.c

    lib/portability.[ch]

  Build plumbing:

    configure, make, install
      configure: default values for environment variables
      Config.in
      kconfig: old kernel kconfig plumbing from 2.6.12, with Makefile targets
      scripts/genconfig.sh

    Not autoconf, configure and scripts/genconfig.sh

    No makefiles, shell script.
      We more or less do "cc main.c lib/*.c toys/*/*.c $CFLAGS"
      - but generated/* created from toys/*/*.c header info

      - filter to only include enabled toys/*/*.c commands.
        (Yes we build lib/*.c wildcard.)
        - -ffunction-sections --gc-sections trick
      - script that outputs that in generated/build.sh
      - figure out which shared libraries (-lm -lz etc) we need
      - generate generated/* files
        - probe toolchain features: "does it have this function/header/symbol"
        - lots of 'cc -E -dM - < /dev/null' symbols there by default

    - the generated/ stuff:

    Kconfig (old simple version)

  make install, install_flat, install_airlock (scripts/install.sh)

  test suite:
    make tests, make test_sed

special handling of USE_BLAH() macros in optstring during build
  scripts/mkflags.c

No external dependencies (optional only)
  https via pipe to external program
  No curses, ANSI escapes.
  internal deflate implementation
  internal hash functions

stacktop, vfork: !stacktop checked in multiple places

0BSD.
  Zero Clause BSD, SPDX 0BSD.
  OpenBSD suggested template license with half a sentence removed.
  Corporate friendly public domain equivalent license.
  The BSD rubber stamp (4 clause, 3 clause, 2 clause... zero clause. Derived
  from OpenBSD and asked Kirk McKusick.) Big warranty disclaimer security
  blanket for legal departments, all that pointless legal boilerplate for
  lawyers to roll around in. Entirely ablative.

  Public domain adjacent licenses. Combinable/relicensable? Well, sort of.
  Busybox ping.c: put GPL at start and BSD at end to minimize obviousness of
  conflict. Stuttering problem. How do you enforce that mess?

  But trademark? Patent? Simple: this is a copyright license. You want a
  trademark license, add one. If you want a patent license, add one. Call
  the files COPYING, TRADEMARK, PATENT. Don't try to have one license cover
  three categories of IP law. (Or trade secret, or contract.)

FAQ:

  https://yarchive.net/comp/linux/pivot_root.html
  http://lkml.iu.edu/hypermail/linux/kernel/1310.0/02823.html

