[Toybox] Bad idea regarding threading...

Rob Landley rob at landley.net
Mon Apr 30 12:02:20 PDT 2012


On 04/30/2012 02:25 AM, Elie De Brauwer wrote:
> On Sun, Apr 29, 2012 at 2:57 AM, Rob Landley <rob at landley.net> wrote:
>> It occurs to me that if I add a CONFIG_TOYBOX_THREADS with the new
>> directory traversal infrastructure, things like cp -a and rm -r could be
>> done in a multithreaded manner.
>>
>> I.E. create a thread pool equal to the number of processors, and then
>> every time you encounter a directory hand off the callbacks to a thread
>> out of the thread pool. Everything they're doing is openat() based on a
>> filehandle stored in the node structure (or a filehandle pair with the
>> second stored in the node's ->extra field), so you don't need to worry
>> about the current directory changing in another thread...
> 
> Indeed, and I think that this could also be a very nice feature to
> differentiate toybox from similar tools. To my knowledge there aren't
> any 'userlands' available which have inherent multithreading support
> (typically because most stem from the time that it was easier to
> purchase a human kidney than to purchase SMP systems).

I graduated in 1995 and went straight to work at IBM on OS/2. That was
designed with SMP in mind, and I always knew that when Moore's Law
started flirting with atomic limits, processors would expand laterally.

Also, half the CISC vs RISC thing was multiple execution cores. I wrote
an article about this (for a stock market investment audience) a dozen
years ago:

  http://www.fool.com/portfolios/rulemaker/2000/rulemaker000224.htm

So they were _already_ parallelizing to soak up transistor budget back
in the '80s...

I'll stop before I go on a computer history rant. The point is "some of
us saw this coming a loooong time ago". And there was no reason having a
second processor was _that_ much crazier than having a second hard drive
or second monitor...
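
Getting back to the traversal idea, here's a rough sketch of handing
work to threads as (dirfd, name) pairs so nothing ever depends on the
shared current directory. (Hypothetical illustration, not toybox code,
and it spawns a thread per entry where the real thing would hand off to
a pool sized to the CPU count:)

#include <dirent.h>
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

struct job { int dirfd; char name[256]; };

// Everything in the callback is *at() relative to the stored dirfd,
// so another thread calling chdir() can't pull the rug out.
void *visit(void *arg)
{
  struct job *j = arg;
  struct stat st;

  if (!fstatat(j->dirfd, j->name, &st, AT_SYMLINK_NOFOLLOW))
    printf("%s: %lld bytes\n", j->name, (long long)st.st_size);

  return 0;
}

int main(void)
{
  DIR *dir = opendir(".");
  struct dirent *de;
  pthread_t tid[16];
  static struct job jobs[16];
  int n = 0;

  while (dir && (de = readdir(dir)) && n < 16) {
    jobs[n].dirfd = dirfd(dir);
    strcpy(jobs[n].name, de->d_name);
    pthread_create(tid+n, 0, visit, jobs+n);
    n++;
  }
  while (n--) pthread_join(tid[n], 0);

  return 0;
}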

> The only thing I want to add there is that in such a scenario I expect
> the number of usable cores to be runtime configurable (e.g. through an
> environment variable).

I'm big into autodetecting things the user shouldn't be bothered with.

This is a constant failing of Linux.  Knoppix could boot straight to a
desktop while other Linux distros would sit there and stop to ask you
questions halfway through the install, over and over.

I can see having a magic environment variable that lets expert users lie
to their system for some reason (although... why?), but the system
_must_ run perfectly fine without it.

I note that Aboriginal Linux currently does this:

# How many processors should make -j use?

MEMTOTAL="$(awk '/MemTotal:/{print $2}' /proc/meminfo)"
if [ -z "$CPUS" ]
then
  export CPUS=$(echo /sys/devices/system/cpu/cpu[0-9]* | wc -w)
  [ "$CPUS" -lt 1 ] && CPUS=1

  # If we're not using hyper-threading, and there's at least 512 megs
  # of memory per CPU (MemTotal is in kilobytes), use 50% more CPUS
  # than we actually have to keep the system busy

  [ -z "$(cat /proc/cpuinfo | grep '^flags' | head -n 1 | grep -w ht)" ] &&
    [ $(($CPUS*512*1024)) -le $MEMTOTAL ] &&
      CPUS=$((($CPUS*3)/2))
fi

(Don't ask me why sysinfo doesn't tell you how many processors the
machine has...)

I'm aware that this doesn't work in containers, and you have to grovel
through /proc/cpuinfo instead. Honestly, I consider that a containers bug.
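
(In C, by the way, glibc's sysconf() extension hands back the same
count without the shell glob. I believe it grovels through /sys and
/proc under the covers, so the container caveat still applies:)

#include <stdio.h>
#include <unistd.h>

int main(void)
{
  // _SC_NPROCESSORS_ONLN is a glibc extension, not POSIX.
  long cpus = sysconf(_SC_NPROCESSORS_ONLN);

  if (cpus < 1) cpus = 1;  // detection failed, assume one CPU
  printf("%ld\n", cpus);

  return 0;
}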

>> I also note that bzip is trivially parallelizable (the file is handled
>> in 900k independent chunks), and that bunzip2 could be parallelized with
>> heuristics finding block start signatures and speculatively passing
>> them off to threads (which then discard the results if they fail or the
>> previous blocks don't line up to that starting point when it gets around
>> to writing stuff out). Commit 215 was actually a refactoring to help
>> prepare for this...
>>
>> I can do something similar with gzip based on dictionary resets,
>> although --rsyncable would help there.
>>
> 
> For a parallel gzip, see pigz: http://zlib.net/pigz/ and more
> specifically http://zlib.net/pigz/pigz.pdf

Yeah, seemed like an obvious idea.

The "specially prepared deflate streams" thing is more or less the
--rsyncable thing I was talking about (which current gzip has). Although
I think blanking the dictionary every 128k is probably a bit too often...
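
To sketch the bunzip2 heuristic from above: each bzip2 block starts
with the 48-bit magic 0x314159265359, but blocks are bit aligned rather
than byte aligned, so you shift the stream through a 48-bit window one
bit at a time. Something like this (illustration only, not toybox code;
a hit is just a candidate that the speculative decompress still has to
confirm):

// Return the bit offset of the first bzip2 block magic in buf,
// or -1 if it isn't there.
long find_bzip_block(unsigned char *buf, long len)
{
  unsigned long long window = 0, magic = 0x314159265359ULL;
  long bit, bits = len*8;

  for (bit = 0; bit < bits; bit++) {
    window = ((window<<1) | ((buf[bit/8]>>(7-(bit%8)))&1))
             & ((1ULL<<48)-1);
    if (bit >= 47 && window == magic) return bit-47;
  }

  return -1;
}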

>> Anyway, the _point_ of all this is if I flip the config switch to enable
>> thread support in toybox, it should _automatically_ take advantage of
>> SMP, the way mksquashfs does. I was pondering adding a new cp -F flag
>> and then went "no, that's stupid, that's like a 'use the floating point
>> coprocessor' flag. If you built it with support, just DO it..."
>>
>> Anyway, just musing aloud. I'm weird in that to _me_ multithreaded
>> programming is simple because I cut my teeth on OS/2 twenty years ago,
>> but I suspect I should finish the nonthreaded 1.0 version first before
>> worrying about that...
>
> The only addition I'd like to make is that if this is a path we want to
> follow, I'd not wait too long with doing it, because the more code
> there is, the more difficulty we can expect in making it all
> threadsafe (probably implying that we have some more mature tests in
> place than we have now).

Nah, I don't have to. I only have to make individual _commands_
threadsafe. Wouldn't bother with it otherwise. :)

(And note that a lot of stuff can be done with fork() instead of with
threads, if that makes it easier for people to wrap their heads around.
The "thread pool" can just have a pipe to/from each child process: send
it data and read back the result. Modulo avoiding the deadlock where
both pipes fill up, which means you need poll/select, but the netcat I
wrote has been doing that for ages and I've been meaning to genericize
it anyway...)
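
Here's a sketch of that fork() flavor (illustrative only: one child,
error handling skipped). The point is poll() watching both the send and
receive sides, which is what keeps a full pipe in either direction from
wedging the parent:

#include <ctype.h>
#include <poll.h>
#include <stdio.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
  int to[2], from[2];
  char buf[4096];

  pipe(to);
  pipe(from);

  if (!fork()) {
    // Child worker: read chunks, "process" them, write results back.
    ssize_t len, i;

    close(to[1]);
    close(from[0]);
    while ((len = read(to[0], buf, sizeof(buf))) > 0) {
      for (i = 0; i < len; i++) buf[i] = toupper(buf[i]);
      write(from[1], buf, len);
    }

    return 0;
  }

  close(to[0]);
  close(from[1]);

  char *work = "pretend this is a big backlog of work\n";
  size_t sent = 0, total = strlen(work);
  struct pollfd pfd[2] = {{from[0], POLLIN, 0}, {to[1], POLLOUT, 0}};

  for (;;) {
    poll(pfd, 2, -1);
    if (pfd[1].revents & POLLOUT) {
      // Room in the outbound pipe: send more work, close when done.
      sent += write(to[1], work+sent, total-sent);
      if (sent == total) {
        close(to[1]);
        pfd[1].fd = -1;  // poll() ignores negative fds
      }
    }
    if (pfd[0].revents & POLLIN) {
      // Drain results so the child never blocks writing to us.
      ssize_t len = read(from[0], buf, sizeof(buf));

      if (len <= 0) break;
      fwrite(buf, 1, len, stdout);
    }
  }
  wait(0);

  return 0;
}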

Copying page-sized chunks of data between processes isn't a huge deal
since it stays cache local, and anonymous shared memory exists for a
reason and is trivial to set up:

#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
  // Anonymous shared mapping: shared with children of this process,
  // invisible to everyone else on the system.
  char *c = mmap(0, 4096, PROT_READ|PROT_WRITE,
                 MAP_SHARED|MAP_ANONYMOUS, -1, 0);

  if (fork()) {
    sleep(1);          // crude sync: give the child time to write
    fputs(c, stdout);  // parent reads the child's data
  } else sprintf(c, "I am the walrus.\n");

  return 0;
}


Threads just default to sharing the entire heap, and then had to invent
thread-local storage to get some private globals back. (Basically Sun
invented threading because both its fork() and its scheduler _sucked_.
Then they forced everybody to _use_ it via Java, which provides no
alternative mechanisms to do lots of things the way Unix has for decades.)

Unix starts with everything private and you request shared memory when
you need it, and you can use small pipe read/writes between children as
coordination mechanisms without even linking against pthread if you
don't want to...

Heck, before sysv shared memory was invented (and before Linux grew
support for MAP_SHARED|MAP_ANONYMOUS), people used to create a file,
mmap it from multiple processes, and then delete it as a signal to the
OS that the disk didn't need to be updated anymore. Alas, Linux took
that hack out years ago when tmpfs was invented, which caught User Mode
Linux by surprise:

  http://copilotco.com/mail-archives/uml.2005/msg04781.html

> my 2 cents

Rob
-- 
GNU/Linux isn't: Linux=GPLv2, GNU=GPLv3+, they can't share code.
Either it's "mere aggregation", or a license violation.  Pick one.


