[Toybox] Working on a [currently lame] downloader

Isaac Dunham ibid.ag at gmail.com
Sun Jul 12 23:18:47 PDT 2015


Hello,
I've been working on an HTTP(S) downloader for toybox (what wget does,
though at the moment it's completely incompatible with everything).
It works to some degree now, so I thought I'd mention that it's
in progress, ask for a general idea of what's desired, and give people
an idea of how completely lame it is right now and how I'm doing it.

I presume that the agenda for toybox is implementing some subset of wget
in a compatible manner; is "what busybox wget supports + SSL" a rough
approximation of the desired functionality?


I mentioned that it's HTTP(S); it fetches files over SSL without
implementing SSL. I cheated on networking: it calls netcat or 
openssl s_client -quiet -connect.
It uses an approach roughly similar to xpopen_both(), except that
it uses socketpair() instead of pipe(); it should be possible to switch
to xpopen_both(), which would probably fix a few of the bugs.
(I hadn't realized there was an xpopen_both() until just now.)
This strategy is probably the main part that will actually be useful.
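
Roughly, the child-process part looks something like this; it's a
simplified sketch rather than the actual ncdl code (the function name,
buffer size, and error handling are all made up):

#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

// Spawn the network client as a child and return an fd the parent can
// read and write HTTP over; -1 on error.  (Sketch only.)
static int spawn_client(char *host, char *port, int ssl)
{
  int sv[2];
  pid_t pid;
  char dest[256];

  if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv)) return -1;
  snprintf(dest, sizeof(dest), "%s:%s", host, port);

  if ((pid = fork()) < 0) return -1;
  if (!pid) {
    // Child: wire one end of the socketpair to stdin/stdout, then exec
    // the program that actually does the connecting (and the TLS).
    dup2(sv[1], 0);
    dup2(sv[1], 1);
    close(sv[0]);
    close(sv[1]);
    if (ssl) execlp("openssl", "openssl", "s_client", "-quiet",
                    "-connect", dest, (char *)0);
    else execlp("nc", "nc", host, port, (char *)0);
    _exit(127);
  }
  close(sv[1]);

  return sv[0];
}

The parent then does all the HTTP reading and writing on the returned fd.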

Now, the lame part (i.e., everything else).
The working name is ncdl (because it's a downloader that uses netcat,
of course); sample usage is
 ncdl -u example.com:443/index.html -s -o index.html
You can probably see some oddities:
- currently, it assumes that the underlying protocol is HTTP, and does
  not accept proper http:// or https:// URLs (see the parsing sketch
  after this list)
- it has no idea what default ports are (so you need to specify even
  port 80 for HTTP or port 443 for HTTPS)
- since it doesn't parse a url scheme, it uses -s to decide whether
  to use SSL
- the URL is passed via -u, rather than as an argument
- -o is used to select file output, as in curl
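
To make the -u handling concrete, the splitting is roughly along these
lines; this is a hypothetical sketch, not the real code (the function
name and the in-place splitting are made up):

#include <string.h>

// Split "example.com:443/index.html" in place into host, port, and path.
// Returns -1 if ":port" is missing, since there are no default ports yet.
static int split_target(char *arg, char **host, char **port, char **path)
{
  char *colon = strchr(arg, ':'), *slash;

  if (!colon) return -1;
  *colon = 0;
  *host = arg;
  *port = colon+1;

  if ((slash = strchr(*port, '/'))) {
    *slash = 0;
    *path = slash+1;   // path without the leading '/'
  } else *path = "";   // no path given; the request would just use "/"

  return 0;
}
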
But the implementation is at least as lame:
- it doesn't check the status of the network client, just whether it
  could write to the socket/pipe connected to it
- it uses an HTTP/1.0 request, and doesn't bother checking
  Content-Length: it's meant to just read until there's no more data.
  In practice that doesn't work well: it consistently keeps trying to
  read after the data runs out and has to be interrupted by hand.
  (A sketch of this read loop follows the list.)
- the extent of header checking is "did we get HTTP/1.* 200 OK?".
  Then it discards everything until it sees a "blank line" ("\r\n\r\n",
  since like most network protocols HTTP needs both \r and \n).
- that means it doesn't support redirects yet.
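
For reference, the request/response handling boils down to something
like this; again a simplified sketch with made-up names, and it assumes
the whole header fits in one read (the real thing is messier):

#include <stdio.h>
#include <string.h>
#include <unistd.h>

// Send an HTTP/1.0 request over fd (from the earlier spawn sketch),
// check the status, skip the headers, and copy the body to out.
static int fetch(int fd, char *host, char *path, FILE *out)
{
  char buf[4096], *body;
  ssize_t len;
  int n;

  n = snprintf(buf, sizeof(buf),
               "GET /%s HTTP/1.0\r\nHost: %s\r\n\r\n", path, host);
  if (write(fd, buf, n) != n) return -1;

  len = read(fd, buf, sizeof(buf)-1);
  if (len <= 0) return -1;
  buf[len] = 0;

  // "Did we get HTTP/1.* 200 OK?" -- a very loose check; anything else
  // (including a redirect) is just treated as failure.
  if (strncmp(buf, "HTTP/1.", 7) || !strstr(buf, " 200")) return -1;

  // Discard everything up to the blank line; what follows is body.
  if (!(body = strstr(buf, "\r\n\r\n"))) return -1;
  body += 4;
  fwrite(body, 1, len-(body-buf), out);

  // No Content-Length check: read until the peer stops sending.  With a
  // client that never reports EOF, this loop is where it hangs.
  while ((len = read(fd, buf, sizeof(buf))) > 0) fwrite(buf, 1, len, out);

  return 0;
}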

And I'm sure there are many more bugs besides.

Thanks,
Isaac Dunham
