[Toybox] [landley/toybox] Help building toybox with the NDK/bionic (#43)

Rob Landley rob at landley.net
Mon Dec 26 10:31:45 PST 2016


On 12/24/2016 01:40 PM, enh wrote:
> On Wed, Dec 21, 2016 at 2:29 PM, Rob Landley <rob at landley.net> wrote:
>> On 12/18/2016 12:52 PM, enh wrote:
>>> for configure-like usage you'll need to generate a standalone toolchain:
[...]
>> I can add another compile-time probe for this, but I'm wondering if
>> there's a way to figure it out from the #defines? You added it in commit
>> e0dbc6beaf37:
>>
>> +#ifdef __ANDROID__
>> +#include <cutils/sched_policy.h>
>> +#else
>> +typedef int SchedPolicy;
>> +int get_sched_policy(int tid, SchedPolicy *policy);
>> +const char *get_sched_policy_name(SchedPolicy policy);
>> +#endif
>>
>> So presumably you understand when it is/isn't there? Anyway, when I #if
>> 0'd that out, it got further but then died with:
> 
> the problem here is that libcutils is part of the platform, but not
> the NDK.

Limiting what I can test by building against the NDK. :(

> i think the right fix here is to have a probe for libcutils
> (like there already is for libselinux)?

I can do that, but this means I still don't have a "success" case to
test against. (I can't even test that the probe works.)

Sigh. I'm finally back in Austin and my big box has a terabyte disk in
it, and cable modem instead of phone tethering, so I can presumably
clean off 200 gigs from that and try to install AOSP again...

(That said, asking most people to do that remains a big ask.)

> or -- since the platform build
> doesn't use your build system at all -- you can just change
> __ANDROID__ here to anything you like, and i'll set it in the Android
> build system's build system for toybox.

The logical thing to do would be to change it in the _new_ thing, ala
defining an __ANDROID_NDK__ or similar in the NDK, if that has a
significantly different API than the existing AOSP build.

But a compile time probe works too, and I can just do that for now.
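
One way to answer the earlier "figure it out from the #defines" question without a separate probe is the `__has_include` preprocessor operator, which recent GCC (5 and later) and clang provide. The following is only a minimal sketch under that assumption, not what toybox does; `HAVE_CUTILS` and `have_cutils()` are hypothetical names introduced for illustration. It checks that the header exists and says nothing about whether libcutils is available at link time.

```c
/* __has_include is a compiler extension here (GCC 5+, clang). The nested
   #if keeps compilers without it from choking on the expression. */
#if defined(__has_include)
# if __has_include(<cutils/sched_policy.h>)
#  define HAVE_CUTILS 1
# endif
#endif
#ifndef HAVE_CUTILS
# define HAVE_CUTILS 0
#endif

#if HAVE_CUTILS
#include <cutils/sched_policy.h>
#else
/* Same fallback declarations as the quoted commit. They are only
   declared, never called, so nothing needs to exist at link time. */
typedef int SchedPolicy;
int get_sched_policy(int tid, SchedPolicy *policy);
const char *get_sched_policy_name(SchedPolicy policy);
#endif

/* Hypothetical helper: reports which branch the preprocessor took. */
int have_cutils(void)
{
  return HAVE_CUTILS;
}
```

Built against the NDK or a regular Linux libc this should take the fallback branch; built inside the platform tree it should pick up the real header.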

>> toys/android/getprop.c:20:31: fatal error: cutils/properties.h: No such
>> file or directory
>>  #include <cutils/properties.h>
>>
>> I'm guessing that was the libselinux thing you were talking about,
>> maybe? Anyway, I can switch that app off.
> 
> yeah, i think for now just turning off the android apps makes the most
> sense.

You are aware of the irony of the android apps being the main thing I
_can't_ test under the android NDK, right?

> (these ones are slightly different to the one above in that we
> can rewrite these to just use bionic's lower-level primitives. i'll
> send a patch for that when we get closer to being able to build with
> the NDK.)

Yay patch. I look forward to it.

> hah. that makes me feel more vindicated about being anal with the
> double-underscores. i often wonder whether i'm just wasting time when
> i clean stuff like that up. anyway, looks like i already cleaned this
> up in the platform
> (https://android.googlesource.com/platform/system/core/+/665051ca6347ded0a44dc6a36a2467f663c101df)
> but that hasn't made it to the NDK yet. filed
> https://github.com/android-ndk/ndk/issues/271.

How do I get new versions to test?

>> Then it made it through several commands, but the "eject" command died
>> because:
>> toys/other/eject.c:25:21: fatal error: scsi/sg.h: No such file or directory
>>  #include <scsi/sg.h>
>>
>> Which is a linux-kernel header, but I can't say I'm hugely surprised.
>> Switch that off...
> 
> yeah, it's a slightly weird header in that it's probably meant to be
> public API but it's not a uapi header. for the platform we have it
> (and a couple of other scsi headers) as a special case, but they're
> not copied into the NDK. filed
> https://github.com/android-ndk/ndk/issues/269.

Yeah, musl substitutes in its own too:

http://git.musl-libc.org/cgit/musl/tree/include/scsi

Possibly somebody should poke the kernel guys. :)

>> And it's doing pretty well through the rest of the commands. Lots of
>> warnings about implicit declarations (gethostid, crypt) wandering by. I
>> wonder if I can -Werror just _that_ error? (That's going to come back
>> and bite me at link time, I just know it...)
> 
> -Werror=implicit-function-declaration

Ah, very nice. I wonder if llvm supports that? Hmmm... seems to.

This brings up another point: llvm is _not_ prefixed; instead it uses
--target=blah runtime output flag selector thingies. I've only ever
really tested clang on x86 native, but I'd like to use the ndk to add
that to my standard regression tests (at least before each release).

But I don't quite understand how your standalone toolchain thing's
automation is supposed to work:

  This operation also installs two wrapper scripts, named clang and
  clang++, under <install-dir>/bin. These scripts invoke the clang
  binary with the correct target architecture flags. In other words,
  they should work without any modification, and you should be able to
  use them in your own builds by just setting the CC and CXX
  environment variables to point to them.

The problem I have cross compiling is that I need a native compiler to
build kconfig and instlist and such with. Traditionally, CROSS_COMPILE
is a prefix that gets stuck on the front of the "cc" binary name. (That
was the standard name in posix in the SUSv2 days, and then c99 came out
and they went "clearly you need to switch the binary name to c99 the
same way you rename the ls binary to show it's posix-2008 instead of
posix-2001!" and nobody did that, and we all stuck with "cc" and waited
for posix to admit it made a mistake. Given that the loudest member of
that committee is still proclaiming Solaris the One True Unix, my
personal strategy has been to wait for somebody to start a better
standards body. LSB was making a stab at it but the Linux Foundation put
a stop to that.)

Anyway, the $CC variable lets you say gcc instead of cc (the FSF can't
hear anything outside of the confines of its own ass either, but then it
never could), but there isn't a standard/portable way I'm aware of to
say "the cross compiler has a different $CC name than the host
compiler". I can stick a prefix on it, but not independently rename just
the compiler.

The problem is there's a whole suite of tools: assembler, linker,
objcopy, objdump, nm, and yes they all get used in various build stuff.
From the top level linux Makefile:

AS		= $(CROSS_COMPILE)as
LD		= $(CROSS_COMPILE)ld
CC		= $(CROSS_COMPILE)gcc
AR		= $(CROSS_COMPILE)ar
NM		= $(CROSS_COMPILE)nm
STRIP		= $(CROSS_COMPILE)strip
OBJCOPY		= $(CROSS_COMPILE)objcopy
OBJDUMP		= $(CROSS_COMPILE)objdump

You can use cc to wrap as and ld, but not ar or strip. So the prefix
approach is sort of necessary, because using the wrong architecture's
tools can cause subtle bugs. (For example, if you use x86 strip on a
SuperH binary it changes the . prefixes to _ prefixes and the result
won't run because the dynamic linker can't find the "start" symbol.
Because the Japanese engineers who implemented SuperH elf translated the
ELF spec into japanese and the codepage switch substituted . for _ and
they implemented what their spec said and nobody noticed before it
shipped. I hope that llvm's own version of nm and such can cram all this
into one binary that autodetects the input ELF type and handles
everything appropriately, but your toolchain is still using binutils as
the backend which historically doesn't.)
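
As a rough illustration of what "autodetects the input ELF type" could mean: the target architecture is recorded in the e_machine field of the ELF header, a 16-bit value at byte offset 18 in both 32-bit and 64-bit files, stored in the byte order given by the EI_DATA byte at offset 5. The sketch below reads it from a buffer. `elf_machine` is a hypothetical helper name, and this is not how llvm or binutils actually do it.

```c
#include <stddef.h>
#include <string.h>

/* Returns the e_machine value from an ELF header held in buf,
   or -1 if buf is too short or is not an ELF file. */
int elf_machine(const unsigned char *buf, size_t len)
{
  if (len < 20 || memcmp(buf, "\177ELF", 4)) return -1;

  /* buf[5] is EI_DATA: 1 means little endian, 2 means big endian. */
  if (buf[5] == 1) return buf[18] | (buf[19] << 8);
  if (buf[5] == 2) return (buf[18] << 8) | buf[19];

  return -1;
}
```

For reference, <elf.h> defines EM_SH as 42, EM_ARM as 40, and EM_X86_64 as 62, so a tool could compare the result against the architecture it was built to handle and refuse a mismatched input.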

I can test on clang with an x86-64 standalone NDK toolchain because
"CC=clang LDFLAGS=--static" should get me something I can run on an
Ubuntu host. But testing arm clang? Design assumption is that cross
toolchains have a unique prefix, and that's not the case here. :(

(Oh, is -march=armv7-a and -mthumb doing cortex-m output? Modulo bionic
probably not supporting a nommu target, but I can try building a static
PIE binary. Rich is wrestling with that now over in musl-land. It would
be really nice if either of us had more than intermittent access to a
cortex-m board, but the smartfusion 2 I was testing on got whisked away
the day after I found the bug...)

> you can probably work around this by targeting API 21, but then you
> really will be missing some functions used by toybox.

I'll wait for a fix. Lemme know when there's a new version to try. :)

>> You can't error_exit()
>> without verror_msg(), it's in pretty much every command. (I think if you
>> build "false" standalone, it might get omitted... "make false", "objdump
>> -d generated/unstripped/false | less"... Nope, it's still there.
>> Probably shouldn't be. I'll throw it on the todo heap.) *
>>
>> I don't see a strong reason _not_ to have a gethostid(), but I could
>> stub it out (or just do the syscall, or read /proc/sys/kernel/hostname)
>> if you have one.
> 
> have you read the man page :-)
> http://man7.org/linux/man-pages/man3/gethostid.3.html

I had it confused with gethostname.

I have no idea why Sameer Pradhan's employer wanted that command, but it
was trivial to implement, so... I can add a !TOYBOX_ON_ANDROID to the
config?

The last few times it's come up for me I did some variant of:

  ifconfig | sort | sed -n 's/.*HWaddr \([0-9a-zA-Z:]*\).*/\1/p;T;q'

(And then ran it through sha1sum and took the first X bytes. I'd run it
through crc32 to naturally get the requested 32 bits but I just noticed
that ubuntu decided that should be implemented in perl, and you can't
pipe to stdin in that version but MUST supply a filename. And - is a
literal, because perl. Thanks ubuntu! Not that 32 bits is an interesting
length for this sort of thing anymore anyway...)

I know what to do: move it to the "examples" directory and have it
"default n".

> i haven't seen a system return anything but 0x007f0101 in decades.
> (because who's not using DHCP?)
> 
> and that's without getting into the privacy/security issues.
> 
> this is the kind of thing we tend to prefer to leave broken because
> it's a signal that you need to rewrite the calling code for it to make
> any sense on Android. in this case, hostid(1) is useless, so i'd just
> disable it.

It fell under the "somebody wanted it, and it's technically posix,
so..." rule. If somebody implemented "sum" and made puppy eyes at me, I
might similarly cave. (Hey, cpio wound up having a big revival... Still
waiting for posix to notice.)

That said, this should not be in defconfig.
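
The 0x007f0101 value quoted above is consistent with a libc deriving the host ID from the IPv4 address the hostname resolves to and swapping its two 16-bit halves, which appears to be what glibc falls back to when no host ID has been stored. Treat that as an assumption; it is not stated in this thread. Under that assumption, 127.0.1.1 (the address Debian-style systems list for the hostname in /etc/hosts) read as a little-endian 32-bit integer is 0x0101007f, and the sketch below turns that into 0x007f0101. `hostid_from_ipv4` is a hypothetical helper name.

```c
#include <stdint.h>

/* s_addr is the IPv4 address as stored in struct in_addr (network byte
   order in memory), read as a native 32-bit integer. */
uint32_t hostid_from_ipv4(uint32_t s_addr)
{
  /* Swap the upper and lower 16-bit halves. */
  return (s_addr << 16) | (s_addr >> 16);
}
```

That would explain why every DHCP-configured box reports the same value: the input is the same loopback alias everywhere, so the result identifies nothing.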

> POSIX will catch up with reality in another 20-30 years...

http://pubs.opengroup.org/onlinepubs/9699919799/functions/gethostid.html

Hmmm.

Let's see, oldest position on the Solaris Uber Alles guy's linkedin
resume started in 1984, assuming he was 20 at the time would make him
~52 now, google says German life expectancy is 81 years...

I still think a new standards body is the faster path. Something that
just documents what the system is expected to do/offer, without the
Linux Foundation's "Red Hat is a Plutonium Sponsor so we take their side
on rpm vs dpkg, hey why did Debian give up on us taking Ubuntu with
them, who could have foreseen this?" nonsense.

(Memo: a package is not a standard. Anything there is only one
implementation of is not yet standardized. The standard is the common
subset between implementations, plural.)

>> I can also probe/stub out getgrgid_r() and crypt() which both fall under
>> "do we have /etc/passwd or similar on this system". It would be nice if
>> there was some sort of plan for setting up a "posix container" under
>> android that understood 2 users (root and not root) so we can run AOSP
>> builds as "not root", but that's a todo item. In the meantime consistent
>> stubs would be nice. :)
> 
> there are *many* users on an Android system, with "root" and "shell"
> probably being the two you're looking for. we have <pwd.h> and <grp.h>
> even if they're a little unusual (you certainly wouldn't want to loop
> through all the users/groups on the system, for example!). getgrgid_r
> is just too new to be in any libc.a (i built a non-static toybox
> binary just fine).

Part of the reason I haven't finished cleanup of groupadd and friends is
I dunno what that _should_ look like on android. (Not a clue. But I
should write up a "what I'm looking for" post as a separate thing, this
one's already too long.)

> crypt(3) is another deliberate "please stop and think about what
> you're doing" omission.

Indeed. I was trying for legacy compatibility with existing Linux
systems. (And the $1$ and $5$ stuff isn't as bad; you can stick in an
arbitrary algorithm there. And _no_ hash is going to survive having
/etc/shadow leaked; the attack brute forces the password space there.)

But it falls under the same heading: "android treats users differently
than the method Linux inherited from Bell Labs and only lightly
modified". There's design work pending there.

> Android code that wants this kind of
> functionality should probably be using BoringSSL.

Does it provide a crypt()? It can add $8$ and so on.

  http://man7.org/linux/man-pages/man3/crypt.3.html#NOTES

I know it says 'glibc', I'm trying to break Michael Kerrisk of this
habit. Musl supports it, and uClibc used to. It's part of the Linux
ecosystem. Putting unshare() behind #ifdef GNU_GNU_ALL_HAIL_STALLMAN
when it's a Linux system call the hurd has never even _imagined_ is just
cruel. And wrong.

(And in that case, the man page was wrong _first_, libc didn't require
it. Then libc changed to conform to the man page, and I was sad.)

> folks trying to
> manually mess around with /etc/passwd or /etc/group are going to have
> to completely rethink what they're doing. the mkpasswd toy should just
> be disabled on Android. (build/tools/fs_config is the closest
> equivalent.)

I am often torn between "I haven't implemented this yet because I'm not
sure what it should look like" and "users are submitting code to me that
I haven't merged yet and I'm being a bottleneck, lemme put it in
pending, oh people are using stuff out of pending when I dunno if what
it implements is the best approach to take"...

(I carve out as much time as I can, but it's never enough to keep up.
And I've never managed to do much design work in 15 minute increments
between higher priority interrupts. Oh well. Christmas break, time to
shovel out the code backlog a bit...)

Rob

