[Toybox] [landley/toybox] Help building toybox with the NDK/bionic (#43)

enh enh at google.com
Mon Dec 26 13:48:35 PST 2016


On Mon, Dec 26, 2016 at 10:31 AM, Rob Landley <rob at landley.net> wrote:
> On 12/24/2016 01:40 PM, enh wrote:
>> On Wed, Dec 21, 2016 at 2:29 PM, Rob Landley <rob at landley.net> wrote:
>>> On 12/18/2016 12:52 PM, enh wrote:
>>>> for configure-like usage you'll need to generate a standalone toolchain:
> ...                       ^
>>> I can add another compile-time probe for this, but I'm wondering if
>>> there's a way to figure it out from the #defines? You added it in commit
>>> e0dbc6beaf37:
>>>
>>> +#ifdef __ANDROID__
>>> +#include <cutils/sched_policy.h>
>>> +#else
>>> +typedef int SchedPolicy;
>>> +int get_sched_policy(int tid, SchedPolicy *policy);
>>> +const char *get_sched_policy_name(SchedPolicy policy);
>>> +#endif
>>>
>>> So presumably you understand when it is/isn't there? Anyway, when I #if
>>> 0'd that out, it got further but then died with:
>>
>> the problem here is that libcutils is part of the platform, but not
>> the NDK.
>
> Limiting what I can test by building against the NDK. :(

well, toybox is in this weird position of being both a project in its
own right and part of the system. so it gets its own selinux
label that lets it do things a regular app couldn't (like read other
processes' info in /proc, say, for ps(1) and friends), and it also gets
a dependency on libcutils for decoding the Android-specific extra
scheduler info that toybox ps can show on Android.

>> i think the right fix here is to have a probe for libcutils
>> (like there already is for libselinux)?
>
> I can do that, but this means I still don't have a "success" case to
> test against. (I can't even test that the probe works.)

i don't think the existing probes work in AOSP either; Android's build
system doesn't believe in a separate "configure" step.

> Sigh. I'm finally back in Austin and my big box has a terabyte disk in
> it, and cable modem instead of phone tethering, so I can presumably
> clean off 200 gigs from that and try to install AOSP again...
>
> (That said asking most people to do that remains a big ask.)

if you're not building your own system image, the ps you can build
would be pretty useless to you anyway, because it won't have the right
selinux label. not having the Android scheduling priority field is the
least of their worries.

>> or -- since the platform build
>> doesn't use your build system at all -- you can just change
>> __ANDROID__ here to anything you like, and i'll set it in the Android
>> build system's build system for toybox.
>
> The logical thing to do would be to change it in the _new_ thing, ala
> defining an __ANDROID_NDK__ or similar in the NDK, if that has a
> significantly different API than the existing AOSP build.

it's more the other way round though: the public API exposed in the
NDK is a subset of the full platform API.

> But a compile time probe works too, and I can just do that for now.

as long as there's something i can -D in the Android build system for
toybox to say "we can link against libcutils"...
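
a minimal sketch of what that could look like on the toybox side, assuming a made-up macro name (TOYBOX_HAVE_LIBCUTILS is hypothetical, not something either build system defines today). with the macro set, the real <cutils/sched_policy.h> is used; without it, local stubs stand in so ps-style code still compiles. the helper sched_policy_label() is hypothetical too:

```c
#include <stddef.h>

/* TOYBOX_HAVE_LIBCUTILS is a hypothetical macro: the platform build would
 * pass -DTOYBOX_HAVE_LIBCUTILS, an NDK build would leave it unset. */
#ifdef TOYBOX_HAVE_LIBCUTILS
#include <cutils/sched_policy.h>
#else
/* Stubs for builds without libcutils: every thread gets a placeholder
 * policy whose name is "?". */
typedef int SchedPolicy;

static int get_sched_policy(int tid, SchedPolicy *policy)
{
  (void)tid;
  *policy = 0;
  return 0;
}

static const char *get_sched_policy_name(SchedPolicy policy)
{
  (void)policy;
  return "?";
}
#endif

/* Hypothetical helper: a printable policy name for a thread id, or "?"
 * when the lookup fails or libcutils is not available. */
const char *sched_policy_label(int tid)
{
  SchedPolicy policy;

  if (get_sched_policy(tid, &policy)) return "?";
  return get_sched_policy_name(policy);
}
```

built without the macro, sched_policy_label() returns "?" for any tid, so a column that depends on it just shows a placeholder instead of breaking the build.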

>>> toys/android/getprop.c:20:31: fatal error: cutils/properties.h: No such
>>> file or directory
>>>  #include <cutils/properties.h>
>>>
>>> I'm guessing that was the libselinux thing you were talking about,
>>> maybe? Anyway, I can switch that app off.
>>
>> yeah, i think for now just turning off the android apps makes the most
>> sense.
>
> You are aware of the irony of the android apps being the main thing I
> _can't_ test under the android NDK, right?

yes and no. see above.

>> (these ones are slightly different to the one above in that we
>> can rewrite these to just use bionic's lower-level primitives. i'll
>> send a patch for that when we get closer to being able to build with
>> the NDK.)
>
> Yay patch. I look forward to it.

sent.

>> hah. that makes me feel more vindicated about being anal with the
>> double-underscores. i often wonder whether i'm just wasting time when
>> i clean stuff like that up. anyway, looks like i already cleaned this
>> up in the platform
>> (https://android.googlesource.com/platform/system/core/+/665051ca6347ded0a44dc6a36a2467f663c101df)
>> but that hasn't made it to the NDK yet. filed
>> https://github.com/android-ndk/ndk/issues/271.
>
> How do I get new versions to test?

we do have nightly builds:

https://plus.google.com/+ElliottHughes/posts/ixutWK8A5nz?sfc=false

and betas:

https://github.com/android-ndk/ndk/wiki

but you'll need to wait for
https://github.com/android-ndk/ndk/issues/271 to be fixed first :-)

>>> Then it made it through several commands, but the "eject" command died
>>> because:
>>> toys/other/eject.c:25:21: fatal error: scsi/sg.h: No such file or directory
>>>  #include <scsi/sg.h>
>>>
>>> Which is a linux-kernel header, but I can't say I'm hugely surprised.
>>> Switch that off...
>>
>> yeah, it's a slightly weird header in that it's probably meant to be
>> public API but it's not a uapi header. for the platform we have it
>> (and a couple of other scsi headers) as a special case, but they're
>> not copied into the NDK. filed
>> https://github.com/android-ndk/ndk/issues/269.
>
> Yeah, musl substitutes in its own too:
>
> http://git.musl-libc.org/cgit/musl/tree/include/scsi
>
> Possibly somebody should poke the kernel guys. :)

if you know who, please do. the uapi headers aren't great for what i
assume is their intended purpose. they're missing stuff that should be
exposed to userspace, still include stuff that shouldn't, don't map
well to the should-be corresponding POSIX headers, et cetera. still,
better than having to maintain all that stuff ourselves, for the most
part.

(you'd probably be horrified at how much glibc-specific crap is in
them too, which i've never really understood because glibc doesn't use
them.)

>>> And it's doing pretty well through the rest of the commands. Lots of
>>> warnings about implicit declarations (gethostid, crypt) wandering by. I
>>> wonder if I can -Werror just _that_ error? (That's going to come back
>>> and bite me at link time, I just know it...)
>>
>> -Werror=implicit-function-declaration
>
> Ah, very nice. I wonder if llvm supports that? Hmmm... seems to.
>
> This brings up another point: llvm is _not_ prefixed, instead it uses
> --target=blah runtime output flag selector thingies. I've only ever
> really tested clang on x86 native, but I'd like to use the ndk to add
> that to my standard regression tests (at least before each release).
>
> But I don't quite understand how your standalone toolchain thing's
> automation is supposed to work:
>
>   This operation also installs two wrapper scripts, named clang and
>   clang++, under <install-dir>/bin. These scripts invoke the clang
>   binary with the correct target architecture flags. In other words,
>   they should work without any modification, and you should be able to
>   use them in your own builds by just setting the CC and CXX
>   environment variables to point to them.
>
> The problem I have cross compiling is that I need a native compiler to
> build kconfig and instlist and such with. Traditionally, CROSS_COMPILE
> is a prefix, and then the "cc" binary (which was the standard name in
> posix in the SUSv2 days, and then c99 came out and they went "clearly
> you need to switch the binary name to c99 the same way you rename the ls
> binary to show it's posix-2008 instead of posix-2001!" and nobody did
> that, and we all stuck with "cc" and waited for posix to admit it made a
> mistake. Given that the loudest member of that committee is still
> proclaiming Solaris the One True Unix, my personal strategy has been to
> wait for somebody to start a better standards body. LSB was making a
> stab at it but the Linux Foundation put a stop to that.)
>
> Anyway, the $CC variable lets you say gcc instead of cc (the FSF can't
> hear anything outside of the confines of its own ass either, but then it
> never could), but there isn't a standard/portable way I'm aware of to
> say "the cross compiler has a different $CC name than the host
> compiler". I can stick a prefix on it, but not independently rename just
> the compiler.

use clang on the host too? we don't use gcc for anything for current
devices, and moved off gcc even earlier for the host.

> The problem is there's a whole suite of tools: assembler, linker,
> objcopy, objdump, nm, and yes they all get used in various build stuff.
> From the top level linux Makefile:
>
> AS              = $(CROSS_COMPILE)as
> LD              = $(CROSS_COMPILE)ld
> CC              = $(CROSS_COMPILE)gcc
> AR              = $(CROSS_COMPILE)ar
> NM              = $(CROSS_COMPILE)nm
> STRIP           = $(CROSS_COMPILE)strip
> OBJCOPY         = $(CROSS_COMPILE)objcopy
> OBJDUMP         = $(CROSS_COMPILE)objdump
>
> You can use cc to wrap as and ld, but not ar or strip. So the prefix
> approach is sort of necessary, because using the wrong architecture's
> tools can cause subtle bugs. (For example, if you use x86 strip on a
> SuperH binary it changes the . prefixes to _ prefixes and the result
> won't run because the dynamic linker can't find the "start" symbol.
> Because the Japanese engineers who implemented SuperH elf translated the
> ELF spec into japanese and the codepage switch substituted . for _ and
> they implemented what their spec said and nobody noticed before it
> shipped. I hope that llvm's own version of nm and such can cram all this
> into one binary that autodetects the input ELF type and handles
> everything appropriately, but your toolchain is still using binutils as
> the backend which historically doesn't.)
>
> I can test on clang with an x86-64 standalone NDK toolchain because
> "CC=clang LDFLAGS=--static" should get me something I can run on an
> Ubuntu host. But testing arm clang? Design assumption is that cross
> toolchains have a unique prefix, and that's not the case here. :(
>
> (Oh, is -march=armv7-a and -mthumb doing cortex-m output? Modulo bionic
> probably not supporting a nommu target, but I can try building a static
> PIE binary. Rich is wrestling with that now over in musl-land. It would
> be really nice if either of us had more than intermittent access to a
> cortex-m board, but the smartfusion 2 I was testing on got whisked away
> the day after I found the bug...)
>
>> you can probably work around this by targeting API 21, but then you
>> really will be missing some functions used by toybox.
>
> I'll wait for a fix. Lemme know when there's a new version to try. :)

will do.

hopefully soon we'll be at a point where we can start adding building
various projects out of the box to our testing. strace and tcpdump are
more obvious candidates, but a toybox that we can take back in time
would help with NDK testing on old releases too. and i'd still like a
hermetic build by having a toybox-linked-with-host-bionic for use in
the build itself, replacing all the stuff that usually comes from the
host /bin.

>>> You can't error_exit()
>>> without verror_msg(), it's in pretty much every command. (I think if you
>>> build "false" standalone, it might get omitted... "make false", "objdump
>>> -d generated/unstripped/false | less"... Nope, it's still there.
>>> Probably shouldn't be. I'll throw it on the todo heap.) *
>>>
>>> I don't see a strong reason _not_ to have a gethostid(), but I could
>>> stub it out (or just do the syscall, or read /proc/sys/kernel/hostname)
>>> if you have one.
>>
>> have you read the man page :-)
>> http://man7.org/linux/man-pages/man3/gethostid.3.html
>
> I had it confused with gethostname.
>
> I have no idea why Sameer Pradhan's employer wanted that command, but it
> was trivial to implement, so... I can add a !TOYBOX_ON_ANDROID to the
> config?

probably someone just handed them a checklist without actually looking
through it first :-)
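
for what it's worth, a sketch of where a value like the 0x007f0101 quoted further down can come from. as far as i know, glibc's fallback (when no host id has been stored) looks up the IPv4 address for the hostname and swaps its 16-bit halves; treat that as an assumption about glibc internals rather than documented behaviour. hostid_from_ipv4() and hostid_for_127_0_1_1() are hypothetical helpers, and the input is the raw s_addr value as a little-endian host sees it:

```c
#include <stdint.h>

/* Hypothetical helper mirroring the assumed glibc fallback: swap the two
 * 16-bit halves of an IPv4 address held in network byte order (s_addr). */
uint32_t hostid_from_ipv4(uint32_t s_addr)
{
  return (s_addr << 16) | (s_addr >> 16);
}

/* 127.0.1.1 is stored as the bytes 7f 00 01 01, which a little-endian
 * host reads back as the 32-bit value 0x0101007f. */
uint32_t hostid_for_127_0_1_1(void)
{
  return hostid_from_ipv4(0x0101007fu);
}
```

127.0.1.1 is the address Debian-style /etc/hosts files commonly give the local hostname, which would explain why that one value shows up so often.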

> The last few times it's come up for me I did some variant of:
>
>   ifconfig | sort | sed -n 's/.*HWaddr \([0-9a-zA-Z:]*\).*/\1/p;T;q'
>
> (And then ran it through sha1sum and took the first X bytes. I'd run it
> through crc32 to naturally get the requested 32 bits but I just noticed
> that ubuntu decided that should be implemented in perl, and you can't
> pipe to stdin in that version but MUST supply a filename. And - is a
> literal, because perl. Thanks ubuntu! Not that 32 bits is an interesting
> length for this sort of thing anymore anyway...)
>
> I know what to do: move it to the "examples" directory and have it
> "default n".
>
>> i haven't seen a system return anything but 0x007f0101 in decades.
>> (because who's not using DHCP?)
>>
>> and that's without getting into the privacy/security issues.
>>
>> this is the kind of thing we tend to prefer to leave broken because
>> it's a signal that you need to rewrite the calling code for it to make
>> any sense on Android. in this case, hostid(1) is useless, so i'd just
>> disable it.
>
> It fell under the "somebody wanted it, and it's technically posix,
> so..." rule. If somebody implemented "sum" and made puppy eyes at me, I
> might similarly cave. (Hey, cpio wound up having a big revival... Still
> waiting for posix to notice.)
>
> That said, this should not be in defconfig.
>
>> POSIX will catch up with reality in another 20-30 years...
>
> http://pubs.opengroup.org/onlinepubs/9699919799/functions/gethostid.html
>
> Hmmm.
>
> Let's see, oldest position on the Solaris Uber Alles guy's linkedin
> resume started in 1984, assuming he was 20 at the time would make him
> ~52 now, google says German life expectancy is 81 years...
>
> I still think a new standards body is the faster path. Something that
> just documents what the system is expected to do/offer, without the
> Linux Foundation's "Red Hat is a Plutonium Sponsor so we take their side
> on rpm vs dpkg, hey why did Debian give up on us taking Ubuntu with
> them, who could have foreseen this?" nonsense.
>
> (Memo: a package is not a standard. Anything there is only one
> implementation of is not yet standardized. The standard is the common
> subset between implementations, plural.)
>
>>> I can also probe/stub out getgrgid_r() and crypt() which both fall under
>>> "do we have /etc/passwd or similar on this system". It would be nice if
>>> there was some sort of plan for setting up a "posix container" under
>>> android that understood 2 users (root and not root) so we can run AOSP
>>> builds as "not root", but that's a todo item. In the meantime consistent
>>> stubs would be nice. :)
>>
>> there are *many* users on an Android system, with "root" and "shell"
>> probably being the two you're looking for. we have <pwd.h> and <grp.h>
>> even if they're a little unusual (you certainly wouldn't want to loop
>> through all the users/groups on the system, for example!). getgrgid_r
>> is just too new to be in any libc.a (i built a non-static toybox
>> binary just fine).
>
> Part of the reason I haven't finished cleanup of groupadd and friends is
> I dunno what that _should_ look like on android. (Not a clue. But I
> should write up a "what I'm looking for" post as a separate thing, this
> one's already too long.)
>
>> crypt(3) is another deliberate "please stop and think about what
>> you're doing" omission.
>
> Indeed. I was trying for legacy compatibility with existing Linux
> systems. (And the $1$ and $5$ stuff isn't as bad; you can stick in an
> arbitrary algorithm there. And _no_ hash is going to survive having
> /etc/shadow leaked; the attack brute forces the password space there.)
>
> But it falls under the same "android treats users differently than the
> method Linux inherited from Bell Labs and only lightly modified".
> There's design work pending there.
>
>> Android code that wants this kind of
>> functionality should probably be using BoringSSL.
>
> Does it provide a crypt()? It can add $8$ and so on.

no.

weird that they chose arbitrary integers rather than just using the
name of the algorithm.
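
to illustrate the numeric ids: the crypt(3) man page linked below lists 1 for MD5, 5 for SHA-256 and 6 for SHA-512 (with 2a for Blowfish on some distributions). a small sketch that maps the prefix of a hash string to a name; crypt_scheme_name() is a hypothetical helper, not part of any libc:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: name the algorithm selected by the "$id$" prefix of
 * a crypt(3) hash string. A string that does not start with '$' is taken
 * to be the traditional DES-based format. */
const char *crypt_scheme_name(const char *hash)
{
  static const struct { const char *prefix, *name; } schemes[] = {
    {"$1$", "MD5"}, {"$2a$", "Blowfish"},
    {"$5$", "SHA-256"}, {"$6$", "SHA-512"},
  };
  size_t i;

  if (*hash != '$') return "DES (traditional)";

  for (i = 0; i < sizeof(schemes) / sizeof(*schemes); i++)
    if (!strncmp(hash, schemes[i].prefix, strlen(schemes[i].prefix)))
      return schemes[i].name;

  return "unknown";
}
```

the lookup table is the whole point: the id only means something if both sides agree on the table, whereas a name would have been self-describing.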

>   http://man7.org/linux/man-pages/man3/crypt.3.html#NOTES
>
> I know it says 'glibc', I'm trying to break Michael Kerrisk of this
> habit. Musl supports it, and uClibc used to. It's part of the Linux
> ecosystem. Putting unshare() behind #ifdef GNU_GNU_ALL_HAIL_STALLMAN
> when it's a Linux system call the hurd has never even _imagined_ is just
> cruel. And wrong.
>
> (And in that case, the man page was wrong _first_, libc didn't require
> it. Then libc changed to conform to the man page, and I was sad.)
>
>> folks trying to
>> manually mess around with /etc/passwd or /etc/group are going to have
>> to completely rethink what they're doing. the mkpasswd toy should just
>> be disabled on Android. (build/tools/fs_config is the closest
>> equivalent.)
>
> I am often torn between "I haven't implemented this yet because I'm not
> sure what it should look like" and "users are submitting code to me that
> I haven't merged yet and I'm being a bottleneck, lemme put it in
> pending, oh people are using stuff out of pending when I dunno if what
> it implements is the best approach to take"...

i'm not saying mkpasswd and friends don't make sense for traditional
Unix systems; just that they don't make sense for Android.

> (I carve out as much time as I can, but it's never enough to keep up.
> And I've never managed to do much design work in 15 minute increments
> between higher priority interrupts. Oh well. Christmas break, time to
> shovel out the code backlog a bit...)

yeah, i haven't been able to get through as much as i'd hoped this
year either. i had hoped to switch over dd, getevent, and grep this
year (basically everything except newfs_msdos), but out of those only
had time for a few small changes to dd. on the bright side, at least
i'm ending 2016 without any open toybox bugs (counting "switch to..."
as feature requests rather than bugs)!

thanks for all your work in 2016!

> Rob



-- 
Elliott Hughes - http://who/enh - http://jessies.org/~enh/
Android native code/tools questions? Mail me/drop by/add me as a reviewer.

