[Toybox] utf8towc(), stop being defective on null bytes

Oliver Webb aquahobbyist at proton.me
Mon Apr 8 09:53:03 PDT 2024


>> Null bytes aren't always "terminators". You can embed null bytes into data and still
>> want to do utf8 processing with it.
>
> that's questionable ... the desire to have ASCII NUL in utf-8
> sequences (without breaking the "utf-8 sequences are usable as c
> strings" property) is the main reason for the existence of "modified
> utf-8".

Admittedly, that’s the first time I’ve heard of "modified utf-8". There seem to be different flavors for different languages (the Java one seems to be the most prominent), and because there is no standard, not everyone is going to use it.

Still, U+0000 is a valid code point, and an undocumented special case for it that callers have to watch out for is either a bug or a documentation error.
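
To make the difference concrete, here is a rough sketch (not toybox code; encode_small() and build_sample() are names made up for illustration, and the encoder only handles code points up to U+07FF). It assumes the Java-style flavor of "modified utf-8", where U+0000 is written as the two bytes 0xC0 0x80 instead of a single 0x00 byte:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not toybox code: encode code point cp (which must be
 * <= 0x7FF, for brevity) into out and return the number of bytes written.
 * When modified is nonzero, U+0000 is written as the two-byte overlong form
 * 0xC0 0x80, as in Java-style "modified UTF-8". */
static size_t encode_small(unsigned cp, int modified, unsigned char *out)
{
  if (cp < 0x80 && !(modified && cp == 0)) {
    out[0] = cp;
    return 1;
  }
  out[0] = 0xC0 | (cp >> 6);
  out[1] = 0x80 | (cp & 0x3F);
  return 2;
}

/* Encode the three code points 'a', U+0000, 'b' into buf (at least 5 bytes)
 * and return the encoded length in bytes. */
static size_t build_sample(int modified, unsigned char *buf)
{
  unsigned cps[] = {'a', 0, 'b'};
  size_t i, len = 0;

  for (i = 0; i < 3; i++) len += encode_small(cps[i], modified, buf + len);
  buf[len] = 0;   /* C string terminator, not part of the data */
  return len;
}
```

With modified set to 0, build_sample() returns 3 but strlen() on the buffer reports 1, so anything that treats the data as a C string loses the embedded U+0000 and everything after it. With modified set to 1 the data is 4 bytes long and strlen() agrees, at the cost of an encoding that standard UTF-8 rejects as overlong.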

— Oliver Webb <aquahobbyist at proton.me>