[Toybox] utf8towc(), stop being defective on null bytes

Rob Landley rob at landley.net
Mon Apr 8 09:57:42 PDT 2024


On 4/8/24 11:01, enh wrote:
>> > Returning length 0 means we hit a null terminator,
>>
>> Null bytes aren't always "terminators". You can embed null bytes into data and still
>> want to do utf8 processing with it.
> 
> that's questionable ... the desire to have ASCII NUL in utf-8
> sequences (without breaking the "utf-8 sequences are usable as c
> strings" property) is the main reason for the existence of "modified
> utf-8".

You don't need a conversion function to grab a NUL byte; you can check for it
directly.
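
To make that concrete, here's a rough sketch of the shape I mean. The names
decode_utf8() and count_codepoints() are made up for illustration; this is not
toybox's utf8towc(), just a stand-in that assumes the same convention of
returning 0 when it hits a NUL byte. The caller walks a range (pointer plus
length) and tests for NUL itself before calling the decoder.

```c
// Hypothetical stand-in for a utf8towc()-style decoder (NOT toybox's code).
// Returns bytes consumed, 0 for a NUL byte, -1 for an invalid sequence,
// or -2 when the sequence is cut short by len.
static int decode_utf8(unsigned *wc, const char *str, unsigned len)
{
  const unsigned char *s = (const unsigned char *)str;
  unsigned first, result, i, n;

  if (!len) return -2;
  first = *s;
  if (first < 0x80) {
    *wc = first;
    return first != 0;  // ASCII: 1 byte consumed, or 0 for NUL
  }
  // Rejects stray continuation bytes, overlong 0xc0/0xc1, and 0xf5+.
  if (first < 0xc2 || first > 0xf4) return -1;

  n = 1 + (first >= 0xe0) + (first >= 0xf0);  // continuation byte count
  result = first & (0x3f >> n);               // payload bits of lead byte
  for (i = 1; i <= n; i++) {
    if (i == len) return -2;
    if ((s[i] & 0xc0) != 0x80) return -1;
    result = (result << 6) | (s[i] & 0x3f);
  }
  // Reject overlong encodings, surrogates, and values past U+10FFFF.
  if ((n == 2 && result < 0x800) || (n == 3 && result < 0x10000)) return -1;
  if (result > 0x10ffff || (result >= 0xd800 && result <= 0xdfff)) return -1;
  *wc = result;

  return n + 1;
}

// Count code points in buf[0..len), treating NUL as ordinary data.
// Invalid or truncated bytes are counted one at a time (an arbitrary
// choice for this sketch).
static unsigned count_codepoints(const char *buf, unsigned len)
{
  unsigned pos = 0, count = 0, wc;

  while (pos < len) {
    int used;

    if (!buf[pos]) {  // NUL: nothing to convert, check directly
      pos++;
      count++;
      continue;
    }
    used = decode_utf8(&wc, buf + pos, len - pos);
    if (used < 1) used = 1;  // bad byte: skip it and keep going
    pos += used;
    count++;
  }

  return count;
}
```

The decoder never needs to know whether NUL means "terminator" or "data",
because the caller that knows the length makes that decision.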

A NUL byte _is_ a special case, but the enclosing loop can deal with it easily
enough (there's nothing to convert, it's a NUL byte, check directly). I've got
functions like regexec0() that work over a range instead of using a NUL
terminator, and those have to deal with libc's regex stopping at NUL, so the
enclosing loop advances past it and restarts.
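
The advance-and-restart pattern looks roughly like the sketch below. To be
clear, range_regexec() is a hypothetical helper written for this example, not
the actual regexec0(). It uses only POSIX regexec(), and it assumes that
buf[len] is a 0 byte (so the last segment is terminated) and that the regex was
not compiled with REG_NOSUB. A match can't span a NUL in this version.

```c
#include <regex.h>
#include <string.h>

// Hypothetical helper (NOT toybox's regexec0()): search buf[0..len), which
// may contain NUL bytes, for preg. Assumes buf[len] == 0. On a match returns
// 0 and stores offsets into buf in *start and *end, else returns REG_NOMATCH.
static int range_regexec(const regex_t *preg, const char *buf, size_t len,
  size_t *start, size_t *end)
{
  size_t pos = 0;

  while (pos <= len) {
    size_t seg_end = pos + strlen(buf + pos);
    regmatch_t m;
    int flags = 0;

    // Embedded NULs are data, not line boundaries, so ^ and $ should
    // only anchor at the real edges of the range.
    if (pos) flags |= REG_NOTBOL;
    if (seg_end < len) flags |= REG_NOTEOL;

    if (!regexec(preg, buf + pos, 1, &m, flags)) {
      *start = pos + m.rm_so;
      *end = pos + m.rm_eo;
      return 0;
    }

    // regexec() stopped at the NUL: advance past it and restart.
    pos = seg_end + 1;
  }

  return REG_NOMATCH;
}
```

Called on the 8 bytes "aa\0abb\0c" with the extended pattern "b+", the first
segment "aa" fails, the loop steps over the NUL, and the second segment
matches at offsets 4 through 6 of the original buffer.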

Rob

