[Toybox] [PATCH] nuke wcrtomb()

Oliver Webb aquahobbyist at proton.me
Thu Apr 11 08:46:51 PDT 2024


On Thu, Apr 11, 2024 at 03:37, Jarno Mäkipää <jmakip87 at gmail.com> wrote:

> there is a slight difference between wctoutf8 and wcrtomb: wcrtomb
> returns -1 if it's presented with an invalid char, or if the char is not
> representable in the current locale. I think wctoutf8 only returns positive
> integers.

wctoutf8 cannot fail because it encodes even invalid Unicode code points as utf8.

This is another reason I asked whether we could delegate the "is this a valid Unicode code point" check to the other Unicode code. With utf8towc we are not reading Unicode, we are reading utf8. If Unicode ever gets replaced, it's not hard to imagine a new or different character set still being represented with utf8 (a very elegant, efficient way to represent this kind of data). As long as it isn't a security problem, validating code points in the utf8 functions only makes the code less agnostic where it doesn't really need to be.

I remember from testing that if you pass the max unsigned int to wctoutf8, it writes a single 0xff byte, which is actually invalid utf8 (the theoretical max code point in utf8 is 2^31-1). This is a situation where bounds checking seems sane. Maybe an "if (wc > 0x7fffffff) return -1" at the start of wctoutf8 would fix it?
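To make the idea concrete, here is a rough sketch of an encoder with that bounds check up front. It is not toybox's wctoutf8: the name encode_utf8_checked is made up for illustration, it assumes a 32-bit unsigned, and it uses the original 1 to 6 byte utf8 scheme so that anything up to 2^31-1 still encodes without any Unicode validity check.

```c
// Hypothetical helper (not toybox's wctoutf8): encode wc into s using the
// original 1-6 byte utf8 scheme. Returns bytes written, or -1 if wc needs
// more than 31 bits and so has no utf8 representation at all.
int encode_utf8_checked(unsigned char *s, unsigned wc)
{
  int len, i;

  // The bounds check: reject anything above 2^31-1
  if (wc > 0x7fffffff) return -1;

  // 7 bits or fewer is plain ASCII
  if (wc < 0x80) {
    *s = wc;
    return 1;
  }

  // A len-byte sequence carries 5*len+1 payload bits (11, 16, 21, 26, 31)
  for (len = 2; len < 6 && wc >= (1u << (5*len+1)); len++);

  // Fill continuation bytes from the end, 6 bits each, marked with 10xxxxxx
  for (i = len-1; i > 0; i--) {
    s[i] = 0x80 | (wc & 0x3f);
    wc >>= 6;
  }

  // Lead byte: len high bits set, then the remaining payload bits
  s[0] = (0xff << (8-len)) | wc;

  return len;
}
```

With this, passing 0xffffffff returns -1 instead of emitting a stray 0xff byte, while 0x20ac still comes out as the three bytes e2 82 ac. No Unicode range or surrogate checks are done, so the encoder itself stays agnostic about what the code points mean.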

- Oliver Webb <aquahobbyist at proton.me>