<div dir="ltr">seems reasonable to me, though i'd be tempted to specifically mention utf8 --- there's nothing inherently "invalid" about surrogates (utf16 uses them in pairs for anything above 0xffff), other than that you shouldn't see them in utf8. (though you will see them in _modified_ utf8, so Java programmers might still meet them.)<div><br></div><div>anything above 0x10ffff seems legitimately just plain "invalid" though.</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Sat, May 15, 2021 at 10:21 AM Rob Landley &lt;<a href="mailto:rob@landley.net">rob@landley.net</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Elliott, is it worth testing for invalid unicode ranges in the display, à la:<br>

<br>

--- a/toys/other/ascii.c<br>

+++ b/toys/other/ascii.c<br>

@@ -44,7 +44,8 @@ static void codepoint(unsigned wc)<br>

   char *s = toybuf + sprintf(toybuf, "U+%04X : ", wc), *ss;<br>

   unsigned n, i;<br>

<br>

-  if (wc>31 && wc!=127) {<br>

+  if ((wc>0xd7ff && wc<0xe000) || wc>0x10ffff) s += sprintf(s, "invalid");<br>

+  else if (wc>31 && wc!=127) {<br>

     s += n = wctoutf8(ss = s, wc);<br>

     if (n>1) for (i = 0; i&lt;n; i++) s += sprintf(s, " : %#02x"+2*!!i, *ss++);<br>

   } else s = memcpy(s, (wc==127) ? "DEL" : low+wc*3, 3)+3;<br>

<br>

<br>

Rob<br>

</blockquote></div>
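For reference, a minimal sketch of the range check discussed above, pulled out into a standalone helper next to a plain UTF-8 encoder that refuses the same ranges. Both `is_scalar_value()` and `encode_utf8()` are hypothetical names made up for illustration; they are not toybox functions, and `encode_utf8()` is only an assumed stand-in for what `wctoutf8()` does, not its actual implementation.

```c
// Hypothetical helper (not in toybox): nonzero if wc is a Unicode scalar
// value, i.e. not a surrogate (0xd800-0xdfff) and not above 0x10ffff.
// Same test as the patch, just inverted.
static int is_scalar_value(unsigned wc)
{
  return !((wc>0xd7ff && wc<0xe000) || wc>0x10ffff);
}

// Hypothetical stand-in for wctoutf8(): writes the UTF-8 bytes for wc into
// out (which must hold at least 4 bytes) and returns the byte count, or 0
// if wc is a surrogate or out of range.
static int encode_utf8(unsigned char *out, unsigned wc)
{
  if (!is_scalar_value(wc)) return 0;
  if (wc<0x80) {
    out[0] = wc;
    return 1;
  }
  if (wc<0x800) {
    out[0] = 0xc0|(wc>>6);
    out[1] = 0x80|(wc&0x3f);
    return 2;
  }
  if (wc<0x10000) {
    out[0] = 0xe0|(wc>>12);
    out[1] = 0x80|((wc>>6)&0x3f);
    out[2] = 0x80|(wc&0x3f);
    return 3;
  }
  out[0] = 0xf0|(wc>>18);
  out[1] = 0x80|((wc>>12)&0x3f);
  out[2] = 0x80|((wc>>6)&0x3f);
  out[3] = 0x80|(wc&0x3f);
  return 4;
}
```

With this sketch, `encode_utf8(buf, 0x20ac)` returns 3 and fills in e2 82 ac, while 0xd800 and 0x110000 both return 0. A modified-utf8 encoder would differ here: it would accept each surrogate and emit it as its own three-byte sequence.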