[Toybox] [PATCH] vi: changes to buffer drawing

Thu Sep 19 12:52:11 PDT 2019

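Actually, I think the current crunch_str prints trailing zero-width
combining chars just fine?

When columns == width, width-columns is 0 and col is also 0 for a
combining char, so the width-columns<col check is false and the loop
does not break.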

.................................................
for (end = start = *str; *end; columns += col, end += bytes) {
    wchar_t wc;

    if ((bytes = utf8towc(&wc, end, 4))>0 && (col = wcwidth(wc))>=0) {
      if (!escmore || wc>255 || !strchr(escmore, wc)) {
        if (width-columns<col) break;
        /* <-- col is 0 for U+0300 to U+036F, so the check above is false */
        if (out) fwrite(end, bytes, 1, out);

        continue;
      }
    }
......................
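
To make the arithmetic concrete, here is a stripped-down sketch of that
check. It is not the toybox code: fit_bytes(), toy_decode() and toy_width()
are made-up names for illustration, and toy_width() simply assumes that
U+0300 to U+036F are zero width and everything else is one column, instead
of calling wcwidth().

```c
#include <stddef.h>

/* Hypothetical stand-in for wcwidth(): 0 for U+0300..U+036F, else 1. */
static int toy_width(unsigned wc)
{
  return (wc >= 0x300 && wc <= 0x36f) ? 0 : 1;
}

/* Hypothetical minimal decoder: 1 and 2 byte UTF-8 sequences only, with no
   overlong-encoding checks. Returns bytes consumed, or -1 otherwise. */
static int toy_decode(unsigned *wc, const char *s)
{
  unsigned char c = (unsigned char)s[0];

  if (c < 0x80) {
    *wc = c;
    return 1;
  }
  if ((c & 0xe0) == 0xc0 && ((unsigned char)s[1] & 0xc0) == 0x80) {
    *wc = ((c & 0x1fu) << 6) | ((unsigned char)s[1] & 0x3fu);
    return 2;
  }
  return -1;
}

/* Count how many leading bytes of str fit in width columns, using the same
   "width-columns<col" test as the quoted loop. */
static size_t fit_bytes(const char *str, int width)
{
  const char *end = str;
  int columns = 0, col, bytes;
  unsigned wc;

  while (*end) {
    if ((bytes = toy_decode(&wc, end)) < 0) break;
    col = toy_width(wc);
    /* A zero-width char never trips this, even when columns == width. */
    if (width - columns < col) break;
    columns += col;
    end += bytes;
  }

  return (size_t)(end - str);
}
```

With width 2, fit_bytes("ab\xcc\x81", 2) counts all 4 bytes: after "ab"
columns is 2, and for U+0301 the test is 0 < 0, which is false, so the
trailing accent is still included. A following printable char would give
0 < 1 and stop the loop there.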

And yeah, UTF-8 is good because it was originally written on a napkin
at a dinner table by Ken Thompson and Rob Pike. Unicode, on the other
hand... not written on a napkin.

-Jarno

On Thu, Sep 19, 2019 at 7:39 PM Jarno Mäkipää <jmakip87 at gmail.com> wrote:
>
> Yeah, combining chars follow the main glyph. My draw_str_until had
> an extra for loop just to check whether there are 0-width chars after
> we are at the correct width, and I only pushed data to stdout when I
> was sure about the length.
>
> But the interface in crunch_str is better, since it has support for
> rendering special chars with a custom function.
>
> Now there is a bit of incorrect rendering when stepping around
> tests/files/test1.txt, so I need to patch this up. Perhaps I'll try to
> make crunch_nstr() work correctly...
>
> -Jarno
>
> On Thu, Sep 19, 2019 at 5:34 PM Rob Landley <rob at landley.net> wrote:
> >
> > On 9/15/19 8:05 AM, Jarno Mäkipää wrote:
> > > Replaced: draw_str_until with lib/crunch_str() where possible
> > >
> > > Removed: Unused char draw functions.
> > >
> > > Implemented: crunch_nstr(), which is crunch_str with an additional
> > > check for byte length. This can be used to draw substrings or strings
> > > that are not null terminated. (This can be moved to lib/ if it's
> > > useful for others.)
> >
> > Applied, but I note when I wrote crunch_str() I assumed that unicode was sane,
> > which was wrong.
> >
> > UTF-8 is very well done. Unicode combining characters are as stupid as it's
> > possible to be: they TRAIL the printing character, meaning that you have a base
> > character that gets displayed, and then you redraw over it repeatedly as you get
> > each new modifier attaching to the _previous_ character you already drew, and
> > then you can't tell you've gone past your length allocation until you parse the
> > first character you _can't_ display in that space, which you then need to unget.
> >
> > I thought combining characters were stored up and then applied to the _next_
> > character (which would have been the sane thing to do), and the measuring logic
> > works based on that assumption. So it probably won't display combining
> > characters on the last UTF8 character because the unicode committee is too dumb
> > to live.
> >
> > Rob