[Toybox] More expand cleanups

Felix Janda felix.janda at posteo.de
Sat Dec 1 01:02:01 PST 2012


On 12/01/12 at 12:42am, Rob Landley wrote:
> On 11/30/2012 12:56:06 PM, Felix Janda wrote:
> > On 11/30/12 at 02:51am, Rob Landley wrote:
> > > I updated wc to theoretically deal with buffer wraps better. In reality
> > > I haven't got UTF8 test data to run through this, and should probably
> > > find some at some point.
> > 
> > Didn't the original version already deal well with it? (That's what the
> > strange tests are about.) Notice that the C library keeps an internal
> > state which is updated each time mbrtowc is called. If there's only part
> > of a character at the end, mbrtowc will return -2 but remember the part.
> > To continue, you just have to read a new toybuf and point mbrtowc to the
> > _new_ data. (The "r" in mbrtowc stands for "restartable".)
> 
> Huh, I didn't know that was what it was doing. Let me go back to your code then.
> 
> > Do the tests still work?
> 
> Yes, but nothing in the test suite tests a buffer wrap. Still, if libc
> already handles this I'm happy to let it do so.

The "$(seq 1 8192)"s are there precisely to produce input larger than
sizeof(toybuf). The first test uses a string whose first byte is a normal
char, followed by 8192 double-byte chars. By some luck the test still produces
the right results with your version. If you add a debug statement to wc that
detects invalid chars, you will see that mbrtowc encounters an invalid byte
sequence even though the input is valid.

(As you wrote in your commit message, invalid byte sequences are ignored in
both versions.)

> > > This also assumes that all characters are the same width, which is
> > > probably wrong and I need help with if so. (I dunno how to do
> > > fontmetrics here?)
> > 
> > I think that this depends on the terminal emulator. Look, for example, at
> > the "-cjk_width" option of xterm.
> 
> I mean "can UTF-8 produce characters that are different sizes in the
> terminal?" If the answer is no, then I don't have to worry. If it can
> produce double-wide characters or similar, I'd have to deal with that.

Yes, look at that option of xterm. Depending on whether it is set, CJK
characters take 1 or 2 columns in xterm.

I wouldn't try to detect this...

> I know the terminal can run right to left, but that's symmetrical and I
> don't have to care from a programming standpoint, I think...
> 
> > Felix
> 
> Rob
