[Toybox] [landley/toybox] Simple implementation of less (#39)

Rob Landley rob at landley.net
Tue Jul 26 13:57:40 PDT 2016


On 07/24/2016 06:51 PM, Toni Spets wrote:
> I don't expect this to be merged as-is but some feedback regarding the
> overall design would be nice. Tried to simplify it as much as possible
> while still keeping it performant and responsive over slow TTY.

Hmmm... I wasn't hugely thinking about slow tty. When curses was
invented "slow" was 300 baud, these days "slow" is a 9600bps serial
connection, I.E. 32 times faster. And most serial connections are 115200
or higher.

I was interested in redrawing on a per-line basis because I can scroll
the display up and scroll it down, but haven't found a sequence to
"insert blank line at cursor" or "delete blank line at cursor", which is
sad. (I can blank from cursor to beginning of screen or blank to end,
but I could already blank from cursor to end of line. That doesn't
selectively scroll part of the screen, which is what I'd need to
usefully move text to insert or delete in the middle, which is common
for text editing.)

The big thing I've been thinking about is utf8 fontmetrics (there are
some combining character stress tests in tests/files/utf8/test2.txt),
the unicode right-to-left display mode, and parsing ANSI escape sequences
that change color or move the cursor. And if the output does that, what
does pausing look like? (The old less has "-r" and "-R", I suspect
the behavior we want is -R, although whether it should be on by default
is an open question. After all, output is a tty...)

Another test is tests/files/utf8/bad.txt which has three different types
of failures: unprintable "low ascii", invalid UTF8 sequence, and a
unicode point that doesn't map to anything. In vi, these get escaped
differently. I'm not sure what less should display.

Another fun vi corner case is what you do with characters that clip the
right edge of the screen. If you have a multicolumn unicode character
that's two columns wide but prints one column from the edge of the
screen, what happens? (Does it scroll?) My approach was just NOT
printing that and treating it as clipped (or wrapped) because I can't
display just part of it. However, if I'm expanding bad unicode into a
multicharacter escape sequence, I _can_ display part of that. (If I
recall, vi's behavior is inconsistent: it clips/wraps unicode but
displays partial sequences it expanded itself.)

I looked at trying to make less and more share code, but what they do is
just too different. (More lets all sorts of raw stuff through.)

I was also hoping to share this code with a "screen" implementation...

> As far as I'm concerned, UTF-8 support needs a "draw" function that
> outputs to given buffer

How big's the buffer? UTF8 combining characters can stack many levels
deep, so one column can be dozens of bytes of string data to render it.

> instead of stream to get the screen updates working properly.

lib/linestack.c has crunch_str() which can display a measured amount of
data with a callback to tell it how to do the escaping (although the
checks about what needs to _be_ escaped aren't in the callback and
that's a pending design todo).

That's used by draw_str(start, width), and by utf8len(str), and by
utf8skip(str, width) which gives you the number of bytes that would be
used by (up to) width many columns (it can't return more than there are
in the string, of course).
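
A minimal sketch of that "how many bytes fit in width columns" measurement,
under stated assumptions: this is not the code in lib/linestack.c, the names
decode_utf8(), toy_width() and skip_columns() are hypothetical, toy_width()
is a tiny stand-in for wcwidth() covering only a few ranges, and an invalid
byte is assumed to take one column. A wide character that would straddle the
right edge is clipped rather than partially shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one UTF-8 sequence at s into *cp. Returns bytes consumed, or 0 if
 * the sequence is invalid or truncated. (Does not reject overlong forms.) */
static int decode_utf8(const char *s, uint32_t *cp)
{
  const unsigned char *u = (const unsigned char *)s;
  int len, i;

  if (u[0] < 0x80) { *cp = u[0]; return 1; }
  else if ((u[0] & 0xe0) == 0xc0) { len = 2; *cp = u[0] & 0x1f; }
  else if ((u[0] & 0xf0) == 0xe0) { len = 3; *cp = u[0] & 0x0f; }
  else if ((u[0] & 0xf8) == 0xf0) { len = 4; *cp = u[0] & 0x07; }
  else return 0;

  for (i = 1; i < len; i++) {
    if ((u[i] & 0xc0) != 0x80) return 0;  /* also stops at the NUL terminator */
    *cp = (*cp << 6) | (u[i] & 0x3f);
  }
  return len;
}

/* Toy stand-in for wcwidth(): 0 for combining diacriticals, 2 for a few
 * East Asian wide ranges, 1 otherwise. Not a complete table. */
static int toy_width(uint32_t cp)
{
  if (cp >= 0x0300 && cp <= 0x036f) return 0;
  if ((cp >= 0x1100 && cp <= 0x115f) || (cp >= 0x2e80 && cp <= 0xa4cf)
      || (cp >= 0xac00 && cp <= 0xd7a3) || (cp >= 0xff00 && cp <= 0xff60))
    return 2;
  return 1;
}

/* Return how many bytes of str fit in at most `width` display columns. */
size_t skip_columns(const char *str, int width)
{
  const char *s = str;
  int used = 0;

  assert(width >= 0);
  while (*s) {
    uint32_t cp;
    int len = decode_utf8(s, &cp), w;

    if (!len) { len = 1; w = 1; }  /* assumption: a bad byte takes one column */
    else w = toy_width(cp);
    if (used + w > width) break;   /* would cross the right edge: clip it */
    used += w;
    s += len;
  }
  return s - str;
}
```

Zero-width combining characters that follow the last character that fits are
still counted, since they add no columns. Copying the measured span is then
just memcpy(dest, str, skip_columns(str, width)).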

I haven't been making it copy data from one buffer into another buffer
because it would basically be memcpy(dest, src, utf8skip(str, width)),
I.E. you've already _got_ it in a string in order to feed it to that
function.

As for having a curses-style screen storage buffer at all, if I can tell
vi "start showing line 37,623" it has to go through a giant text buffer,
find line 37,623, and display the following data. If we're wordwrapping,
what it displays may all be a chunk of one line, and may not start at
the beginning of that line.

It has to do this in a reasonable amount of time, and a reasonable
amount of memory. I was thinking that each line of input could have
associated metadata (such as how many wide characters long it is to
display), but that means that a megabyte of newlines would take insane
amounts of memory to record, so I'd probably want some sort of grouping.
(Maybe 4k blocks rounded up to next end of line?)
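
A rough sketch of that grouping idea, assuming the whole input is already in
one memory buffer. The names (struct block, build_index(), find_line()) are
hypothetical and the only metadata kept per block is the number of its first
line; display width or similar could be stored the same way.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define BLOCK 4096  /* target block size, per the "4k blocks" idea */

struct block {
  size_t offset;  /* byte offset of the block start (always a line start) */
  size_t line;    /* zero-based number of the first line in this block */
};

struct line_index {
  struct block *blocks;
  size_t count;
};

/* Index buf[0..len): each block is at least BLOCK bytes, rounded up to just
 * past the next newline so every block starts on a line boundary. */
struct line_index build_index(const char *buf, size_t len)
{
  struct line_index idx = {0, 0};
  size_t pos = 0, line = 0, cap = 0;

  while (pos < len) {
    size_t end = pos + BLOCK;

    if (idx.count == cap) {
      cap = cap ? cap * 2 : 16;
      idx.blocks = realloc(idx.blocks, cap * sizeof(*idx.blocks));
      assert(idx.blocks);
    }
    idx.blocks[idx.count].offset = pos;
    idx.blocks[idx.count].line = line;
    idx.count++;

    if (end >= len) end = len;
    else {
      /* Round up to the next end of line (or end of buffer if none). */
      const char *nl = memchr(buf + end - 1, '\n', len - end + 1);

      end = nl ? (size_t)(nl - buf) + 1 : len;
    }
    for (; pos < end; pos++) if (buf[pos] == '\n') line++;
  }
  return idx;
}

/* Return the byte offset where zero-based line n starts, or len if the
 * buffer has fewer lines than that. */
size_t find_line(const struct line_index *idx, const char *buf, size_t len,
  size_t n)
{
  size_t lo = 0, hi = idx->count, pos, line;

  if (!idx->count) return len;
  /* Binary search for the last block whose first line is <= n. */
  while (hi - lo > 1) {
    size_t mid = lo + (hi - lo) / 2;

    if (idx->blocks[mid].line <= n) lo = mid;
    else hi = mid;
  }
  /* Scan forward inside that block (and past it, if n is out of range). */
  pos = idx->blocks[lo].offset;
  line = idx->blocks[lo].line;
  while (line < n && pos < len) if (buf[pos++] == '\n') line++;
  return line == n ? pos : len;
}
```

With this layout a megabyte of bare newlines needs 256 index entries rather
than one per line, and finding a line costs a binary search over blocks plus
a scan of roughly one block.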

Anyway, this is what I was vaguely pondering while it was on my todo list.

Rob


