[Toybox] [landley/toybox] Simple implementation of less (#39)

Wed Jul 27 15:39:25 PDT 2016

On 07/26/2016 11:42 PM, Toni Spets wrote:
> On Wed, Jul 27, 2016 at 12:47 AM, Rob Landley <rob at landley.net
> <mailto:rob at landley.net>> wrote:
> 
>     On 07/26/2016 02:22 AM, Toni Spets wrote:
>     > @hifi <https://github.com/hifi> pushed 1 commit.
>     >
>     >   * 79ca172 <https://github.com/landley/toybox/commit/79ca172> Less
>     >     magic, more printf
> 
>     It is so weird that github shows that commit as living in my tree
>     instead of yours. I'm glad I only use that site for distribution of
>     masters I keep elsewhere. (Is every commit in the whole of github in a
>     single giant tree with a flat sha1hash collision namespace? I wonder how
>     git catches that, or if the don't care until
>     http://valerieaurora.org/hash.html goes red?)
> 
> Well, that's odd.

*shrug* Github.

>     Ok, looking at git diff 7918d9ff8..79ca172eb0d0
> 
>     The patch adds scr_init() allocating a screen buffer with 4 bytes per
>     character... not an array of wide characters, but a char * with a width
>     you multiply by 4. Ok? I have no idea how you're going to fit combining
>     characters into there. You also don't appear to be storing color
>     information, which is something less -R outputs...
> 
> 
> So the first reasoning was that the longest UTF-8 sequence that
> apparently is possible is 4 bytes so I threw a dice that at the worst
> possible case you draw 4 byte sequences all over and it would fit in the
> buffer. This didn't take into account TTY escapes yet which are in the
> same buffer as well.

Nope, combining characters stack arbitrarily deeply. My stress test for
this (tests/files/utf8/test1.txt) is a _silly_ test (cut and pasted from
twitter), but there are real world examples out there. Certainly more
than 4 bytes (since utf8 expansion can go to 4 bytes for one unicode
point anyway, and then unicode points are what combine.)

Combining characters show up as zero length to wcwidth().

>     (So when you scroll the screen back up, you're not just outputting ESC[T
>     and writing a new line of data at the top, but instead doing a memmove
>     on your internal screen buffer? Why?)
> 
> Simplification of the implementation. The screen is double buffered and
> the app/toy can update all or only some parts of the backbuffer and
> doesn't need to care how the update is sent to TTY.

Adding a buffering layer isn't really simplifying. It strikes me as
premature optimization.

When you resize a terminal window you get a SIGWINCH which you need to
respond to by redrawing the screen, which means regenerating the data
with a new screen width.

If your app doesn't know how wide and tall the screen is it doesn't know
when to stop generating data, even if that data is immediately discarded
it's still extra work.

>     Next function: the patch adds scr_linediff()... which has no comment
>     explaining what it does. It's doing a memcmp on an an array of char
>     pointers... scr_init() allocated exacty 2 char pointers, where does this
>     array with a length come from...
> 
>     Ok, let's jump down to the last function added to this file and see if
>     it calls the earlier ones... scr_printf() is printing into SCR_PTR().
>     What's SCR_PTR? It's added to lib.h as
> 
>       #define SCR_PTR(scr) (scr->buf[0])
> 
>     And this is the only user of that macro anywhere in the code so why is
>     it a macro?
>  
> The idea was that a toy can access the current backbuffer with SCR_PTR
> if it wants to draw things directly instead of scr_printf(). Accessing
> it as scr->buf[0] is an ugly implementation details so I hid it.

Is it abstracted out so the command doesn't have to deal with it, or is
it something the command directly deals with?

Current is always 0?

Toybox usually gets small and simple code via tight integration, taking
advantage of all the implementation details being available locally. If
we change something in lib we can change all the users of it at once.

>     If you try to write to >x or >y it returns with no error. Otherwise,
>     this function is a wrapper around:
> 
>       vsnprintf(SCR_PTR(scr) + (scr->l * y), scr->l - x, fmt, va);
> 
>     That finds the start of the y line (scr->l is scr->w*4 and I don't know
>     why you needed to save that to avoid a <<2 in the users), does _not_
>     advance x bytes (or bytes*4) into it, but truncates what we're printing
>     to what presumably _would_ be the end of the line if we had advanced.
> 
>     Again, our screen width is *4 but we're not printing wide characters
>     here. (Did you mean vswprintf?)
> 
> It's still incomplete but yes, the only point of it is to safely wrap
> drawing lines on the screen so that it will truncate them properly
> keeping in mind UTF-8 sequences and TTY escapes. It also makes the toy
> code a bit more clean.

Do we _need_ to reinvent curses?

>     ah, the second buffer _isn't_ to store color information, it's so you
>     can diff what you think is on the screen (before the xterm was resized?)
>     with what is now on the screen.
> 
> 
> Both buffers contain the whole screen and application always draws to
> the swapped backbuffer.

It's not rendered data, it's raw data you then render. Presumably at
display time you figure out "this sequence would clear the screen if we
printed it out verbatim".

(Oh, the fontmetrics on tab characters are just nuts, you pretty much
have to do that manually.)

Also, for "tee" the following prints out data as it happens:

  while true; do echo -n 'X'; sleep 1; done | tee

But if you | less instead, it doesn't produce any output (on ubuntu
anyway) unless you remove the -n from echo. I was assuming it _did_
produce output, which isn't that hard to implement considering that you
need to read from multiple sources ala poll/select anyway (stdin and
keyboard can asynchronously produce data, and you shouldn't block one
waiting for the other, and thus I had the todo item to genericize the
poll loop from toys/*/netcat.c, which is why my implementation of this
command was sequenced after the pending round of netcat cleanup to make
sure netcat server mode works in nommu mode and that it has proper ipv6
support, and that I implement netcat -u ala
http://sarah.thesharps.us/2009/02/22/debugging-with-printks-over-netconsole/.)

>     So, next function up: scr_update() and THAT calls scr_linediff.
>     Iterating through scr->h. What is scr->h? In lib.h it's "unsigned".
>     Great. Back to look at scr_init()... Ah, it's screen height. Ok...
> 
>     +    char *line = scr_linediff(scr->buf[1] + (scr->l * i),
>     +                              scr->buf[0] + (scr->l * i),
>     +                              scr->l,
>     +                              i + 1);
> 
>     This is the only user of scr_linediff() in the code. Why it's a
>     function, I couldn't tell you.
> 
> 
> It's hinted inline but it could be unrolled into the loop itself. It's
> mostly to keep the logic separate so I could focus on how would I do the
> diff and not look at the loop itself.
> 
> So scr_linediff tries to find the optimal sequence of TTY escapes and
> screen data to update a single line. It isn't complete but it does work
> with ASCII at this point and can draw something like a single or two
> character change anywhere on the screen quite efficiently.

Is this a common case, and if it is shouldn't the command code have a
"jump to this location and update just the row number" function instead
of updating the data then calling something else to figure out how to
efficiently display it?

> When you
> scroll the screen in less/vi it can have a huge impact on the speed if
> you're on 9600 or slower.

Does that come up much these days?

> If you don't see it being useful enough, this can all be dumbed down a
> lot to the full line update. The good thing about the way I did it is it
> makes it dead simple for the toy to update the screen and it's also
> efficient on the TTY but it has the memory impact of double buffering.

Why?

You have a list of lines (linestack) that came from stdin in the case of
less. You're displaying a window into those lines, which has X and Y
coordinates of the upper let corner (controlled by the cursor keys), and
which has a width and a height (controlled by SIGWINCH as you resize the
window, although to make it work a cross a serial port you'd need to
periodically do the ANSI probe, which is why top uses scan_key_getsize().)

when you display a list of lines, you can recreate the data, properly
escaped so you know how much you're writing and don't accidentally
wrap/scroll, with color sequences going through as width 0 and all the
other escapes turned into reverse ^[ and otherwise crunch_str() escaping
as necessary, and tabs calculated based on cursor position (which
crunch_str can't currently do because it doesn't know where on the
screen it's displaying, but I should add an int xstart to that so it can
do that.) So for a "redraw the whole screen" you don't need a buffer,
you just need to generate output.

Resizing the screen is not a common case, so we don't need to optimize
for it. That can be a "clear screen and redraw".

Scrolling up and down one line has existing escape sequences to do that,
so you scroll and then redraw one line, which hexedit.c already does.

And then you get to fun stuff like
https://en.wikipedia.org/wiki/Right-to-left_mark

> The buffer size could be adjusted on-the-fly if someone is trying to
> draw something that fits on the screen but doesn't fit in the buffer but
> it would require a complete memcpy of every old line at that point for
> both buffers.

I don't understand how the buffer helps.

>     Ah, I see! Those for() loops at the start of scr_linediff() are
>     open-coding strnlen(). A non-wide-character version, which will stop at
>     the first zero byte (which wide characters tend to be full of).
> 
> 
> I assumed we worked with UTF-8 which doesn't have null bytes, right?

It doesn't, I thought the *4 was due to your buffer storing wide chars.
My bad.

Rob