[Toybox] [landley/toybox] Simple implementation of less (#39)

Toni Spets toni.spets at iki.fi
Wed Jul 27 21:08:29 PDT 2016


On Thu, Jul 28, 2016 at 1:39 AM, Rob Landley <rob at landley.net> wrote:

> On 07/26/2016 11:42 PM, Toni Spets wrote:
> > On Wed, Jul 27, 2016 at 12:47 AM, Rob Landley <rob at landley.net
> > <mailto:rob at landley.net>> wrote:
> >
> >     On 07/26/2016 02:22 AM, Toni Spets wrote:
> >     > @hifi <https://github.com/hifi> pushed 1 commit.
> >     >
> >     >   * 79ca172 <https://github.com/landley/toybox/commit/79ca172>
> Less
> >     >     magic, more printf
> >
> >     It is so weird that github shows that commit as living in my tree
> >     instead of yours. I'm glad I only use that site for distribution of
> >     masters I keep elsewhere. (Is every commit in the whole of github in
> a
> >     single giant tree with a flat sha1hash collision namespace? I wonder
> how
> >     git catches that, or if the don't care until
> >     http://valerieaurora.org/hash.html goes red?)
> >
> > Well, that's odd.
>
> *shrug* Github.
>
> >     Ok, looking at git diff 7918d9ff8..79ca172eb0d0
> >
> >     The patch adds scr_init() allocating a screen buffer with 4 bytes per
> >     character... not an array of wide characters, but a char * with a
> width
> >     you multiply by 4. Ok? I have no idea how you're going to fit
> combining
> >     characters into there. You also don't appear to be storing color
> >     information, which is something less -R outputs...
> >
> >
> > So the first reasoning was that the longest UTF-8 sequence that
> > apparently is possible is 4 bytes so I threw a dice that at the worst
> > possible case you draw 4 byte sequences all over and it would fit in the
> > buffer. This didn't take into account TTY escapes yet which are in the
> > same buffer as well.
>
> Nope, combining characters stack arbitrarily deeply. My stress test for
> this (tests/files/utf8/test1.txt) is a _silly_ test (cut and pasted from
> twitter), but there are real world examples out there. Certainly more
> than 4 bytes (since utf8 expansion can go to 4 bytes for one unicode
> point anyway, and then unicode points are what combine.)
>
> Combining characters show up as zero length to wcwidth().
>

I removed the assumption in the simplification patch so it fixes this.


>
> >     (So when you scroll the screen back up, you're not just outputting
> ESC[T
> >     and writing a new line of data at the top, but instead doing a
> memmove
> >     on your internal screen buffer? Why?)
> >
> > Simplification of the implementation. The screen is double buffered and
> > the app/toy can update all or only some parts of the backbuffer and
> > doesn't need to care how the update is sent to TTY.
>
> Adding a buffering layer isn't really simplifying. It strikes me as
> premature optimization.
>

It's not only optimization, it's simplification for the toy when it doesn't
need to keep track of what to update.


>
> When you resize a terminal window you get a SIGWINCH which you need to
> respond to by redrawing the screen, which means regenerating the data
> with a new screen width.
>

Is it something a toy needs to support? With the latest patch it's rather
easy to adjust to different screen sizes as the line buffers will
automatically grow.


>
> If your app doesn't know how wide and tall the screen is it doesn't know
> when to stop generating data, even if that data is immediately discarded
> it's still extra work.
>
> >     Next function: the patch adds scr_linediff()... which has no comment
> >     explaining what it does. It's doing a memcmp on an an array of char
> >     pointers... scr_init() allocated exacty 2 char pointers, where does
> this
> >     array with a length come from...
> >
> >     Ok, let's jump down to the last function added to this file and see
> if
> >     it calls the earlier ones... scr_printf() is printing into SCR_PTR().
> >     What's SCR_PTR? It's added to lib.h as
> >
> >       #define SCR_PTR(scr) (scr->buf[0])
> >
> >     And this is the only user of that macro anywhere in the code so why
> is
> >     it a macro?
> >
> > The idea was that a toy can access the current backbuffer with SCR_PTR
> > if it wants to draw things directly instead of scr_printf(). Accessing
> > it as scr->buf[0] is an ugly implementation details so I hid it.
>
> Is it abstracted out so the command doesn't have to deal with it, or is
> it something the command directly deals with?
>
> Current is always 0?
>

Removed as of latest patch. A toy doesn't need any other entry point to
screen buffer than scr_printf().


>
> Toybox usually gets small and simple code via tight integration, taking
> advantage of all the implementation details being available locally. If
> we change something in lib we can change all the users of it at once.
>

There will be multiple users for screen management so providing an API
that's easy and short to use fits in that category in my opinion.


>
> >     If you try to write to >x or >y it returns with no error. Otherwise,
> >     this function is a wrapper around:
> >
> >       vsnprintf(SCR_PTR(scr) + (scr->l * y), scr->l - x, fmt, va);
> >
> >     That finds the start of the y line (scr->l is scr->w*4 and I don't
> know
> >     why you needed to save that to avoid a <<2 in the users), does _not_
> >     advance x bytes (or bytes*4) into it, but truncates what we're
> printing
> >     to what presumably _would_ be the end of the line if we had advanced.
> >
> >     Again, our screen width is *4 but we're not printing wide characters
> >     here. (Did you mean vswprintf?)
> >
> > It's still incomplete but yes, the only point of it is to safely wrap
> > drawing lines on the screen so that it will truncate them properly
> > keeping in mind UTF-8 sequences and TTY escapes. It also makes the toy
> > code a bit more clean.
>
> Do we _need_ to reinvent curses?
>

Well, that's really up to you. My approach was to do this simple screen
double buffering to avoid having to deal with it in the toy code allowing
less and vi share it. Curses is still much more complicated than that. Toys
still do cursor management but not screen management.


>
> >     ah, the second buffer _isn't_ to store color information, it's so you
> >     can diff what you think is on the screen (before the xterm was
> resized?)
> >     with what is now on the screen.
> >
> >
> > Both buffers contain the whole screen and application always draws to
> > the swapped backbuffer.
>
> It's not rendered data, it's raw data you then render. Presumably at
> display time you figure out "this sequence would clear the screen if we
> printed it out verbatim".
>

The toy would clear the screen by printing an empty string to all visible
lines. This could be simplified by having a function for it that quickly
sets the first byte to zero of all back buffer rows.


>
> (Oh, the fontmetrics on tab characters are just nuts, you pretty much
> have to do that manually.)
>
> Also, for "tee" the following prints out data as it happens:
>
>   while true; do echo -n 'X'; sleep 1; done | tee
>
> But if you | less instead, it doesn't produce any output (on ubuntu
> anyway) unless you remove the -n from echo. I was assuming it _did_
> produce output, which isn't that hard to implement considering that you
> need to read from multiple sources ala poll/select anyway (stdin and
> keyboard can asynchronously produce data, and you shouldn't block one
> waiting for the other, and thus I had the todo item to genericize the
> poll loop from toys/*/netcat.c, which is why my implementation of this
> command was sequenced after the pending round of netcat cleanup to make
> sure netcat server mode works in nommu mode and that it has proper ipv6
> support, and that I implement netcat -u ala
>
> http://sarah.thesharps.us/2009/02/22/debugging-with-printks-over-netconsole/
> .)
>

How is this related to updating the screen? Isn't this more of a linestack
issue?


>
> >     So, next function up: scr_update() and THAT calls scr_linediff.
> >     Iterating through scr->h. What is scr->h? In lib.h it's "unsigned".
> >     Great. Back to look at scr_init()... Ah, it's screen height. Ok...
> >
> >     +    char *line = scr_linediff(scr->buf[1] + (scr->l * i),
> >     +                              scr->buf[0] + (scr->l * i),
> >     +                              scr->l,
> >     +                              i + 1);
> >
> >     This is the only user of scr_linediff() in the code. Why it's a
> >     function, I couldn't tell you.
> >
> >
> > It's hinted inline but it could be unrolled into the loop itself. It's
> > mostly to keep the logic separate so I could focus on how would I do the
> > diff and not look at the loop itself.
> >
> > So scr_linediff tries to find the optimal sequence of TTY escapes and
> > screen data to update a single line. It isn't complete but it does work
> > with ASCII at this point and can draw something like a single or two
> > character change anywhere on the screen quite efficiently.
>
> Is this a common case, and if it is shouldn't the command code have a
> "jump to this location and update just the row number" function instead
> of updating the data then calling something else to figure out how to
> efficiently display it?
>

It was an inline function but it has been removed in the latest patch and
only full line updates are done. It's good enough on 9600.


>
> > When you
> > scroll the screen in less/vi it can have a huge impact on the speed if
> > you're on 9600 or slower.
>
> Does that come up much these days?
>

Probably not, hence removed that complication.


>
> > If you don't see it being useful enough, this can all be dumbed down a
> > lot to the full line update. The good thing about the way I did it is it
> > makes it dead simple for the toy to update the screen and it's also
> > efficient on the TTY but it has the memory impact of double buffering.
>
> Why?
>
> You have a list of lines (linestack) that came from stdin in the case of
> less. You're displaying a window into those lines, which has X and Y
> coordinates of the upper let corner (controlled by the cursor keys), and
> which has a width and a height (controlled by SIGWINCH as you resize the
> window, although to make it work a cross a serial port you'd need to
> periodically do the ANSI probe, which is why top uses scan_key_getsize().)
>
> when you display a list of lines, you can recreate the data, properly
> escaped so you know how much you're writing and don't accidentally
> wrap/scroll, with color sequences going through as width 0 and all the
> other escapes turned into reverse ^[ and otherwise crunch_str() escaping
> as necessary, and tabs calculated based on cursor position (which
> crunch_str can't currently do because it doesn't know where on the
> screen it's displaying, but I should add an int xstart to that so it can
> do that.) So for a "redraw the whole screen" you don't need a buffer,
> you just need to generate output.
>

I did think about this. We're not drawing just the line stack always.
There's a status line at the bottom in many cases and you need to manually
manage that last line anyway if you try to just work with linestack only.
Even if linestack has more information about how to draw the lines on the
screen.

If you recreate/duplicate the linestack for display, wouldn't that have the
same-ish memory impact anyway as you need to keep the original linestack as
well if you want to save the data back to disk. At best having the double
buffering I'm doing is using less memory than duplicating the whole line
stack if it's longer than two screenfuls of buffers.


>
> Resizing the screen is not a common case, so we don't need to optimize
> for it. That can be a "clear screen and redraw".
>
> Scrolling up and down one line has existing escape sequences to do that,
> so you scroll and then redraw one line, which hexedit.c already does.
>

That is an optimization even scr_update() could do if implemented (compare
front and back buffers to detect scrolling). It would be more complicated
than what hexedit does, granted.


>
> And then you get to fun stuff like
> https://en.wikipedia.org/wiki/Right-to-left_mark
>
> > The buffer size could be adjusted on-the-fly if someone is trying to
> > draw something that fits on the screen but doesn't fit in the buffer but
> > it would require a complete memcpy of every old line at that point for
> > both buffers.
>
> I don't understand how the buffer helps.
>

It takes the requirement of the toy to know about how to update the screen
and keep it optimized enough. If you take a look at the less code the only
thing it does after every key press is update the buffer and scr_update()
will take care of updating the TTY only when required and that includes the
status line. It's a small sacrifice of CPU power to compare a few bytes in
memory.


>
> >     Ah, I see! Those for() loops at the start of scr_linediff() are
> >     open-coding strnlen(). A non-wide-character version, which will stop
> at
> >     the first zero byte (which wide characters tend to be full of).
> >
> >
> > I assumed we worked with UTF-8 which doesn't have null bytes, right?
>
> It doesn't, I thought the *4 was due to your buffer storing wide chars.
> My bad.
>
> Rob
> _______________________________________________
> Toybox mailing list
> Toybox at lists.landley.net
> http://lists.landley.net/listinfo.cgi/toybox-landley.net
>



-- 
Toni Spets
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.landley.net/pipermail/toybox-landley.net/attachments/20160728/7b83086d/attachment-0003.htm>


More information about the Toybox mailing list