[Toybox] Generic editor. Was: fold implementation

David Seikel onefang at gmail.com
Tue Apr 8 09:45:55 PDT 2014


I'll clean it up, taking your comments into consideration, next time
I have a block of time to dedicate to it.

On Tue, 08 Apr 2014 06:15:59 -0500 Rob Landley <rob at landley.net> wrote:

> I've actually looked at it a couple times, the problem is I'm not
> entirely sure what to do with it.

Drop it into pending, so we can all work on cleaning it up and abusing
it to see what breaks.  Then I can feed the next bite sized bit to you.
That would be my hope.  B-)

> In handlekeys: half of this is a big table mapping input sequences to
> names. I'm not sure I'm happy with the strings as the output, but
> enums wouldn't be an improvement so let's skip that for now.

Those strings are easy to use names for other users of this stuff.
Sure they could be C or pre processor names, but that makes it hard to
use in some sort of external file thingy mapping keys to commands that I
think you suggested long ago.

Advanced editors allow arbitrary mappings between keystrokes and
commands.  In this case having strings would be the way to do it.
Perhaps we will get as far as supporting such advanced editing
features, perhaps not.  My study of the commands that are on the toybox
roadmap that might use this broke things down into "dead simple
commands that get something barely usable", "basic commands", "advanced
commands", and "code editor specific stuff".  My intention is to
implement things in that order.  At some point along that progression
we might decide "this is where we stop".  As you keep saying, a
stopping point is good.  So far, other than "let's not write an entire
operating system like Emacs", no stopping point has been decided.
"Dead simple commands" was my stopping point for my first code drop.

At any rate, the important thing is to have simple keystroke names
instead of numbers others have to figure out for themselves.  So
that's a couple of reasons why strings are better, and I went with
strings.
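
A rough sketch of how string names could feed an external key mapping,
assuming a config file with lines like "Home start_of_line" gets loaded
into a table.  The struct, the key names and the command names here are
all made up for illustration, none of them come from the patch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical binding table: keystroke names on the left, command
 * names on the right.  A real one might be filled from a file. */
struct binding {
  const char *key;      /* keystroke name, e.g. "^A" or "Home" */
  const char *command;  /* command name, resolved elsewhere */
};

static const struct binding bindings[] = {
  {"Left",  "cursor_left"},
  {"Right", "cursor_right"},
  {"Home",  "start_of_line"},
  {"^A",    "start_of_line"},
};

/* Return the command bound to a key name, or NULL if unbound. */
const char *command_for_key(const char *key)
{
  size_t i;

  for (i = 0; i < sizeof(bindings)/sizeof(*bindings); i++)
    if (!strcmp(bindings[i].key, key)) return bindings[i].command;

  return NULL;
}
```

With numbers or enums the left hand column would need its own
name-to-value translation before a file like that could be parsed.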

> The first half of this table is matching 00 through 1F in order, I'm
> not sure the first entry of each pair is necessary? The rest of the
> table is matching escape sequences starting with either ESC (27) or
> 0x9b (which it says is the 8-bit encoding of ESC? What?) A bit
> redundant, but let's worry about cleaning it up later...

0x9b is officially called CSI, it's the eight bit encoding of "Esc [",
not "Esc".  You missed the "[".  7 bit terminals should use "Esc [",
8 bit terminals should use CSI.  Even though in our modern world of
pretend terminals 7 bit ones should be rare, in practice I've seen both
the 7 and 8 bit versions sent.  Not really redundant, as not all
sequences can start with both, but lots do.
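
As a minimal sketch of accepting both forms, something like this would
recognise either introducer and say how many bytes it took.  The
function name is hypothetical, not from the patch.

```c
#include <stddef.h>

/* If buf starts with a CSI introducer, return its length in bytes:
 * 2 for the 7 bit form "Esc [", 1 for the 8 bit form 0x9b.
 * Return 0 if buf does not start with CSI. */
size_t csi_prefix_len(const unsigned char *buf, size_t len)
{
  if (len >= 2 && buf[0] == 0x1b && buf[1] == '[') return 2;
  if (len >= 1 && buf[0] == 0x9b) return 1;

  return 0;
}
```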

Mapping input bytes to keys is a big mess, several standards, several
non standards, some terminals do their own thing.  This entire table is
the "simple" way to do things.  Sure the code could get more complex to
cut the table up into smaller bits, but we are aiming for simple, so
that's the way I did it.  Some of this table came from standards, some
came from me starting up several terminal programs and hitting keys to
see what they sent.

For the same reason the CSI encoded keystrokes are in that table.  Sure
the CSI code is a separate block of code, but it would still need its
own table like this to deal with the keystroke mappings.  So simpler
just to have all the keystrokes in one big table, and check it before
checking the other CSI stuff.  Which also cleanly separates CSI
keystrokes from CSI terminal reports.

Sure, sending a few simple ANSI sequences to terminals these days means
we don't need big fancy terminal databases.  This big table tries to
deal with the other direction, various terminals send varying things
for the keystrokes we care about.  So one big translation table like
this should catch most of them.  Yes, there are some conflicts, not
sure yet how to handle those, but at least the ones I know about are
documented.
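
To make the "one big translation table" idea concrete, here is a cut
down sketch.  The entries, the names on the right and the lookup
function are illustrative assumptions, the real table is much longer
and may be laid out differently.

```c
#include <stddef.h>
#include <string.h>

struct keymap {
  const char *seq;   /* bytes the terminal sends */
  const char *name;  /* name handed to the caller */
};

/* A few illustrative entries only.  Several sequences can map to the
 * same key, because terminals disagree about what to send. */
static const struct keymap keys[] = {
  {"\x01",     "^A"},
  {"\x1b[A",   "Up"},
  {"\x9b" "A", "Up"},
  {"\x1bOA",   "Up"},
  {"\x1b[3~",  "Del"},
};

/* Look up the bytes collected so far.  Returns the key name on an
 * exact match.  Otherwise returns NULL, with *partial set to 1 when
 * buf is a proper prefix of some entry (more bytes might complete it)
 * and 0 when nothing in the table starts that way. */
const char *lookup_key(const char *buf, int *partial)
{
  size_t i, len = strlen(buf);

  *partial = 0;
  for (i = 0; i < sizeof(keys)/sizeof(*keys); i++) {
    if (!strcmp(keys[i].seq, buf)) return keys[i].name;
    if (len && !strncmp(keys[i].seq, buf, len)) *partial = 1;
  }

  return NULL;
}
```

A linear scan is plenty for a table this size, and keeps the code as
dumb as the table.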

> It defines static variables outside of a GLOBALS() block.

GLOBALS is meant for toys only?  Handlekeys is meant for the library.
Dumbsh is the example toy, and it keeps static variables in GLOBALS.

> handlekeys():
> It says FDZERO() each time is "more portable"... to what?

The resetting of selectFds each time through the select loop is
claimed to be more portable than not resetting it.  Can't recall where
I found that claim, this bit of code I cut and pasted from a much older
project.  It's entirely possible this portability is to systems we don't
care about.  Gotta zero them out before first use anyway, so not
resetting them each time would just mean moving that outside of the
loop.  Not a performance critical loop, so I went with "portable" over
"fast" this time.
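
For reference, a minimal sketch of the pattern.  select() overwrites
the sets it is handed to report which descriptors are ready, and may
modify the timeout too, so rebuilding both before every call is the
safe form.  The function name is hypothetical.

```c
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Read up to len bytes from fd, waiting at most msec milliseconds for
 * input.  Returns bytes read, 0 on timeout (or end of file), -1 on
 * error. */
ssize_t read_timeout(int fd, char *buf, size_t len, int msec)
{
  fd_set fds;
  struct timeval tv;
  int rc;

  /* Rebuilt on every call, select() may have changed them last time. */
  FD_ZERO(&fds);
  FD_SET(fd, &fds);
  tv.tv_sec = msec / 1000;
  tv.tv_usec = (msec % 1000) * 1000;

  rc = select(fd + 1, &fds, NULL, NULL, &tv);
  if (rc <= 0) return rc;

  return read(fd, buf, len);
}
```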

> We have xread(), might as well use it.

True.  In this case I'm mostly interested in some debugging feedback to
see what else had been passed in.  It was useful for development, could
be removed.  There might be other library stuff we can use, I don't
know everything toybox has in its library.

> In the "Ran out of buffer" case, you're assuming the last byte is
> zero, but where do you actually set that?

That's a bug, good catch.

> This function calls two callback functions, when it could as easily
> fill out a struct and/or return a status? Haven't quite wrapped my
> head around why you did it that way yet, but still reading.

One is handling keystrokes, the other is handling reports from the
terminal that are not keystrokes.  So the context and contract are
different.  In the case of keystrokes, you get a chance to say "I don't
know this keystroke sequence, but it might be the first part of
something I could understand later, so keep accumulating keystrokes",
"nothing to do with me, drop it", or "I handled it, move on".

In the other case, the reports from the terminal are atomic, and not
expected to be part of a sequence of keystrokes.  So no need to tell
handlekeys to continue to accumulate it.  The callback either does
something in response to the terminal report or entirely ignores it.
Also, keystrokes don't have internal data, CSI reports do.  Basically,
they are different things.

This is explained a bit in handlekeys.h, I could explain it better.
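
A sketch of the two contracts as C types.  Every name below is made up
for illustration and the real handlekeys.h may well differ, including
in how accumulated keystrokes get passed to the callback.

```c
#include <string.h>

/* What a keystroke callback can tell the caller. */
enum hk_result {
  HK_MORE,     /* unknown so far, might be a prefix: keep accumulating */
  HK_IGNORED,  /* nothing to do with me, drop it */
  HK_HANDLED   /* dealt with, move on */
};

typedef enum hk_result (*hk_key_callback)(void *ctx, const char *keys);

/* CSI reports are atomic and carry parameters, so there is nothing to
 * return: the callback acts on the report or ignores it. */
typedef void (*hk_csi_callback)(void *ctx, const char *command,
                                const int *params, int count);

/* Example keystroke callback for a two key command, assuming the
 * accumulated key names arrive concatenated into one string. */
enum hk_result example_keys(void *ctx, const char *keys)
{
  int *quit = ctx;

  if (!strcmp(keys, "^X")) return HK_MORE;
  if (!strcmp(keys, "^X^C")) {
    *quit = 1;
    return HK_HANDLED;
  }

  return HK_IGNORED;
}
```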

Might be a mouse callback in the future too.  Or possibly pass a
structure around, with a "this is X type callback" member, with a
returned status as you said.  At some point yes, simpler would be to
pass a structure than keep adding more callbacks.

> RE: sizeof(blah)/sizeof(*blah), we have ARRAY_LEN(), might as well
> use it.

I didn't know about that macro.  Good idea.

> We care that buffer[] is null terminated _and_ we have buffindex?

Yes.  As you mentioned before, sometimes buffer is passed to stuff that
needs null terminated strings, and we need to know where the end is.
So we could do a strlen() every now and then, or just keep track of it
in buffindex.  An earlier version did more with buffindex, this
simplified version just kept that around rather than convert to strlen.
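
A minimal sketch of keeping both in step, which would also cover the
"ran out of buffer" bug above since the terminator is written on every
append.  The struct, the size and the function names are assumptions
for illustration.

```c
#include <stddef.h>
#include <string.h>

struct keybuf {
  char data[32];  /* always NUL terminated */
  size_t index;   /* bytes stored so far; equals strlen(data) unless a
                     0 byte was stored */
};

/* Drop everything collected so far. */
void keybuf_reset(struct keybuf *kb)
{
  memset(kb, 0, sizeof(*kb));
}

/* Append one byte, keeping the terminator.  Returns 0 on success,
 * -1 if the buffer is full (the "ran out of buffer" case). */
int keybuf_add(struct keybuf *kb, char c)
{
  if (kb->index + 1 >= sizeof(kb->data)) return -1;
  kb->data[kb->index++] = c;
  kb->data[kb->index] = 0;

  return 0;
}
```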

> Why have pendingESC and csi, rather set "pending" to 27 or 0x9b?
> (pendingESC is outside the loop, csi is in the loop...)

pendingESC is about checking if the escape key was hit, with a delay to
see if it was actually an escape key, it's in the loop.  The CSI
sequence "Esc [" is not "an actual Esc key was hit".  Yes, possibly
could reuse a variable for them both, but they are different things,
and that might be harder to understand.  Simple wins.

> When I rewrote the busybox vi escape parsing to not freak out over a
> serial terminal, a bare escape wasn't special any more than any
> unterminated sequence is special. If we have enough of a delay in the
> middle of a sequence, degrade it to literals. Otherwise you hang who
> knows how long in an indeterminate state halfway through input that
> hasn't come. (Locally generated sequences come in as a single read(),
> and sending them via ssh puts them in the same network packet, they
> only really get broken up by serial hardware and that should have
> very hard timeouts. 300 baud sent a character every 1/7th of a
> second.)

Yep, that's what pendingESC is all about, checking for that delay.  A
completely arbitrary one tenth of a second delay is used.  I've not
actually tested it on a 300 baud modem, I've never owned one.  I don't
even own a real phone line (or phone), even though the monopoly
communications infrastructure provider here insists on charging me an
extra $30 per month for one on the Internet only fibre they installed.
But that's another story.

Actually testing it over a really slow serial port should be done, I've
not done that yet.
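
The delay check boils down to something like the following sketch.  It
uses poll() for brevity rather than the select() loop described
earlier, and the names are hypothetical.

```c
#include <poll.h>
#include <unistd.h>

#define ESC_DELAY_MS 100  /* the arbitrary one tenth of a second */

/* Call after a lone Esc byte has been read from fd.  Waits up to
 * ESC_DELAY_MS for more input.  Returns the number of further bytes
 * read into buf (probably the rest of an escape sequence), 0 if
 * nothing arrived in time (treat it as the Esc key itself), or -1 on
 * error. */
ssize_t read_after_esc(int fd, char *buf, size_t len)
{
  struct pollfd pfd = { .fd = fd, .events = POLLIN };
  int rc = poll(&pfd, 1, ESC_DELAY_MS);

  if (rc <= 0) return rc;

  return read(fd, buf, len);
}
```

The same timeout could be applied after every byte of an unfinished
sequence, not only after the first Esc, if degrading any stalled
sequence to literals is wanted.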

> At the start of if (csi) there's a large comment that basically says
> "go read the spec and understand this yourself".

That large comment tries to distil into a short table the actual bits we
need to know from the unreadable spec.  That short table is more
readable than ECMA-048 section 5.2, but perhaps I could make it more
readable still?  In the end though, mostly it was notes to myself about
what the spec says that is important to what I was writing.
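
For what it's worth, the structure the spec describes after the CSI
introducer is: any number of parameter bytes (0x30 to 0x3f), then any
number of intermediate bytes (0x20 to 0x2f), then exactly one final
byte (0x40 to 0x7e).  A sketch of finding the end that way, with a
hypothetical function name:

```c
#include <stddef.h>

/* Scan the bytes that follow a CSI introducer.  Returns the offset
 * just past the final byte, 0 if the sequence is incomplete so far,
 * or -1 if a byte turns up that is not valid at that position. */
long csi_body_end(const unsigned char *s, size_t len)
{
  size_t i = 0;

  while (i < len && s[i] >= 0x30 && s[i] <= 0x3f) i++;  /* parameters */
  while (i < len && s[i] >= 0x20 && s[i] <= 0x2f) i++;  /* intermediates */
  if (i == len) return 0;                               /* need more */
  if (s[i] >= 0x40 && s[i] <= 0x7e) return (long)i + 1; /* final byte */

  return -1;
}
```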

> Mouse report? Um, there can always be escape sequences we don't
> understand. The proper thing to do is probably pass 'em through
> verbatim.

The problem here is that there are many different formats for the mouse
report, which is the only CSI report I can foresee actually being used
that doesn't have a well defined "this is the end of the report".
Knowing where the end of the CSI report is, and sending the entire
report to the callback, is what the CSI part of the code is all about.  The
ECMA-048 spec defines "this is the end" quite well, which some mouse
reports break.

Actually, I think no other CSI style report falls outside of the spec
like that.  You can tell where the end of the mouse report is, if you
know how mouse reporting was requested.  Mouse support is something I
want, but I have not figured out which of the equally horrid terminal
mouse systems I want to use.  So for now, this is a TODO, and a note
that it's outside the spec.
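
To illustrate the problem with one of the formats, the classic xterm
style report is "CSI M" followed by three more bytes (button, column,
row, each with 32 added).  "M" is already a valid final byte, so a
spec following parser stops there and the three data bytes are left
dangling.  This sketch assumes that format was the one requested, and
is not a decision about which mouse system to use.  All names are
hypothetical.

```c
#include <stddef.h>

struct mouse_event {
  int button;  /* 0 to 2 for buttons 1 to 3, 3 for release */
  int x, y;    /* 1 based column and row */
};

/* s points just past the CSI introducer.  Returns bytes consumed (4),
 * 0 if more bytes are needed, -1 if this is not a mouse report. */
int decode_x10_mouse(const unsigned char *s, size_t len,
                     struct mouse_event *ev)
{
  if (len < 1) return 0;
  if (s[0] != 'M') return -1;
  if (len < 4) return 0;

  ev->button = (s[1] - 32) & 3;
  ev->x = s[2] - 32;
  ev->y = s[3] - 32;

  return 4;
}
```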

Generally handlekeys is trying to be a higher contextual layer that
turns "random bytes from the terminal" into "here's a human
readable version for you".  So far the only ones of interest I
have seen are keystrokes, CSI that matches ECMA-048 section 5.2, and
mouse reports.  The keystrokes that use CSI match ECMA-048, the ones
that don't use CSI are caught by the table.  So it's the mouse reports
that are gratuitously different, they start as CSI, but don't follow
the spec for ending like CSI.  So they trigger the CSI code, but
dealing with them is left for a later coding effort.

As mentioned before, the two results from handlekeys are "here's some
keys" and "here's a valid CSI report".  "Here's a mouse report" is on
the road map.  "Here's random line noise, or perhaps something I don't
understand" is not considered here.  I've looked at lots of "these are
the sorts of things you might see go back and forth between a terminal
and an application".  I've not seen anything that falls outside of
keystroke, CSI report, or mouse report that looked remotely interesting
for the use cases we are interested in.  If any turn up, then that's a
bug to be fixed.  Actual line noise is likely to eventually fill up the
buffer, then get dropped.  Sure there might be a sequence we support
mixed with the line noise, but we are just gonna ignore it until we see
something we understand.

I should probably put some of the last couple of paragraphs in the
documentation in handlekeys.h.

On the other hand, the only CSI we are using (apart from keystrokes) is
the terminal size thing.  That's really the only reason it's there.
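
One common way to get the size that way, sketched here as an
assumption rather than a description of what the patch does: push the
cursor far past the bottom right corner, ask where it ended up, and
parse the "CSI rows ; cols R" cursor position report that comes back.
The function name is hypothetical.

```c
#include <stdio.h>

/* Sent to the terminal: move to row 999 column 999 (the terminal
 * clamps that to its real size), then request a cursor position
 * report. */
static const char size_query[] = "\x1b[999;999H\x1b[6n";

/* Parse the part of a cursor position report that follows the CSI
 * introducer, e.g. "24;80R".  Returns 1 on success, else 0. */
int parse_cursor_report(const char *body, int *rows, int *cols)
{
  char final = 0;

  if (sscanf(body, "%d;%d%c", rows, cols, &final) != 3) return 0;

  return final == 'R';
}
```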

> In dumbsh: the tty reset code is conceptually similar to the
> sane_tty() stuff in init, I'd like to factor it out into a generic
> terminal reset thing that reset() could also use, and maybe stty. But
> there are a gazillion fiddly flags and I don't know enough about this
> area yet to make it simple. I suspect I need to write an stty
> implementation first, then retrofit calls to that code into init and
> such.

Yep, real tty reset code that is generic would be good.  As mentioned
in a comment there "real code might not want this".  dumbsh isn't a real
shell, it's an example toy showing how handlekeys could be used for a
real shell, with the actual shellness and other fancy stuff left as an
exercise for the reader.  Bite sized bit.  B-)

Yep, a gazillion fiddly flags, mostly with really hard to read names.
It was outside the scope of the handlekeys bite sized bit, and dumbsh
just needed something basic for the example to work.

-- 
A big old stinking pile of genius that no one wants
coz there are too many silver coated monkeys in the world.