<div dir="ltr">


<p class="">If this message isn't saving you time researching details, feel free to skip it. It's just references, details, and background that might be useful to someone, or might not.</p><p class="">In the context of terminal control, the term "8-bit encoding" is misleading, especially if you're searching the web for information. It mostly leads to the pre-UCS ISO/IEC 2022 character-set extension architecture, which is related, but most of what you find that way is irrelevant. The more useful search terms seem to be "8-bit terminal control characters" or "8-bit control sequences". The 8-bit control sequences do derive from ISO 2022 (as does the term "C1 codes"), but documents about character-set selection rarely mention terminal control.<br>

</p><p class="">Incidentally, ISO 2022 is also how character-set overlaying was managed in the VT-series terminals. A good example can be found at <a href="https://en.wikipedia.org/wiki/ISO/IEC_2022#ISO.2FIEC_2022_character_sets">https://en.wikipedia.org/wiki/ISO/IEC_2022#ISO.2FIEC_2022_character_sets</a> (the rest of the article is less relevant to terminals). There were escape sequences to designate one of several alternate character sets as G0, and escape sequences to designate one as G1. These could change the byte-to-character interpretation as well as the symbols displayed for a given codepoint. Hardware terminals usually had only a couple of these sets built in, as a build option, and software terminals often don't bother supporting them at all, especially if they're aiming for UTF-8 compatibility.</p>

<p class="">The codes in the C0 range (0x00 to 0x1f) were reserved for control codes (tab, backspace, carriage return, newline, etc.). With only one or two exceptions, the codes in the C1 range (0x80 to 0x9f) were not reserved by standard in the same way. However, several codes in the C1 range were reserved by common usage for terminal-control purposes.</p>

<p class="">The xterm control sequence documentation specifies the 8-bit C1 codes as alternate single-byte codes between 0x84 and 0x9f (within the ISO 2022 C1 control range) that are equivalent to certain two-byte sequences beginning with ESC. Other sources vary slightly, but this one (<a href="http://rtfm.etla.org/xterm/ctlseq.html">http://rtfm.etla.org/xterm/ctlseq.html</a>) seems to be the closest to a common superset definition that I've found: most terminals are more similar to its documented expectations than they are to each other.</p>
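<p class="">As a rough illustration of that equivalence (a minimal sketch, not taken from the xterm documentation): each C1 byte corresponds to ESC followed by the same byte minus 0x40, so the single byte 0x9b (CSI) is equivalent to ESC [. The function name c1_to_7bit is made up for this example.</p>

```python
ESC = 0x1B


def c1_to_7bit(byte):
    """Return the two-byte ESC form of a single C1 control byte.

    Hypothetical helper for illustration; it only handles the
    C1 range 0x80-0x9F and rejects everything else.
    """
    if byte not in range(0x80, 0xA0):
        raise ValueError("not a C1 control byte: %#04x" % byte)
    # The 7-bit form is ESC followed by the byte shifted down by 0x40,
    # which lands in 0x40-0x5F ('@' through '_').
    return bytes([ESC, byte - 0x40])


if __name__ == "__main__":
    for name, code in [("IND", 0x84), ("CSI", 0x9B), ("ST", 0x9C), ("OSC", 0x9D)]:
        print(name, hex(code), c1_to_7bit(code))
```

<p class="">A terminal that accepts both encodings could normalize input this way before parsing, so that the rest of the parser only has to deal with the 7-bit forms.</p>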

<div>Xterm, dtterm, various VT-series terminals, and several less-common terminals can emit these. In the case of xterm, 8-bit control sequences (or at least xterm's emission of them) are controlled by an option that defaults to generating 7-bit control sequences rather than C1 codes. It is (or was?) fairly common for terminal emulators to accept both the 7-bit and 8-bit encodings while emitting only one preferred encoding (usually the 7-bit one).<br>

</div><p class=""><br></p><p class="">Support for 8-bit C1 codes is mostly incompatible with UTF-8, because the bytes 0x80 to 0x9f also occur as continuation bytes inside UTF-8 characters. It is therefore ambiguous in any given environment whether the terminal stream should be interpreted as an 8-bit byte stream of control codes interspersed with UTF-8 text, or as a UTF-8 stream of mixed control codes and text. The first was once considered more logical and was more common, but since it is incompatible with UTF-8-aware stream APIs, it has become rare. The second is merely complex and inefficient, and not interoperable with the first.<br>

</p><p class="">Meta-character user-input handling has a similar but unrelated variation. Assume the user hits Meta-c (or Alt-c). The terminal can encode this as the two bytes ESC, 'c', or as the single byte 0xe3 (ASCII 'c' with the high bit set: 0x63 | 0x80). This detail may follow the terminal settings for 8-bit terminal-control characters, or it may not. My experience is that it usually does, unless the terminal being emulated had only one defined meta-character mode.</p>
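<p class="">The two meta-key encodings can be sketched as follows. This is only an illustration of the byte arithmetic; the function names are invented here and don't correspond to any particular terminal's code.</p>

```python
def meta_as_escape_prefix(ch):
    """Encode Meta+ch as ESC followed by the ASCII character."""
    return b"\x1b" + ch.encode("ascii")


def meta_as_high_bit(ch):
    """Encode Meta+ch as a single byte with the high bit (0x80) set."""
    # encode() raises UnicodeEncodeError for non-ASCII input, so the
    # OR below always operates on a value in 0x00-0x7F.
    (code,) = ch.encode("ascii")
    return bytes([code | 0x80])


if __name__ == "__main__":
    print(meta_as_escape_prefix("c"))  # the two bytes ESC, 'c'
    print(meta_as_high_bit("c"))       # the single byte 0xe3
```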

<p class="">There are significant issues involved in trying to support 8-bit terminal-control sequences, 8-bit meta-character sequences, and UTF-8 together. Apparently, due to the move to support UTF-8 on the console, the Linux console driver no longer supports the 8-bit terminal-control sequences. man 4 console_codes (or <span class=""><a href="http://linux.die.net/man/4/console_codes">http://linux.die.net/man/4/console_codes</a>) documents this in the Bugs section at the end.</span></p>
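<p class="">To make the conflict between C1 bytes and UTF-8 concrete, here is a small sketch using Python's built-in UTF-8 codec. The helper name is_valid_utf8 and the sample strings are my own choices for illustration; the point is only that the byte 0x9b means different things depending on which layer decodes first.</p>

```python
CSI_8BIT = b"\x9b"  # single-byte C1 form of CSI


def is_valid_utf8(data):
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# A bare C1 byte is a UTF-8 continuation byte, so a strict UTF-8
# decoder rejects an 8-bit control sequence such as CSI 2 J.
bare_csi_ok = is_valid_utf8(CSI_8BIT + b"2J")

# If the stream is decoded first, the same control is the code point
# U+009B, which UTF-8 encodes as the two bytes 0xC2 0x9B.
csi_code_point_bytes = "\u009b".encode("utf-8")

# A byte-oriented scanner looking for 0x9B also trips over ordinary
# text: U+011B (e with caron) encodes as 0xC4 0x9B.
text_bytes = "\u011b".encode("utf-8")
false_hit = CSI_8BIT in text_bytes

if __name__ == "__main__":
    print(bare_csi_ok, csi_code_point_bytes, text_bytes, false_hit)
```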

<p class="">The modern version of vttest serves as both a good validation tool and good documentation-via-code of both the "expected" behavior and the most common variants. It's widely packaged, but the main homepage has additional useful information (and further references): <a href="http://invisible-island.net/vttest/vttest.html">http://invisible-island.net/vttest/vttest.html</a>. It includes keyboard/input tests as well as terminal-control tests, so there should be bits relevant to an editor.</p>

<p class=""><br></p><p class="">Hope this is useful, or at least not annoyingly redundant.<br></p></div>