[Toybox] [PATCH] hexedit: various improvements.

Rob Landley rob at landley.net
Sat Apr 24 02:07:10 PDT 2021


Sigh, my laptop decided to power off instead of suspending again, and I lost my
open windows. Second attempt at replying to this, and I may have missed some
other stuff too...

On 4/22/21 8:17 PM, enh wrote:
> On Tue, Apr 20, 2021 at 2:11 AM Rob Landley <rob at landley.net> wrote:
>     > The main "TODO" in this is that I never got round to implementing
>     > searching for an arbitrary byte sequence. It seems like we ought to have
>     > that feature, but personally I'm far more likely to jump to an offset or
>     > to search for some ASCII. I haven't needed to search for arbitrary byte
>     > sequences in all this time, so I'll fix this if/when I actually need
>     > it...
> 
>     I might take a stab at it, but don't let that "maybe" stop you if you feel
>     inspired.
> 
> as so often, i think the hard part is agreeing what the interface should be :-(

Code is usually easily replaced. The design is the part that's work.

>     > * The ASCII pane is made more readable by (hopefully) reasonable use of
>     >   color.
> 
>     When editing a primarily text file (which comes up a lot when I'm trying to
>     figure out if something is a space or a tab, for example), the red J for
>     newlines are REALLY ugly to me.
> 
>     The commodore 64 (yes I am that old) used to naturally display nonprintable
>     characters in reverse video when they were quoted, which is what I implemented.
>     It's quite likely I'm biased by that, but it makes each displayed character
>     unique.
> 
>     (The red you've chosen is also _dark_, which makes it harder for me to see...)
> 
> that's unfortunate --- it sounds like we have diametrically opposed aims with
> the ASCII panel.

I spent rather a lot of my teenage years reverse engineering various game save
file formats. (You can call it cheating, I say I was using a higher form of
magic than the game makes readily available to most players.)

Thus I have an unusual amount of familiarity with certain display formats, and
am (as I said) biased. :)

> i specifically wanted non-ASCII to disappear into the
> background. i think i use the ASCII pane either for reading string tables and
> the like (where a really clear "these are the strings"/"these are the bytes
> between" delineation is helpful), or for navigating, where it's kind of a table
> of contents (and having the actual text [typically a filename in some kind of
> archive file] stand out is again helpful).
> 
> (and for the special case of text files, i also thought the red Js clearly
> demarcating lines was a nice feature :-( )

I think we need a modal display. It can start with your version and I hit a key
and toggle to my preferred version.

Aesthetic issues do not HAVE a "correct" solution. I have a standard rant on the
subject:

  https://landley.net/notes-2010.html#13-08-2010

There's existing _expectations_, which you can get by looking at adjacent tools
such as "hd" in this case:

$ hd hexedit | head -n 4
00000000  7f 45 4c 46 02 01 01 00  00 00 00 00 00 00 00 00  |.ELF............|
00000010  03 00 3e 00 01 00 00 00  90 1f 00 00 00 00 00 00  |..>.............|
00000020  40 00 00 00 00 00 00 00  98 63 00 00 00 00 00 00  |@........c......|
00000030  00 00 00 00 40 00 38 00  09 00 40 00 1e 00 1d 00  |....@.8...@.....|

That's the classic "lossy" display, but it's a bit _too_ lossy and both of us
are trying to go beyond it. (Plus the gap between the 8 and 8 is... something I
could have implemented, and may have at one point, but took out again? I forget,
it's been a while...)
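
For reference, a minimal sketch of that classic layout, assuming a 16-byte row
and the column spacing shown in the hd output above. hd_line() is a
hypothetical helper written for illustration, not code from toybox or hd:

```c
#include <stdio.h>
#include <string.h>

// Format up to 16 bytes as one hd-style line into out (needs >= 80 bytes).
// Hypothetical helper for illustration only.
static void hd_line(char *out, unsigned long offset, const unsigned char *buf,
  int len)
{
  int i;

  out += sprintf(out, "%08lx  ", offset);
  for (i = 0; i < 16; i++) {
    // Pad a short final row so the ASCII column stays aligned.
    if (i < len) out += sprintf(out, "%02x ", buf[i]);
    else out += sprintf(out, "   ");
    // The extra gap between the two groups of 8 bytes.
    if (i == 7) *out++ = ' ';
  }
  out += sprintf(out, " |");
  // The lossy part: every byte outside printable ASCII collapses to '.'.
  for (i = 0; i < len; i++)
    *out++ = (buf[i] >= ' ' && buf[i] < 0x7f) ? buf[i] : '.';
  strcpy(out, "|");
}
```

Because all nonprintable bytes map to the same '.', the ASCII column alone
can't tell 0x00 from 0x7f, which is the information both display styles
discussed here try to recover in different ways.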

>     > Regular control characters are shown in red using the
>     >   appropriate letter (so a red A is 0x01, etc),
> 
>     Sigh. Not an aesthetic decision that appeals to me, but bikeshedding over color
>     is not a clear win for anybody.
> 
> an alternative i had was to use the Unicode control pictures (U+2400 and up) to
> show something like "␡ELF␂␁␁␀␀␀␀␀␀␀␀␀␃␀>␀␁␀␀" (that's the start of a random ELF
> file) but unless you have really large fonts it's unreadable, and it only covers
> the control characters, not the top bit set bytes.

Indeed. I thought about having a utf-8 display mode, but 2 utf-8 bytes store 11
bits, which is 3 hex digits, and there's no sane encoding that's going to fit a
U+12F in two digits.
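
The arithmetic can be checked with a small sketch: a two-byte UTF-8 sequence
carries 5 + 6 = 11 payload bits, so it covers U+0080 through U+07FF, and
anything above U+00FF needs a third hex digit. utf8_encode2() is a hypothetical
name used only for this illustration:

```c
// Encode a code point in U+0080..U+07FF as two UTF-8 bytes.
// Returns 0 on success, -1 if the code point is outside the two-byte range.
// Hypothetical helper for illustration only.
static int utf8_encode2(unsigned cp, unsigned char out[2])
{
  if (cp < 0x80 || cp > 0x7ff) return -1;
  out[0] = 0xc0 | (cp >> 6);    // 110xxxxx: top 5 of the 11 payload bits
  out[1] = 0x80 | (cp & 0x3f);  // 10xxxxxx: bottom 6 payload bits
  return 0;
}
```

With this, U+12F encodes as the bytes c4 af: two bytes in the hex pane, but
three hex digits once decoded.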

I also thought of displaying unicode but combining characters are terrible. (WHY
aren't they prefixes? Suffix means you redraw the same character multiple times
and never know when you're done. UTF-8 is lovely. Unicode very much is not.)

>     > printable characters are
>     >   shown normally, and top-bit set characters are just shown as a purple
>     >   question mark (since I couldn't come up with a better representation
>     >   that had any obvious value
> 
>     The previous representation had a unique display for each character.
> 
> i know, but like i said, to me that's negative value --- it's very visually
> distracting, and i don't know what i can _do_ with the extra information. if i
> want to see non-text bytes, there's the hex view for that.
> 
> we could always have a --color, or use a key to toggle while it's running? (if
> there are people who want _both_ styles at different times.)

Something like that seems our best option, yes.

>     > --- in my experience top-bit set characters
>     >   are either meaningless in ASCII, part of a UTF-8 sequence in modern
>     >   files, or in some random code page in ancient files).
> 
>     It wasn't "correct", it was "unique".
...
>     I understand you want to use foreground color instead and reserve "reversed" for
>     the cursor. I was reluctant to _require_ color because I dunno what variants of
>     "colorblind" actually come up these days. (Printing things out on paper is still
>     usually black and white but I dunno how relevant that is. Actual non-color
>     displays aren't as big a deal as they used to be...)
> 
> yeah, it's also annoying that no terminals support the old sequences for
> querying the foreground and background colors any more, so you can't even
> auto-detect dark-on-light vs light-on-dark, let alone know what the actual
> colors are.

ANSI escape codes are ubiquitously supported because there was a DOS driver for
them as part of the base OS. Anything that wasn't in that DOS driver, I look
askance at.

>     Still not entirely happy with it though. :(
> 
>     > The choice of
>     >   red and purple was to deliberately make these not-actually-ASCII
>     >   characters slide into the background; before this patch they have so
>     >   many bright pixels (especially with the use of reverse video) that I
>     >   couldn't clearly see the *actual* ASCII content in the ASCII pane.
> 
>     I wanted them to stand out when looking at a mostly ascii file. 
> 
> ah, interesting. how about
> 
> diff --git a/toys/other/hexedit.c b/toys/other/hexedit.c
> index 398ec15d..e6f94bc1 100644
> --- a/toys/other/hexedit.c
> +++ b/toys/other/hexedit.c
> @@ -89,7 +89,7 @@ static void draw_char(int ch)
>  {
>    if (ch >= ' ' && ch < 0x7f) putchar(ch);
>    else {
> -    if (ch < ' ') printf("\e[31m%c", ch + '@');
> +    if (ch < ' ') printf("\e[1;31m%c", ch + '@');
>      else printf("\e[35m?");
>    }
>    printf("\e[0m");
> 
> is bold red bright enough for you? (it's not too bright to be distracting for
> me.) what terminal and color scheme are you using? (though, like you say, any
> time you have colors in a program you probably need to assume that sooner or
> later you end up needing to implement the equivalent of $LS_COLORS :-( )

No, I don't want to wind up with "beige". This is ANOTHER thing I've ranted
about in various talks, but here's a good thread about "emergent properties in
decision making":

  https://fadeverb.tumblr.com/post/647824550636240896

and this thread calls it the "ice cream problem":

  https://twitter.com/Coelasquid/status/1385432426889900038

"Good" != "unobjectionable". Compromise on aesthetic issues can easily remove
_objections_ but that's not necessarily an improvement. "The nail that sticks up
gets hammered down" guarantees nothing interesting will be accomplished. It's
far too easy to make the "spherical frictionless cow" mistake and I don't want
to even START down that path here.

Your interface is what YOU like. My interface was what I liked. We were pursuing
different goals. Mine was a unique representation that works in black and white
and conforms (roughly) to a specific historical model. That doesn't make it
BETTER because you weren't pursuing those specific goals. You have other goals,
and your display presumably meets those goals.

I agree a modal approach that switches between them is probably the way to go.
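
A minimal sketch of what such a modal draw routine could look like, assuming a
two-way style flag. The color branch copies the pre-patch mapping from the
draw_char() in the quoted diff. The black-and-white branch is only a guess at a
"unique per byte" scheme (reverse video for control codes, underline for the
top bit), not the code that was previously in hexedit. format_char() and the
STYLE_ names are hypothetical, and "\033" is used in place of "\e" for
portability:

```c
#include <stdio.h>
#include <string.h>

enum { STYLE_COLOR, STYLE_REVERSE };  // hypothetical names

// Write the display form of one byte into out (needs >= 16 bytes) and
// return the number of characters written, not counting the NUL.
static int format_char(char *out, int ch, int style)
{
  char *s = out;

  ch &= 0xff;
  // Printable ASCII is shown as-is in both styles.
  if (ch >= ' ' && ch < 0x7f) return sprintf(out, "%c", ch);

  if (style == STYLE_COLOR) {
    // Red letter for control codes, purple '?' for everything else.
    if (ch < ' ') s += sprintf(s, "\033[31m%c", ch + '@');
    else s += sprintf(s, "\033[35m?");
  } else {
    // Assumed scheme: underline marks the top bit, reverse video marks a
    // control code, so all 256 byte values stay distinguishable without color.
    if (ch > 0x7f) { s += sprintf(s, "\033[4m"); ch &= 0x7f; }
    if (ch < ' ' || ch == 0x7f) { s += sprintf(s, "\033[7m"); ch ^= 0x40; }
    *s++ = ch;
  }
  strcpy(s, "\033[0m");

  return s + 4 - out;
}
```

A key handler would then only need to flip the style flag and redraw, so
neither representation has to change to accommodate the other.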

>     When dealing
>     with a significantly binary file I was mostly off looking at the hex side of the
>     force.
> 
>     (I grew up with petascii representation so it's a native tongue to me. I admit
>     to being biased here. But your chosen representation... has its own issues.)
> 
>     > * Addresses are now shown in yellow. No real justification other than "it
>     >   looks nice".
> 
>     Define "yellow". (It's brown here. Missing a "bright" escape maybe?)
> 
> sadly "yellow" is one of the most variable colors amongst terminals:
> https://en.wikipedia.org/wiki/ANSI_escape_code#Colors
>
> here's the whole thing emboldened, which is a bit "in your face" for me, but
> fine if it's better for you (this is instead of, rather than on top of, the
> previous patch):

Nope, aesthetic issues need an author. You are the author of your interface,
it's your vision of what looks good to YOU.

Open source distributed development only ever bypassed the "too many cooks spoil
the soup" problem for issues where empirical tests let you pursue a measurable
local peak. It has NEVER addressed aesthetic issues, for the same reason
wikipedia can't write a novel. Hollywood keeps trying to focus group the perfect
script and they only ever come up with "unobjectionable", not "good".

Too many cooks spoil the TASTE of the soup. Always have, still do. (Open source
focused on nutrition and churned out kibble which tasted as little as possible.)

> but I can understand if this one's controversial.
> 
>     I don't think we're going to get a representation that satisfies everybody.
>     Possibly there should be a command line flag or something?
>  
> or the ls-style environment variable?

How the default is selected is less relevant when there's a key that toggles
easily between them.

> (or, even more generally, i've been wondering whether toybox should have some
> generic $TOYBOX_<toy>_FLAGS type of thing, rather than the ad hoc set that we
> see with GNU grep and ls [but not most other things]. of course, with my
> "hermetic build" hat on, i shiver at that thought.)

I've thought about it too, but wouldn't want to depend on it? (And sadly, an
ad-hoc environment variable set is probably as much of an established user
interface as the ad-hoc command line argument selection...)

Toybox sort of has policy here in https://landley.net/toybox/design.html
although it's scattered: "Features" talks about circular dependencies in system
building, "Simplicity" talks about environment dependencies as a form of
complexity to be avoided, and the "Shared Libraries" section talks about
external dependencies in general. As with vogon grandmothers, "in brief: avoid".

Also, I point out devuan's bash start files have:

  $ alias
  alias ls='ls --color=auto'

set by default, and that seems like a reasonably clean solution to this class of
problem? (I'm most of the way through adding function support to toysh, and
alias support isn't far behind.)

Toybox currently checks the following environment variables:

$ grep -ho 'getenv("[^"]*")' toys/*/*.c | sort -u | sed 's/.*"\(.*\)")/\1/' | xargs
_ ACTION BC_LINE_LENGTH bootchart_init console CONSOLE DEVNAME DEVPATH EDITOR
FSCK_MAX_INST FSTAB_FILE HOME MAJOR MANPATH MINOR MODALIAS PATH POSIXLY_CORRECT
PWD SHELL SUBSYSTEM sushell SUSHELL TERM TMPDIR TZ VISUAL WRAPLOG

And in fact if you chop out "pending" and "example", it's just...

$ grep -ho 'getenv("[^"]*")' toys/{android,lsb,net,other,posix}/*.c | sort -u |
sed 's/.*"\(.*\)")/\1/' | xargs
HOME PATH PWD TMPDIR TZ

Doesn't seem to justify a new mechanism yet?

In _this_ case, hitting a key to toggle character representation isn't a high
bar for me. Android is the larger userbase, so even "defaults to yours on
android and mine on not android, then hit the key to toggle" doesn't make sense:
behave consistently. As long as my view is _available_ I'm happy.

>     > * Errors are shown "vim style" in bold white text on a red background,
>     >   waiting briefly to ensure they're seen.
> 
>     A bit _too_ briefly for my tastes. Can we wait until they hit a key, maybe?
> 
> makes sense. will do. (i'm assuming we'll want something like this to share with
> vi eventually.)

Another reason I haven't poked much back at this. I need to fix up the tty raw
mode code (which as I think I've mentioned here is wrong and causing microcom
bugs on an actual serial port), and then wrapping my head around the vi code
that's there is likely to be a whole learning experience for me...

>     > * The status bar shows the filename, whether the file is opened
>     >   read-only, the current offset into the file, and the total
>     >   length of the file.
> 
>     Way back when, I used the hexedit filename display line to test the utf8 string
>     measuring plumbing I was creating in lib/tty.c (before doing tar), I wonder if
>     it's still getting that right...
> 
>       $ mkdir sub5
>       $ echo hello world > sub5/"$(cat tests/files/utf8/arabic.txt)"
>       $ ./hexedit sub5/*
> 
>     Not... exactly. Could be worse, though.
> 
> 
> yeah, another thing we'll want to share with vi.

I need to figure out how to test this in an automated fashion. This is actually
one of the reasons I wanted to write a "screen" implementation, I need most of
that plumbing to implement a tty-aware expect.

(My todo list isn't fractal but it DOES have geological strata.)

Rob

P.S. Checking my todo list for this one, "undo support" is probably the most
useful thing other than being able to type into the ascii side, but undo
requires "redo" in order for undo itself to not be the most dangerous operation
you can perform and that was enough of a design rough edge needing pondering to
keep me from just dashing it off...
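
One way to picture the undo/redo pairing is a linear edit log with a cursor:
undo moves the cursor back, redo moves it forward, and only a fresh edit
discards what was undone. This is a sketch under assumed constraints
(single-byte edits, a fixed-size log), not a design from hexedit, and every
name in it is hypothetical:

```c
#define MAX_EDITS 64  // arbitrary limit for the sketch

struct edit { long pos; unsigned char before, after; };

struct history {
  struct edit log[MAX_EDITS];
  int len;   // number of recorded edits, including undone ones
  int cur;   // edits [0, cur) are currently applied to the data
};

// Apply a new single-byte edit. Anything previously undone is dropped here.
static int do_edit(struct history *h, unsigned char *data, long pos,
  unsigned char val)
{
  if (h->cur == MAX_EDITS) return -1;
  h->log[h->cur] = (struct edit){pos, data[pos], val};
  data[pos] = val;
  h->len = ++h->cur;
  return 0;
}

// Step back one edit. The log entry is kept so redo can restore it.
static int undo(struct history *h, unsigned char *data)
{
  if (!h->cur) return -1;
  h->cur--;
  data[h->log[h->cur].pos] = h->log[h->cur].before;
  return 0;
}

// Reapply the most recently undone edit, if there is one.
static int redo(struct history *h, unsigned char *data)
{
  if (h->cur == h->len) return -1;
  data[h->log[h->cur].pos] = h->log[h->cur].after;
  h->cur++;
  return 0;
}
```

The rough edge is still visible in do_edit(): typing after an undo silently
throws away the redo tail, which is exactly the sort of decision that needs
pondering rather than dashing off.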

P.P.S. Ctrl-Alt toggling between ascii edit and hex edit modes doesn't go
through ssh or serial terminals. I was going "this is why function keys were
invented" but the various terminal programs INTERCEPT the function keys and make
them do INCREDIBLY STUPID things (like F1 popping up a "do you want to read the
xfce terminal manual offline" window which I NEVER HAVE AND NEVER WILL), so
programs can't reliably use them which means nothing on linux ever uses function
keys because terminal emulators render them useless, a fact about which I am sad.

P.P.P.S. I remind you of the "demo_scankey" command for working out what does
and doesn't go through various terminal thingies. I haven't played with adb much
at all. It exits for any nonprintable key, but shows it to you on the way out...


