[Toybox] [PATCH] hexedit: various improvements.

enh enh at google.com
Thu Apr 22 18:17:56 PDT 2021


On Tue, Apr 20, 2021 at 2:11 AM Rob Landley <rob at landley.net> wrote:

> On 4/19/21 4:22 PM, enh via Toybox wrote:
> > I've been using hexedit quite a lot, mainly for _corrupting_ files, and
> > have been meaning to send this collection of changes for far too long
> > now. I saw a bug requesting editing in the ASCII pane (which this patch
> > _doesn't_ add), and wanted to get this sent in before it has to undergo
> > the third massive merge conflict of its existence...
>
> You can tell I never did the "remove sleep deprivation artifacts"
> polishing pass on this command, it still had "char broiled;" in it...
>
> This is more changes than I have time to go over with a fine tooth comb
> right now, but A) I trust you, B) you're the one driving new feature
> additions. (I was happy with the old one, you're not, so I should get
> out of the way.)
>
> Applied.
>
> > The main "TODO" in this is that I never got round to implementing
> > searching for an arbitrary byte sequence. It seems like we ought to have
> > that feature, but personally I'm far more likely to jump to an offset or
> > to search for some ASCII. I haven't needed to search for arbitrary byte
> > sequences in all this time, so I'll fix this if/when I actually need
> > it...
>
> I might take a stab at it, but don't let that "maybe" stop you if you feel
> inspired.
>

as so often, i think the hard part is agreeing what the interface should
be :-(
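
to make the interface question concrete, here's a minimal sketch of one option: the find prompt accepts hex pairs ("7f 45 4c 46" or "7f454c46") and searches for those bytes. both helper names and the accepted syntax are assumptions for illustration; none of this is in the patch.

```c
#include <ctype.h>
#include <string.h>

// Hypothetical helper: parse "de ad be ef" or "deadbeef" into bytes.
// Returns the number of bytes stored, or -1 for a non-hex character,
// a byte split by whitespace, an odd digit count, or too many bytes.
static int parse_hex_pattern(const char *s, unsigned char *out, int max)
{
  int len = 0, nibbles = 0, value = 0;

  for (; *s; s++) {
    int c = (unsigned char)*s;

    if (isspace(c)) {
      if (nibbles) return -1;
      continue;
    }
    if (!isxdigit(c)) return -1;
    value = value*16 + (isdigit(c) ? c-'0' : tolower(c)-'a'+10);
    if (++nibbles == 2) {
      if (len == max) return -1;
      out[len++] = (unsigned char)value;
      nibbles = value = 0;
    }
  }

  return nibbles ? -1 : len;
}

// Hypothetical helper: first offset >= start (start must be >= 0) at which
// pat occurs in data, or -1 if there is no match.
static long long find_bytes(const unsigned char *data, long long len,
  long long start, const unsigned char *pat, int patlen)
{
  long long i;

  if (patlen <= 0) return -1;
  for (i = start; i+patlen <= len; i++)
    if (!memcmp(data+i, pat, patlen)) return i;

  return -1;
}
```

"next match" would then just be the same call with start set to one past the previous hit.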


> > * Enter (new) read-only mode rather than refusing to open read-only
> >   files.
> >
> > * More keys: page up/page down, home/end, and ctrl-home/ctrl-end for
> >   beginning/end of file.
> >
> > * Jump with ^J (or vi-like :). Enter absolute address or +12 or -40 for
> >   relative jumps.
> >
> > * Find with ^F (or vi-like /). No support for bytes, but useful for
> >   finding text. (^G or n for next match, ^D or p for previous match.)
> >
> > * Support all the usual suspects for "quit": vi-like q, desktop-like ^Q,
> >   panic ^C, or even plain old Esc.
>
> Sure.
>
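
for reference, the jump rule in the list above (absolute address, or +12/-40 relative to the cursor) can be sketched like this. parse_jump is a hypothetical name, and both base 0 for strtoll (so 0x20 is read as hex) and clamping to the file bounds are assumptions rather than what the patch does.

```c
#include <errno.h>
#include <stdlib.h>

// Hypothetical helper: turn jump-prompt input into a new file offset.
// A leading '+' or '-' means "relative to pos"; anything else is absolute.
// The result is clamped to [0, len-1] (or 0 for an empty file).
// Returns 0 on success, -1 if the input isn't a complete number.
static int parse_jump(const char *s, long long pos, long long len,
  long long *out)
{
  int relative = (*s == '+' || *s == '-');
  char *end;
  long long val;

  errno = 0;
  val = strtoll(s, &end, 0);
  if (end == s || *end || errno) return -1;

  // Note: no overflow check on the addition; fine for a sketch.
  if (relative) val += pos;
  if (val >= len) val = len-1;
  if (val < 0) val = 0;
  *out = val;

  return 0;
}
```

so "+12" from offset 100 lands on 112, and "-40" from offset 10 stops at 0 rather than failing.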
> > * The ASCII pane is made more readable by (hopefully) reasonable use of
> >   color.
>
> When editing a primarily text file (which comes up a lot when I'm trying to
> figure out if something is a space or a tab, for example), the red Js for
> newlines are REALLY ugly to me.
>
> The Commodore 64 (yes I am that old) used to naturally display nonprintable
> characters in reverse video when they were quoted, which is what I
> implemented. It's quite likely I'm biased by that, but it makes each
> displayed character unique.
>
> (The red you've chosen is also _dark_, which makes it harder for me to
> see...)
>

that's unfortunate --- it sounds like we have diametrically opposed aims
with the ASCII panel. i specifically wanted non-ASCII to disappear into the
background. i think i use the ASCII pane either for reading string tables
and the like (where a really clear "these are the strings"/"these are the
bytes between" delineation is helpful), or for navigating, where it's kind
of a table of contents (and having the actual text [typically a filename in
some kind of archive file] stand out is again helpful).

(and for the special case of text files, i also thought the red Js clearly
demarcating lines was a nice feature :-( )


> > Regular control characters are shown in red using the
> >   appropriate letter (so a red A is 0x01, etc),
>
> Sigh. Not an aesthetic decision that appeals to me, but bikeshedding over
> color is not a clear win for anybody.
>

an alternative i had was to use the Unicode control pictures (U+2400 and
up) to show something like "␡ELF␂␁␁␀␀␀␀␀␀␀␀␀␃␀>␀␁␀␀" (that's the start of a
random ELF file) but unless you have really large fonts it's unreadable,
and it only covers the control characters, not the top bit set bytes.
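
for anyone who wants to try that alternative, a minimal sketch of the mapping: the control pictures for 0x00-0x1f sit at U+2400 plus the byte value, and DEL (0x7f) has its own picture at U+2421. control_picture is a hypothetical helper, not something in the patch.

```c
// Hypothetical helper: write the UTF-8 encoding of the Unicode control
// picture for ch into buf (at least 4 bytes, NUL terminated).
// Returns the encoded length (always 3), or 0 if ch has no control
// picture here, which includes every top-bit-set byte.
static int control_picture(int ch, char *buf)
{
  if (ch == 0x7f) ch = 0x21;               // U+2421 SYMBOL FOR DELETE
  else if (ch < 0 || ch >= 0x20) return 0; // not a C0 control character

  // U+2400..U+243F all encode as E2 90 (80 + low six bits).
  buf[0] = (char)0xe2;
  buf[1] = (char)0x90;
  buf[2] = (char)(0x80 + ch);
  buf[3] = 0;

  return 3;
}
```

the return value of 0 for anything at or above 0x80 is the gap mentioned above: those bytes would still need some other representation.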


> > printable characters are
> >   shown normally, and top-bit set characters are just shown as a purple
> >   question mark (since I couldn't come up with a better representation
> >   that had any obvious value
>
> The previous representation had a unique display for each character.
>

i know, but like i said, to me that's negative value --- it's very visually
distracting, and i don't know what i can _do_ with the extra information.
if i want to see non-text bytes, there's the hex view for that.

we could always have a --color, or use a key to toggle while it's running?
(if there are people who want _both_ styles at different times.)


> > --- in my experience top-bit set characters
> >   are either meaningless in ASCII, part of a UTF-8 sequence in modern
> >   files, or in some random code page in ancient files).
>
> It wasn't "correct", it was "unique".
>
> Those of us who've memorized the ascii table (toybox has an "ascii"
> command for a reason) could actually work out at a glance what each value
> is. (Well, once we've been staring at hex dumps for a bit, anyway. You can
> recognize the same 3-character sequence easily enough when finding
> multiple instances of it...)
>
> Zero through 31 was 64 through 95 shifted down and recolored, and 128
> through 255 were the bottom 128 entries shifted _up_ and recolored. The
> awkward ones were 127 and 255 (went with reversed space), and the low
> ascii part of the high address space (128-159) had been remapped _twice_
> and thus had a collision needing a third color. I was never happy with
> those _not_ being reversed, because reversed meant "nonprintable", but I
> hadn't introduced colors yet and was out of shades of grey. :P
>
> I understand you want to use foreground color instead and reserve
> "reversed" for the cursor. I was reluctant to _require_ color because I
> dunno what variants of "colorblind" actually come up these days. (Printing
> things out on paper is still usually black and white but I dunno how
> relevant that is. Actual non-color displays aren't as big a deal as they
> used to be...)
>

yeah, it's also annoying that no terminals support the old sequences for
querying the foreground and background colors any more, so you can't even
auto-detect dark-on-light vs light-on-dark, let alone know what the actual
colors are.


> Still not entirely happy with it though. :(
>
> > The choice of
> >   red and purple was to deliberately make these not-actually-ASCII
> >   characters slide into the background; before this patch they have so
> >   many bright pixels (especially with the use of reverse video) that I
> >   couldn't clearly see the *actual* ASCII content in the ASCII pane.
>
> I wanted them to stand out when looking at a mostly ascii file.


ah, interesting. how about

diff --git a/toys/other/hexedit.c b/toys/other/hexedit.c
index 398ec15d..e6f94bc1 100644
--- a/toys/other/hexedit.c
+++ b/toys/other/hexedit.c
@@ -89,7 +89,7 @@ static void draw_char(int ch)
 {
   if (ch >= ' ' && ch < 0x7f) putchar(ch);
   else {
-    if (ch < ' ') printf("\e[31m%c", ch + '@');
+    if (ch < ' ') printf("\e[1;31m%c", ch + '@');
     else printf("\e[35m?");
   }
   printf("\e[0m");

is bold red bright enough for you? (it's not too bright to be distracting
for me.) what terminal and color scheme are you using? (though, like you
say, any time you have colors in a program you probably need to assume that
sooner or later you end up needing to implement the equivalent of
$LS_COLORS :-( )
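
if it does come to that, here's a minimal sketch of what an $LS_COLORS-style override could look like. everything here is hypothetical: the variable name (say HEXEDIT_COLORS), the keys ctrl/high/addr, and the "key=SGR:key=SGR" syntax. the defaults are the 31/35/33 the current code uses.

```c
#include <string.h>

// Hypothetical per-element SGR parameters, e.g. "1;31" for bold red.
struct hex_colors {
  char ctrl[16], high[16], addr[16];
};

// Look up key in a spec like "ctrl=1;31:addr=1;33" and copy its value to
// dst. Values containing anything other than digits and ';' are ignored,
// so the variable can't inject arbitrary escape sequences.
static void color_lookup(const char *spec, const char *key, char *dst,
  size_t dstlen)
{
  size_t klen = strlen(key);

  while (spec && *spec) {
    const char *end = strchr(spec, ':');
    size_t n = end ? (size_t)(end-spec) : strlen(spec);

    if (n > klen && !strncmp(spec, key, klen) && spec[klen] == '=') {
      const char *val = spec+klen+1;
      size_t vlen = n-klen-1;

      if (vlen < dstlen && strspn(val, "0123456789;") >= vlen) {
        memcpy(dst, val, vlen);
        dst[vlen] = 0;
      }
    }
    spec = end ? end+1 : 0;
  }
}

// Start from the built-in defaults, then apply any overrides in spec
// (which may be a null pointer when the variable is unset).
static void load_colors(struct hex_colors *c, const char *spec)
{
  strcpy(c->ctrl, "31");
  strcpy(c->high, "35");
  strcpy(c->addr, "33");
  color_lookup(spec, "ctrl", c->ctrl, sizeof(c->ctrl));
  color_lookup(spec, "high", c->high, sizeof(c->high));
  color_lookup(spec, "addr", c->addr, sizeof(c->addr));
}
```

draw_char would then print the escape with c.ctrl in place of the hard-coded 31, and the caller would pass in the result of getenv() on whatever name we picked.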


> When dealing with a significantly binary file I was mostly off looking at
> the hex side of the force.
>
> (I grew up with PETSCII representation so it's a native tongue to me. I
> admit to being biased here. But your chosen representation... has its own
> issues.)
>
> > * Addresses are now shown in yellow. No real justification other than "it
> >   looks nice".
>
> Define "yellow". (It's brown here. Missing a "bright" escape maybe?)
>

sadly "yellow" is one of the most variable colors amongst terminals:
https://en.wikipedia.org/wiki/ANSI_escape_code#Colors

here's the whole thing emboldened, which is a bit "in your face" for me,
but fine if it's better for you (this is instead of, rather than on top of,
the previous patch):

diff --git a/toys/other/hexedit.c b/toys/other/hexedit.c
index 398ec15d..3ce6f824 100644
--- a/toys/other/hexedit.c
+++ b/toys/other/hexedit.c
@@ -87,6 +87,7 @@ static int prompt(char *prompt, char *initial_value)
 // Render all characters printable, using color to distinguish.
 static void draw_char(int ch)
 {
+  printf("\e[1m");
   if (ch >= ' ' && ch < 0x7f) putchar(ch);
   else {
     if (ch < ' ') printf("\e[31m%c", ch + '@');
@@ -109,8 +110,8 @@ static void draw_status(void)

 static void draw_byte(int byte)
 {
-  if (byte) printf("%02x", byte);
-  else printf("\e[2m00\e[0m");
+  if (byte) printf("\e[1m%02x\e[m", byte);
+  else printf("00");
 }

 static void draw_line(long long yy)
@@ -121,7 +122,7 @@ static void draw_line(long long yy)
   if (yy+xx>=TT.len) xx = TT.len-yy;

   if (yy<TT.len) {
-    printf("\r\e[33m%0*llx\e[0m ", TT.numlen, yy);
+    printf("\r\e[1;33m%0*llx\e[0m ", TT.numlen, yy);
     for (x=0; x<xx; x++) {
       putchar(' ');
       draw_byte(TT.data[yy+x]);


> > * NUL bytes in the hex pane are shown dimmed. I find this helpful
> >   especially when there's a lot of padding, and it can actually be a
> >   useful clue when reverse engineering (you can "see" repeated patterns
> >   more easily), but I can understand if this one's controversial.
>
> I don't think we're going to get a representation that satisfies everybody.
> Possibly there should be a command line flag or something?
>

or the ls-style environment variable?

(or, even more generally, i've been wondering whether toybox should have
some generic $TOYBOX_<toy>_FLAGS type of thing, rather than the ad hoc set
that we see with GNU grep and ls [but not most other things]. of course,
with my "hermetic build" hat on, i shiver at that thought.)


> > * Errors are shown "vim style" in bold white text on a red background,
> >   waiting briefly to ensure they're seen.
>
> A bit _too_ briefly for my tastes. Can we wait until they hit a key, maybe?
>

makes sense. will do. (i'm assuming we'll want something like this to share
with vi eventually.)


> > * The status bar shows the filename, whether the file is opened
> >   read-only, the current offset into the file, and the total
> >   length of the file.
>
> Way back when, I used the hexedit filename display line to test the utf8
> string measuring plumbing I was creating in lib/tty.c (before doing tar),
> I wonder if it's still getting that right...
>
>   $ mkdir sub5
>   $ echo hello world > sub5/"$(cat tests/files/utf8/arabic.txt)"
>   $ ./hexedit sub5/*
>
> Not... exactly. Could be worse, though.
>

yeah, another thing we'll want to share with vi.


> > * SIGWINCH handling has been added.
>
> A distinct improvement, yes. Thanks.
>
> Rob
>