[Toybox] [PATCH] vi: fix segfault

Rob Landley rob at landley.net
Fri Mar 15 10:10:19 PDT 2019


On 3/15/19 7:06 AM, Jacob Wahlgren wrote:
> Hi, I tried starting up vi to see how far along it was, and got an immediate segfault. Attached a patch (hopefully in the correct format).

Ah, I misunderstood. Not a patch to vi, a patch to lib.

Linestack is also infrastructure that I was working on for vi, and I mentioned
recently I'm thinking of removing it and starting over because nothing's using
it and infrastructure in search of a user is bad.

There's a design tension between "I have consecutive block of ram I'd like to
treat as discrete items" and "I want to insert/remove stuff". This not only
shows up with vi (mmap a file but treat it as lines) but also with command
shells (mmap a script). Having to make individual line copies is wasteful (copy
to/from disk cache to process heap, plus memory fragmentation on nommu systems),
but _not_ doing it is a nightmare to track object lifetimes. I keep meaning to
sit down and design a structure that handles both cases cleanly.

Meanwhile, I've been distracted writing an rbtree implementation for lib/, which
came up again with the tar.c clean up (function seen_inode() using a linked list
is just painful, but I also need it for gene2fs and any vfat/squashfs/isofs
equivalents I wind up doing, and probably zip...)

Another problem with vi is I never worked out how much scalability I want. I
have gigabyte-sized mbox files which I occasionally like to open with vi because
I find it easier to navigate than less, and I was worried that the array of
pointers to start of line at 8 bytes per line might scale badly with my standard
"remalloc every 64 entries added" approach, but power of two growth could also
get unfortunate there...

But the bigger issue with that is inserting via memmove() of the pointer array.
My current "wc qemu-devel" is ~15 million lines (so 256 megs of contiguous
memory, that's entirely doable on a modern system with mmmu), but if I insert a
line near the start of the file it has to memmove() 128 megs, so if I hit enter
four times it's gotta churn through copying half a gig of non-cached ram.
(That's _probably_ ok? But those sorts of issues are why I was going slowly.)

And that's if I have a pointer to NULL terminated lines of data, which brings up
the "embedded NUL" issue and to deal with _that_ I'd need line lengths so now
instead of an array of pointers it's an array of structures, and if I'm doing
that anyway I should just mmap and have the pointer be into the mmap and avoid
the copying...)

I could instead do a linked list, which means jumping to line ":1234567" becomes
the expensive part, but 16 million independent linked list allocations adds
about ~8 bytes overhead to each one because the _heap_ is a doubly linked list
and then there's the occasional pathological cache behavior of aligned cache
entries evicting each other that can slow down linked list traversal by a couple
orders of magnitude traversing the list, linux-kernel discussed that a few years
ago and I keep meaning to look up and make sure I'm _not_ doing the stupid thing
whatever that was...

Ahem. Easy for me to go down ratholes in some of these areas.

Rob

P.S. And I need to redo the crunch_string() logic now that I know that Unicode
is stupid. (Combining characters come _after_ the non-combining character they
collapse to, so you never know you're done printing a character until you read
the next character you _can't_ print. Bravo. I let UTF8 being sane lull me into
a false sense of security about what it was used to encode, I should have known
better. They call them "code points" with a straight face, that should have been
a clue I was dealing with insane bureaucrats, not engineers...)

P.P.S There's an escape sequence to tell the screen _not_ to wrap when you hit
the right edge. For top and such I was just very careful about not writing too
much, but if I use that sequence (I modified clear.c to undo it, look at "man 4
console_codes" and search for "autowrap") does it help? I don't _think_ so and
I'd need to make sure that the sigatexit() reset put it back, but I already need
to do that for "hide cursor" and friends, and yes "top | cat" _can't_ get that
right because you can't output ansi escape sequences if ctrl-c already caused
"cat" to exit, unless I want to have the signal handler open "/dev/tty" itself
which is on the todo list to think about...



More information about the Toybox mailing list