[Toybox] [PATCH] vi: Replace linelist with mem_block based design

enh enh at google.com
Thu Jan 16 22:13:23 PST 2020


On Thu, Jan 16, 2020 at 9:59 AM Jarno Mäkipää <jmakip87 at gmail.com> wrote:
>
> hey
>
> On Thu, Jan 16, 2020 at 6:28 PM enh <enh at google.com> wrote:
> >
> > my 2c: i wouldn't do anything this complicated. every editor i've
> > written or worked on in the past used the much simpler scheme of one
> > buffer with a gap at the most recent insertion point
> > [https://en.wikipedia.org/wiki/Gap_buffer]. the most extra complexity
> > i've seen this cause is a separate array of line breaks, but since we
> > have the busybox vi style of not doing line wrapping, that seems
> > unnecessary?
>
> I actually found this approach much simpler to implement than at least
> the dlist based one I previously had.

oh, yeah, i agree that the "list of lines" is the worst option.
especially if you want to scale to big files.

> The difficulties compared to a linked
> list of lines are scrolling down and counting line numbers. But
> inserting and cutting text is actually much easier. I think a gap
> buffer shares the same difficulties without the benefit of loading the
> file fast or handling big files. I can load an 8GB file with this
> patch in the blink of an eye.

that's true for a gap buffer too. traditionally you just put the gap at
the end. (but if there's one thing computers are good at, it's copying
data around. and the gap amortizes that anyway, because you only move
the "after" part every "gap_length" bytes.)

> The reason I started doing this was that I wanted to do undo, with
> infinite history, and could not figure out an easy way with the
> linelist. Now undo history should be really easy to make, by just
> having a separate slice_list of cut/insert sections in order.

this is easy with the gap buffer too. the best trick i learned was
from Rob Pike's various editors (you can read the sam
[http://doc.cat-v.org/plan_9/4th_edition/papers/sam/] and acme
[http://doc.cat-v.org/plan_9/4th_edition/papers/acme/] papers on the
web): there's only one edit, and that's replace-range(start_offset,
end_offset, bytes). insert just has a start == end, delete has no
bytes, and replace is replace. and there's a trivial transformation
from each operation to its "undo". and you can put the undone edits on
the redo stack too if you want.
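
the whole undo/redo machinery then falls out of one record and one
transformation, something like this (a sketch of the idea, not sam's or
acme's actual code):

#include <stddef.h>

// the only edit operation: replace bytes [start, end) of the buffer
// with "text". insert is start == end, delete is len == 0.
struct edit {
  size_t start, end;   // range that was replaced
  char *text;          // bytes written into that range
  size_t len;
  char *old;           // bytes that used to be in that range
  size_t old_len;
};

// the inverse of an edit is just another edit: put the old bytes back
// over the range the new bytes now occupy. push this on the undo stack
// when you apply an edit, and push the re-inverted edit on the redo
// stack when you undo it.
static struct edit edit_invert(const struct edit *e)
{
  return (struct edit){
    .start = e->start, .end = e->start + e->len,
    .text = e->old, .len = e->old_len,
    .old = e->text, .old_len = e->len,
  };
}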

> > i do know that Visual Studio Code started with an array of lines and
> > switched to a piece table (common in word processors)
> > [https://en.wikipedia.org/wiki/Piece_table], but then you end up
> > needing a cache to make that fast, and...
> > https://code.visualstudio.com/blogs/2018/03/23/text-buffer-reimplementation
>
> Actually I think this method I did is a piece table? I didn't know
> what to call it.
>
> >
> > On Thu, Jan 16, 2020 at 8:17 AM Rob Landley <rob at landley.net> wrote:
> > >
> > > On 1/16/20 7:58 AM, Jarno Mäkipää wrote:
> > > > Replaced the dlist linelist with contiguous memory blocks. This will
> > > > allow editing huge files without a billion mallocs.
> > >
> > > More or less what I was trying to implement back when vi was a thing I was going
> > > to work on.
> > >
> > > > The file is first opened with
> > > > mmap() and the mapped region is fully described in block_list as one block.
> > >
> > > When I tried this it turned into a complex mess of different objects with
> > > different allocation/lifetime rules that had to interact. I intended to cap the
> > > block size because splitting becomes painful otherwise (and then wound up having
> > > strings individually allocated with the array of pointers to the start of
> > > strings being a separate allocation), and of course mmap() requiring the offset
> > > to be a multiple of page size combined badly with line starts being all over the
> > > place...
> > >
> > > (I never did figure out what to do about a file that changed out from under you
> > > while you were editing it. The MAP_SHARED version has the changes invalidate
> > > your indexes and segfault, especially if it's truncated out from under you. The
> > > MAP_PRIVATE version doesn't work on nommu, I believe still has the truncate
> > > problem, and doesn't help you with parts it hasn't faulted in until after the
> > > file changed.)

yeah, as far as i know there's no fix to that. the only 8GiB i can
imagine anyone loading into vi would be a log, and logs get truncated.

is editing 8GiB text files on machines with <8GiB of ram a goal or a
non-goal? (i know hexedit supports this, but that use case seems more
plausible there: editing a disk image, say, because most devices of
all scales have a lot more disk than ram. but a text file...?)

> I had the idea that I could just heap allocate and load the file with
> read() as a fallback, or if the file is small. The implementation does
> not care what type of memory the original block is. Only unloading on
> close needs to call either free() or munmap().
>
> > >
> > > > Currently "valid" data is described as slices, when first loading file
> > > > there is only one slice that points to memory existing in block_list.
> > > > When cutting text, block_list is not freed or modified, but instead
> > > > slice_list is modified to have "hole" between 2 slices. when inserting
> > > > new mem_block is added, previos slices are cut in cursor position and
> > > > new slice is added...
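
(guessing at the shape from this description rather than from the patch
itself, the pieces would look something like the following; the field
names are guesses:)

#include <stddef.h>

// a mem_block is a read-only chunk of backing storage: the original
// file (mmap'd, or read() into the heap) or one later insertion.
// blocks are never edited, only referenced.
struct mem_block {
  struct mem_block *next;
  char *data;
  size_t len;
  int is_mmap;    // unload calls munmap() if set, free() otherwise
};

// a slice is a (block, offset, length) view; the document is the slice
// list read in order. a cut splits the covering slice and drops the
// middle; an insert appends a new block and splices in a slice that
// points at it.
struct slice {
  struct slice *next;
  struct mem_block *block;
  size_t off, len;
};
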
> > >
> > > *shrug* I leave this to you.
> > >
> > > > The implementation is not as finished as the old linelist, but most of
> > > > the heavy lifting is done, so I wanted to share it in this state.
> > > >
> > > > Added functions for handling data inside block_list+slice_list:
> > > >
> > > > insert_str(), cut_str() are used for all delete and add operations
> > > > text_strrchr(), text_strchr() are used for searching line endings
> > > > text_byte(), text_codepoint(), text_getline() are for simple data access
> > > >
> > > > Implemented: yank, delete, insert, push and most of the previous moves
> > > >
> > > > Partly implemented: saving the file; since the current file is still
> > > > mmapped I can't save on top of it,
> > >
> > > You never want to do that anyway because a crash halfway through (battery dies,
> > > etc) leaves you a corrupted file.
> > >
> > > > so for now it saves into "filename".swp; later I need
> > > > to move the .swp onto the original file after munmap.
> > >
> > > rename() is atomic.
>
> Yeah, I was thinking of saving a temp file, then unloading the mmap and
> original file (and the rest of my allocated blocks), then rename(); and
> for extra cheese maybe file permissions and such need to be copied
> from the original file...

i think vim does the opposite --- it renames the existing file, then
writes to the old file name, then unlinks the renamed original if
everything goes okay.
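
either way, the point is to never overwrite the only good copy in
place. the write-a-temp-file-then-rename() variant you're describing is
roughly this (made-up names, most error handling skipped):

#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <unistd.h>

// write everything to "name.swp", then rename() it over the original.
// rename() within one filesystem is atomic, so a crash leaves either
// the complete old file or the complete new one, never half of each.
// (the still-mmap'd original keeps its old inode alive until it's
// unmapped.)
static int save_atomic(const char *name, const char *buf, size_t len)
{
  char tmp[PATH_MAX];
  int fd;

  snprintf(tmp, sizeof(tmp), "%s.swp", name);
  fd = open(tmp, O_WRONLY|O_CREAT|O_TRUNC, 0600);
  if (fd < 0) return -1;
  if (write(fd, buf, len) != (ssize_t)len || fsync(fd)) {
    close(fd);
    unlink(tmp);
    return -1;
  }
  if (close(fd)) {
    unlink(tmp);
    return -1;
  }
  // permissions/ownership of the original still need copying (stat()+chmod())
  return rename(tmp, name);
}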

> > > > Not implemented: search, line number counting, etc.
> > > >
> > > > Removed: Some unused functions
> > > >
> > > > br
> > > >
> > > > -Jarno
> > >
> > > Do I apply this or not?
> > >
> > > Rob
>
> It's your call; this still needs more work, but I think it's a better
> design than the dlist one.
>
> -Jarno
>
> > > _______________________________________________
> > > Toybox mailing list
> > > Toybox at lists.landley.net
> > > http://lists.landley.net/listinfo.cgi/toybox-landley.net


