[Toybox] [PATCH] vi: Replace linelist with mem_block based design

Jarno Mäkipää jmakip87 at gmail.com
Thu Jan 16 09:59:11 PST 2020


hey

On Thu, Jan 16, 2020 at 6:28 PM enh <enh at google.com> wrote:
>
> my 2c: i wouldn't do anything this complicated. every editor i've
> written or worked on in the past used the much simpler scheme of one
> buffer with a gap at the most recent insertion point
> [https://en.wikipedia.org/wiki/Gap_buffer]. the most extra complexity
> i've seen this cause is a separate array of line breaks, but since we
> have the busybox vi style of not doing line wrapping, that seems
> unnecessary?

I actually found this approach much simpler to implement than atleast
the dlist based one I previously had. Difficulties compared to linked
list of lines, is scrolling down or counting line numbers. But
inserting and cutting text is actually much easier. I think Gap_buffer
shares same difficulties without benefit of loading file fast, or
handling big files. I can load 8gb file with this patch in blink of
eye.

Reason why I started doing this was that I wanted to do undo, with
infinite history. And could not figure out easy way with linelist. Now
undo history should be really easy to make, by just having separate
slice_list of cut/insert sections in order.

>
> i do know that Visual Studio Code started with an array of lines and
> switched to a piece table (common in word processors)
> [https://en.wikipedia.org/wiki/Piece_table], but then you end up
> needing a cache to make that fast, and...
> https://code.visualstudio.com/blogs/2018/03/23/text-buffer-reimplementation

Actually I think this method i did is Piece table? I didint know what
to call it.

>
> On Thu, Jan 16, 2020 at 8:17 AM Rob Landley <rob at landley.net> wrote:
> >
> > On 1/16/20 7:58 AM, Jarno Mäkipää wrote:
> > > Replaced dlist linelist with continuous memory blocks. This will allow
> > > editing huge files without billion mallocs.
> >
> > More or less what I was trying to implement back when vi was a thing I was going
> > to work on.
> >
> > > File is first opened with
> > > mmap() and mapped region is fully described in block_list as one block.
> >
> > When I tried this it turned into a complex mess of different objects with
> > different allocation/lifetime rules that had to interact. I intended to cap the
> > block size because splitting becomes painful otherwise (and then wound up having
> > strings individually allocated with the array of pointers to the start of
> > strings being a seperate allocation), and of course mmap() requiring the offset
> > to be a multiple of page size combined badly with line starts being all over the
> > place...
> >
> > (I never did figure out what to do about a file that changed out from under you
> > while you were editing it. The MAP_SHARED version has the changes invalidate
> > your indexes and segfault, especially if it's truncated out from under you. The
> > MAP_PRIVATE version doesn't work on nommu, I believe still has the truncate
> > problem, and doesn't help you with parts it hasn't faulted in until after the
> > file changed.)

I had idea that I could just heap allocate and load file with read()
as fallback or if file is small. Implementation does not care what
type of memory original block is. Only unload on closing needs to call
either free() or munmap()

> >
> > > Currently "valid" data is described as slices, when first loading file
> > > there is only one slice that points to memory existing in block_list.
> > > When cutting text, block_list is not freed or modified, but instead
> > > slice_list is modified to have "hole" between 2 slices. when inserting
> > > new mem_block is added, previos slices are cut in cursor position and
> > > new slice is added...
> >
> > *shrug* I leave this to you.
> >
> > > Implementation is not as finished as old linelist, but most of the heavy
> > > lifting is done so wanted to share this at this state.
> > >
> > > Added functions to handling data inside block_list+slice_list
> > >
> > > insert_str(), cut_str() are used for all delete and add operations
> > > text_strrchr(), text_strchr() are used for searching lineendings
> > > text_byte(), text_codepoint(), text_getline() are for simple data access
> > >
> > > Implemented: yank,delete,insert,push and most of the previos moves
> > >
> > > Implemented partly: saving of file, since current file is still mmaped
> > > cant save on top of it,
> >
> > You never want to do that anyway because a crash halfway through (battery dies,
> > etc) leaves you a corrupted file.
> >
> > > so now saves into "filename".swp, later need
> > > to move .swp into original file after munmap
> >
> > rename() is atomic.

Yeah i was thinking of saving temp file and then unloading mmap and
original file (and rest of my allocated blocks) and then rename() and
for extra cheese maybe file permissions and such need to be copied
from original file...

> >
> > > Not Implemented: search, linenumber counting, etc..
> > >
> > > Removed: Some unused functions
> > >
> > > br
> > >
> > > -Jarno
> >
> > Do I apply this or not?
> >
> > Rob

Its your call, this still needs more work, but I think its better
design than dlist one.

-Jarno

> > _______________________________________________
> > Toybox mailing list
> > Toybox at lists.landley.net
> > http://lists.landley.net/listinfo.cgi/toybox-landley.net



More information about the Toybox mailing list