[Toybox] New toy: grep
Rob Landley
rob at landley.net
Mon Feb 27 20:45:36 PST 2012
On 02/27/2012 02:35 PM, Andre Renaud wrote:
> Hi Rob,
>
> On 28/02/12 07:01, Rob Landley wrote:
>> On 02/26/2012 10:01 PM, Andre Renaud wrote:
>>> Attached is a simplistic starter implementation of the grep command.
> [snip]
>> It might be best if I take over this one since I'm doing sed anyway
>> (which is fiddly and evil, and requires me to do the xregcmp()
>> infrastructure in lib/ and the compile time test for it).
>>
>> (I also note that I wrote my own regex/glob engine back in the 90's
>> under OS/2, so if bionic turns out not to have usable regexes and I have
>> to write my own, I'm prepared to do that...)
>
> That's fair enough - I'm mostly just playing with these in my spare
> time, so I'm not to precious about whether they're accepted or not.
>
> Regarding the greps over binaries, and arbitrary length buffers. I'm
> curious what kind of implementation you'd do there to avoid having
> issues with expressions that sit on block boundaries,
In lib/lib.c get_line() mallocs a buffer to hold the return value, so
there aren't any block bounaries. (It churns the heap a bit, but
"correct" beats "fast" here, and making malloc/free loops mostly reuse
the same chunk of memory (so it stays cache hot) is the simplest
optimization a malloc() implementation can do.)
> or regular
> expressions that have a possible infinite length match, such as 'a.*b'.
The _input_ isn't infinite (it's unbounded, but any real computer will
have finite storage), and grep is line-based so it only deals with one
lien at a time. Sure, that line could theoretically be a gigabyte, but
if you have the memory for it...
> Is it realistic to expect the entire file to be in memory for such a
> regexp?
Nope, just one line at a time. But they could grep a file with a
megabyte-long line. (We can't guarantee their system is powerful enough
for what they try to do. Our job is to give them enough rope.)
> I suppose mmap could be used, but isn't that a bit heavy-handed?
It wouldn't work for pipelines ("cat | grep" or similar). If we have to
have a code path for pipelines, might as well just use that for
everything...
> Regards,
> Andre
Rob
1330404336.0
More information about the Toybox
mailing list