[Toybox] awk seen in the wild
Andy Chu
andychup at gmail.com
Thu Jul 21 01:28:12 PDT 2016
> Kernighan Awk has its own regex implementation "b.c" in 958 lines, and
> there is an argument to keep it. It uses the Thompson linear-time
> NFA/DFA algorithm rather than exponential backtracking. See the note
> here:
Never mind about this tangent... I *think* GNU libc actually uses the
linear time algorithm, with a possible exception for backreferences.
I was in the middle of some research on that but didn't finish (musl
libc uses a fork of the TRE regex engine, etc.).
But oddly, GNU grep, awk, sed, and coreutils each carry their own copy
of the GNU libc regex engine. That is just annoying.
$ wc -l gawk-*/reg* */lib/reg*.[ch] | sort -n
81 coreutils-8.22/lib/regex.c
81 grep-2.24/lib/regex.c
81 sed-4.2.2/lib/regex.c
85 gawk-4.1.3/regex.c
591 gawk-4.1.3/regex.h
664 grep-2.24/lib/regex.h
667 coreutils-8.22/lib/regex.h
668 sed-4.2.2/lib/regex.h
834 gawk-4.1.3/regex_internal.h
868 sed-4.2.2/lib/regex_internal.h
910 coreutils-8.22/lib/regex_internal.h
912 grep-2.24/lib/regex_internal.h
1742 grep-2.24/lib/regex_internal.c
1744 sed-4.2.2/lib/regex_internal.c
1746 coreutils-8.22/lib/regex_internal.c
1759 gawk-4.1.3/regex_internal.c
3927 coreutils-8.22/lib/regcomp.c
3941 sed-4.2.2/lib/regcomp.c
3958 gawk-4.1.3/regcomp.c
3962 grep-2.24/lib/regcomp.c
4391 gawk-4.1.3/regexec.c
4412 grep-2.24/lib/regexec.c
4418 coreutils-8.22/lib/regexec.c
4421 sed-4.2.2/lib/regexec.c
Andy