Developers

Join

Please contact the project administrators of this project at the project summary page.

Sources

Sources are in the Mercurial repository.

Project Information

This is the cl-string-match project ("clstringmatch")

Common Lisp provides a SEARCH function to lookup given keys in a string, but its implementation can be subotimal: it can be beaten in some cases by even the simpliest brute-force algorithm. This project is aimed at implementing robust substring search (subsequence search) algorithms for Common Lisp.

Implemented algorithms

  • Brute-force (also known as naïve algorithm)
  • Boyer-Moore (with mismatched character heuristic)
  • Boyer-Moore-Horspool
  • Rabin-Karp
  • Knuth-Morris-Pratt
  • Aho-Corasick (with finite set of patterns)
  • Ukkonen's suffix tree construction (in linear time)

Along with subsequence search (matching) algorithms this package provides implementations of the following data structures and utility algorithms:

  • Trie (used by the Aho-Corasick algorithm)
  • Horner hash (used by the Rabin-Karp algorithm)

Check also the API reference.

How to use

At the moment Cl-String-Match is not supported by Quicklisp. Installation is manual.

Cl-String-Match exports functions in cl-string-match package (nicknamed sm).

Shortcuts look for given pattern pat in text txt. They are usually much slower but are easier to use:

  • string-contains-brute pat txt Brute-force
  • string-contains-bm pat txt Boyer-Moore
  • string-contains-rk pat txt Rabin-Karp
  • string-contains-kmp pat txt Knuth-Morris-Pratt
  • string-contains-ac pat txt Aho-Corasick

A more robust approach is to use pre-calculated index data that is processed by a pair of initialize and search functions:

  • initialize-bm pat and search-bm bm txt
  • initialize-rk pat and search-rk rk txt
  • initialize-kmp pat and search-kmp kmp txt
  • initialize-ac pat and search-ac ac txt. Here initialize-ac can accept a list of patterns that are compiled into a trie.

Following example looks for a given substring pat in a given line of text txt using Rabin-Karp algorithm implementation:

(let ((idx (initialize-rk pat)))
  (search-rk idx txt))
	 

Status

This project is still a work in progress. Any contributions are welcome!