Help-Site Computer Manuals
Software
Hardware
Programming
Networking
  Algorithms & Data Structures   Programming Languages   Revision Control
  Protocols
  Cameras   Computers   Displays   Keyboards & Mice   Motherboards   Networking   Printers & Scanners   Storage
  Windows   Linux & Unix   Mac

Algorithm::AhoCorasick
efficient search for multiple strings

Algorithm::AhoCorasick - efficient search for multiple strings


NAME

Algorithm::AhoCorasick - efficient search for multiple strings


VERSION

Version 0.01


SYNOPSIS


 use Algorithm::AhoCorasick qw(find_all);

 $found = find_all($text, @keywords);

 if (!$found) {

     print "no keywords found\n";

 } else {

     foreach $pos (sort keys %$found) {

         $keywords = join ', ', @{$found->{$pos}};

         print "$pos: $keywords\n";

     }

 }


DESCRIPTION

Aho-Corasick is a classic (1975) algorithm for locating elements of a finite set of strings within an input text. It constructs a finite state machine from a list of keywords, then uses the machine to locate all occurrences of the keywords. Construction of the machine takes time proportional to the sum of the lengths of the keywords and the machine processes the input string in a single pass - that is, the algorithm may be considerably more efficient than searching for each keyword separately.


PROCEDURAL INTERFACE

The module exports 2 functions for the common use cases: find_all for finding all matches, and find_first for finding whether a match exists at all. Note that both functions must be explicitly imported (i.e. with use Algorithm::AhoCorasick qw(find_all find_first);) before they can be called. Both functions take the same arguments: the first argument is the text to be searched, the following are the keywords to search for (there must be at least one, and the functions die rather than search for empty strings).

find_all

When no keyword is found in the input text, find_all returns undef; when some keywords are found, the return value is a hash reference mapping positions to keywords (in an array reference, ordered by length) found at those positions.

find_first

When no keyword is found in the input text, find_first returns undef in scalar context and an empty array in list context; when a keyword is found, the return value is a pair of its position in the input text and the found keyword (as a list if the function has been called in list context, as an array reference otherwise).


OBJECT-ORIENTED INTERFACE

find_all and find_first are just thin wrappers around the state machine class Algorithm::AhoCorasick::SearchMachine, which can also be used directly for a more customizable search scenarios (i.e. when the input text isn't available all at once) - see the Algorithm::AhoCorasick::SearchMachine POD for details.


AUTHOR

Vaclav Barta, <vbar@comp.cz>


BUGS

Please report any bugs or feature requests to bug-algorithm-ahocorasick at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.


COPYRIGHT & LICENSE

Copyright 2007 Vaclav Barta, all rights reserved.

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.


CREDITS

Adapted from implementation by Tomas Petricek, available at http://www.codeproject.com/cs/algorithms/ahocorasick.asp .

The algorithm is from Alfred V. Aho and Margaret J. Corasick, Efficient string matching: an aid to bibliographic search, CACM, 18(6):333-340, June 1975.

Programminig
Wy
Wy
yW
Wy
Programming
Wy
Wy
Wy
Wy