use String::Approx qw(amatch asubstitute);
For example 1-difference means that a match is found if there is one character too many (insertion) or one character missing (deletion) or one character changed (substitution). Those are exclusive ors: that is, not one of each type of modification but exactly one.
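To make the three kinds of difference concrete, here is a minimal sketch in plain Perl. The helper `edit_distance` is hypothetical and not part of String::Approx; it also compares two whole strings, whereas the module looks for an approximate occurrence of the pattern inside a longer string. It only illustrates how insertions, deletions and substitutions are counted.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Hypothetical helper (not part of String::Approx): classic dynamic-programming
# edit distance counting insertions, deletions and substitutions.
sub edit_distance {
    my ($s, $t) = @_;
    my @prev = (0 .. length($t));           # distances from the empty prefix of $s
    for my $i (1 .. length($s)) {
        my @cur = ($i);
        for my $j (1 .. length($t)) {
            my $cost = substr($s, $i - 1, 1) eq substr($t, $j - 1, 1) ? 0 : 1;
            push @cur, min(
                $prev[$j] + 1,              # character of $s missing from $t
                $cur[$j - 1] + 1,           # extra character in $t
                $prev[$j - 1] + $cost,      # same or substituted character
            );
        }
        @prev = @cur;
    }
    return $prev[-1];
}

# 'pearl' has one character too many, 'per' has one missing, 'perk' has one changed.
for my $word (qw(pearl per perk pearls)) {
    print "$word: ", edit_distance('perl', $word), "\n";
}
```

The first three words are each exactly one difference away from 'perl'. 'pearls' needs two insertions, so it would fall outside a 1-difference match.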
index()
or normal regular expression matching.
use String::Approx qw(amatch);
amatch("PATTERN");
amatch("PATTERN", @LIST);
amatch("PATTERN", [ @MODS ]);
amatch("PATTERN", [ @MODS ], @LIST);
The PATTERN is a string, not a regular expression. The regular expression metanotation (. ? * + {...,...} ( ) | [ ] ^ $ \w ...) will be understood as literal characters, that is, a * means in regex terms \*, not "match 0 or more times".
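As an analogy with exact matching, the literal treatment of PATTERN resembles what Perl's built-in quotemeta() does to a string before it is used in a regular expression. The sketch below does not call String::Approx at all; the variable names are illustrative only.

```perl
use strict;
use warnings;

my $pattern = 'a*b';                 # the '*' is meant literally
my $literal = quotemeta($pattern);   # backslash-escapes the '*'

# Matches only where the three characters a, *, b appear in sequence.
print "literal text found\n" if 'xa*by' =~ /$literal/;

# Never printed: 'aaab' would match the regex a*b, but not the literal text.
print "regex-style match\n" if 'aaab' =~ /$literal/;
```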
The LIST is the list of strings to match against the pattern. If no LIST is given, amatch() matches against $_.
The MODS are the modifiers that tell how approximately to match. See below
for more detailed explanation.
NOTE: The syntax really is [ @MODS ], the square brackets [ ] must be in there. See below for examples.
In scalar context amatch() returns the number of successful matches. In list context amatch() returns the strings that had matches.
Example:
use String::Approx qw(amatch);
open(WORDS, '/usr/dict/words') or die;
while (<WORDS>) { print if amatch('perl'); }
or the same ignoring case:
use String::Approx qw(amatch);
open(WORDS, '/usr/dict/words') or die;
while (<WORDS>) { print if amatch('perl', ['i']); }
use String::Approx qw(asubstitute);
asubstitute("PATTERN", "SUBSTITUTION");
asubstitute("PATTERN", "SUBSTITUTION", @LIST);
asubstitute("PATTERN", "SUBSTITUTION", [ @MODS ]);
asubstitute("PATTERN", "SUBSTITUTION", [ @MODS ], @LIST);
The PATTERN is a string, not a regular expression. The regular expression metanotation (. ? * + {...,...} ( ) | [ ] ^ $ \w ...) will be understood as literal characters, that is, a * means in regex terms \*, not "match 0 or more times".
Also the SUBSTITUTION is a string, not a regular expression. Well, mostly. Most of the regular expression metanotation (., ?, *, +, ...) will be understood as literal characters, that is, a * means in regex terms \*, not "match 0 or more times". The understood notations include $&, the approximately matched part, as used in the examples below.
NOTE: The syntax really is [ @MODS ], the square brackets [ ] must be in there. See below for examples.
The LIST is the list of strings to substitute against the pattern. If no LIST is given, asubstitute() substitutes against $_.
In scalar context asubstitute() returns the number of successful substitutions. In list context asubstitute() returns the strings that had substitutions.
Examples:
use String::Approx qw(asubstitute);
open(WORDS, '/usr/dict/words') or die;
while (<WORDS>) { print if asubstitute('perl', '($&)'); }
or the same ignoring case:
use String::Approx qw(asubstitute);
open(WORDS, '/usr/dict/words') or die;
while (<WORDS>) { print if asubstitute('perl', '($&)', [ 'i' ]); }
The MODS argument of amatch() and asubstitute() is a list of strings that control the matching of PATTERN. The first two, i and g, are the usual regular expression match/substitute modifiers; the rest are special for approximate matching/substitution.
The modifiers o, m, s and x are not supported because their definitions for approximate matching are less than clear.
The relative number of differences is relative to the length of the PATTERN, rounded up: if, for example, the PATTERN is 'bouillabaise' and the MODS is ['20%'], the k becomes 3.
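The rounding can be checked by hand: 'bouillabaise' has 12 characters, 20 % of 12 is 2.4, and rounding up gives 3. A minimal sketch of that arithmetic follows; the helper name `k_from_percent` is hypothetical and this is not the module's own code.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of String::Approx): turn a percentage
# modifier into an absolute k, rounding up as the text describes.
sub k_from_percent {
    my ($pattern, $percent) = @_;
    # Adding 99 before the truncating division rounds any fraction up
    # without relying on floating-point ceil().
    return int((length($pattern) * $percent + 99) / 100);
}

print k_from_percent('bouillabaise', 20), "\n";   # 12 characters, 2.4 rounds up to 3
print k_from_percent('perl', 10), "\n";           # 4 characters, 0.4 rounds up to 1
```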
If you want to disable a particular kind of difference you need to explicitly set it to zero: for example, 'D0' allows no deletions.
In case of conflicting definitions the later ones silently override, for example:
[2, 'I3', 'I1']
equals
['I1', 'D2', 'S2']
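The override rule can be sketched as a left-to-right fold over the modifier list. The helper `resolve_mods` below is hypothetical, not the module's parser; it assumes that a plain number sets the limit for all three kinds of difference and it ignores every other modifier.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of String::Approx): fold a MODS list into
# per-type difference limits, letting later entries override earlier ones.
sub resolve_mods {
    my @mods = @_;
    my %limit;
    for my $mod (@mods) {
        if ($mod =~ /^(\d+)$/) {              # plain k sets I, D and S together
            @limit{qw(I D S)} = ($1) x 3;
        } elsif ($mod =~ /^([IDS])(\d+)$/) {  # Ik, Dk or Sk sets a single type
            $limit{$1} = $2;
        }
        # anything else (i, g, percentages) is ignored in this sketch
    }
    return \%limit;
}

my $first  = resolve_mods(2, 'I3', 'I1');
my $second = resolve_mods('I1', 'D2', 'S2');

# Both lists resolve to the same limits: I1 D2 S2.
print join(' ', map { "$_$first->{$_}" }  qw(I D S)), "\n";
print join(' ', map { "$_$second->{$_}" } qw(I D S)), "\n";
```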
use String::Approx qw(amatch asubstitute);
open(WORDS, "/usr/dict/words") or die;
while (<WORDS>) {
    # <---
}
and the following examples just replace the above '# <---' line.
$_
print if amatch('perl');
One difference is automatically the result in this case because first the rule of 10 % of the length of the pattern ('perl') is used and then the at least 1 rule.
print if amatch('perl', [ 'i' ]);
The case is ignored in matching (i).
print if amatch('perl', [ '0', 'I1' ]);
The one insertion is most easily achieved by first disabling any approximateness (0) and then enabling one insertion (I1).
print if amatch('perl', [ 'D0' ]);
Zero deletions are easily achieved by simply disabling any deletions (D0); the other types of differences, the insertions and substitutions, are still enabled.
print if asubstitute('perl', '<B>$&</B>', [ 'g' ]);
All (g) of the approximately matching parts of the input are surrounded by the HTML emboldening markup.
use String::Approx qw(amatch asubstitute);
open(WORDS, "/usr/dict/words") or die;
@words = <WORDS>;
# <---
and the examples still go where the '# <---' line is.
@matched = amatch('perl', @words);
The @matched contains the elements of the @words that matched approximately.
@substituted = asubstitute('perl', '<EM>$&</EM>', [ 'g' ], @words);
The @substituted contains, with all (g) the substitutions applied, the elements of the @words that matched approximately.
$_ and try to amatch() or asubstitute(). Perhaps you are using the Perl option -e but you forgot the Perl option -n?
When matching long patterns, String::Approx attempts to partition the match. In other words, it tries to do the matching incrementally in smaller parts.
If this fails the above message is shown. Please try using shorter match patterns.
See below for LIMITATIONS for more detailed explanation why this happens.
The partitioning scheme explained above that is used for matching long patterns cannot, sadly enough, be used when substituting.
Please try using shorter substitution patterns.
See below for LIMITATIONS for more detailed explanation why this happens.
For amatch() this can be avoided by partitioning the pattern, matching it in shorter subpatterns. This makes matching a bit slower and a bit fuzzier, more approximate. For asubstitute() this partitioning cannot be done: the absolute maximum for the substitution pattern length is 19, but sometimes, for example if the approximateness is increased, even shorter patterns are too much. When this happens, you must use shorter patterns.
String::Approx version 1 agrep is still faster. If you do not know what agrep is: it is a program like the UNIX grep but it knows, among other things, how to do approximate matching. agrep is still about 30 times faster than Perl + String::Approx. NOTE: all these speeds were measured in one particular system using one particular set of tests: your mileage will vary.
For long patterns, more than about 40, the first
String::Approx
v1.*
$_
and the old messy way of having an unlimited number of modifiers. The first
need won.
There is a backward compatibility mode, though, if you do not want to change your amatch() and asubstitute() calls. You have to change your use line, however:
use String::Approx qw(amatch compat1);
That is, you must add the compat1 symbol if you want to be compatible with the String::Approx version 1 call syntax.
<jhi@iki.fi>
<gnat@frii.com>