Commit 0ef385f

Update README.md
1 parent d595d11 commit 0ef385f

1 file changed

+11
-9
lines changed

README.md

Lines changed: 11 additions & 9 deletions
@@ -23,14 +23,14 @@ The main characteristics of each implemented algorithm are presented below. The
 | | | Normalized? | Metric? | Type | Cost |
 |-------- |------- |------------- |---------- | ------ | ---- |
 | [Levenshtein](#levenshtein) |distance | No | Yes | | O(m.n) <sup>1</sup> |
-| [Normalized Levenshtein](#normalized-levenshtein) |distance<br>similarity | Yes | No | | O(m.n) <sup>1</sup> |
-| [Weighted Levenshtein](#weighted-levenshtein) |distance | No | No | | O(m.n) <sup>1</sup> |
-| [Damerau-Levenshtein](#damerau-levenshtein) <sup>3</sup> |distance | No | Yes | | O(m.n) <sup>1</sup> |
-| Optimal String Alignment <sup>3</sup> |not implemented yet | No | No | | O(m.n) <sup>1</sup> |
-| [Jaro-Winkler](#jaro-winkler) |similarity<br>distance | Yes | No | | O(m.n) |
-| [Longest Common Subsequence](#longest-common-subsequence) |distance | No | No | | O(m.n) <sup>1,2</sup> |
-| [Metric Longest Common Subsequence](#metric-longest-common-subsequence) |distance | Yes | Yes | | O(m.n) <sup>1,2</sup> |
-| [N-Gram](#n-gram) |distance | Yes | No | | O(m.n) |
+| [Normalized Levenshtein](#normalized-levenshtein) |distance<br>similarity | Yes | No | | O(m*n) <sup>1</sup> |
+| [Weighted Levenshtein](#weighted-levenshtein) |distance | No | No | | O(m*n) <sup>1</sup> |
+| [Damerau-Levenshtein](#damerau-levenshtein) <sup>3</sup> |distance | No | Yes | | O(m*n) <sup>1</sup> |
+| Optimal String Alignment <sup>3</sup> |not implemented yet | No | No | | O(m*n) <sup>1</sup> |
+| [Jaro-Winkler](#jaro-winkler) |similarity<br>distance | Yes | No | | O(m*n) |
+| [Longest Common Subsequence](#longest-common-subsequence) |distance | No | No | | O(m*n) <sup>1,2</sup> |
+| [Metric Longest Common Subsequence](#metric-longest-common-subsequence) |distance | Yes | Yes | | O(m*n) <sup>1,2</sup> |
+| [N-Gram](#n-gram) |distance | Yes | No | | O(m*n) |
 | [Q-Gram](#q-gram) |distance | No | No | Profile | O(m+n) |
 | [Cosine similarity](#cosine-similarity) |similarity<br>distance | Yes | No | Profile | O(m+n) |
 | [Jaccard index](#jaccard-index) |similarity<br>distance | Yes | Yes | Set | O(m+n) |
@@ -52,13 +52,15 @@ Although the topic might seem simple, a lot of different algorithms exist to mea
 - StringSimilarity : Implementing algorithms define a similarity between strings (0 means strings are completely different).
 - NormalizedStringSimilarity : Implementing algorithms define a similarity between 0.0 and 1.0, like Jaro-Winkler for example.
 - StringDistance : Implementing algorithms define a distance between strings (0 means strings are identical), like Levenshtein for example. The maximum distance value depends on the algorithm.
-- NormalizedStringDistance : This interface extends StringDistance. For implementing classes, the computed distance value is between 0.0 and 1.0. NormalizedLevenshtein is an example of NormalizedLevenshtein.
+- NormalizedStringDistance : This interface extends StringDistance. For implementing classes, the computed distance value is between 0.0 and 1.0. NormalizedLevenshtein is an example of NormalizedStringDistance.
 
 Generally, algorithms that implement NormalizedStringSimilarity also implement NormalizedStringDistance, and similarity = 1 - distance. But there are a few exceptions, like N-Gram similarity and distance (Kondrak)...
 
 ### Metric distances
 The MetricStringDistance interface : A few of the distances are actually metric distances, which means that verify the triangle inequality d(x, y) <= d(x,z) + d(z,y). For example, Levenshtein is a metric distance, but NormalizedLevenshtein is not.
 
+A lot of nearest-neighbor search algorithms and indexing structures rely on the triangle inequality. You can check "Similarity Search, The Metric Space Approach" by Zezula et al. for a survey. These cannot be used with non metric similarity measures.
+
 [Read Javadoc for a detailed description](http://api123.web-d.be/api/java-string-similarity/head/index.html)
 
 ## Shingles (n-gram) based similarity and distance
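
The paragraph added by this commit relies on the README's claim that Levenshtein satisfies the triangle inequality while NormalizedLevenshtein does not. The sketch below illustrates that claim with a hand-written dynamic-programming Levenshtein; it does not call the library. The class name `TriangleInequalityDemo`, its methods, and the normalization by the longer string's length are assumptions made for illustration and may differ from what the library's `NormalizedLevenshtein` actually does.

```java
// Illustrative sketch only: a standalone Levenshtein, not the library's implementation.
class TriangleInequalityDemo {

    // Plain Levenshtein distance using the classic two-row dynamic program.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance to the empty prefix of b
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev; // swap rows so prev always holds the latest row
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Assumption: normalize by the length of the longer string.
    static double normalized(String a, String b) {
        int maxLen = Math.max(a.length(), b.length());
        return maxLen == 0 ? 0.0 : (double) levenshtein(a, b) / maxLen;
    }

    public static void main(String[] args) {
        String x = "ab", y = "bc", z = "abc";

        // d(x, y) <= d(x, z) + d(z, y) for the plain distance
        boolean plainHolds =
            levenshtein(x, y) <= levenshtein(x, z) + levenshtein(z, y);

        // the same check for the normalized variant
        boolean normalizedHolds =
            normalized(x, y) <= normalized(x, z) + normalized(z, y);

        System.out.println("plain Levenshtein holds: " + plainHolds);
        System.out.println("normalized Levenshtein holds: " + normalizedHolds);
    }
}
```

For this triple the plain distances are 2, 1 and 1, so the inequality holds. Under the assumed normalization they become 1, 1/3 and 1/3, so the inequality fails. This is why an index that prunes candidates with the triangle inequality cannot safely use the normalized variant.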
