Commit 8d977fb

Added Sift4 to README
1 parent 8363d0e commit 8d977fb

1 file changed

+26
-0
lines changed

README.md

Lines changed: 26 additions & 0 deletions
@@ -21,6 +21,8 @@ A library implementing different string similarity and distance measures. A doze
 * [Cosine similarity](#shingle-n-gram-based-algorithms)
 * [Jaccard index](#shingle-n-gram-based-algorithms)
 * [Sorensen-Dice coefficient](#shingle-n-gram-based-algorithms)
+* [Experimental](#experimental)
+* [SIFT4](#sift4)
 * [Users](#users)


@@ -442,6 +444,30 @@ Similar to Jaccard index, but this time the similarity is computed as 2 * |V1 in

 Distance is computed as 1 - cosine similarity.

+## Experimental
+
+### SIFT4
+SIFT4 is a general-purpose string distance algorithm inspired by Jaro-Winkler and Longest Common Subsequence. It was developed to produce a distance measure that matches human perception of string distance as closely as possible. It therefore takes into account elements such as character substitution, character distance, and longest common subsequence. It was developed through experimental testing, without a theoretical background.
+
+```
+import info.debatty.java.stringsimilarity.experimental.Sift4;
+
+public class MyApp {
+
+    public static void main(String[] args) {
+        String s1 = "This is the first string";
+        String s2 = "And this is another string";
+        Sift4 sift4 = new Sift4();
+        sift4.setMaxOffset(5); // how far ahead to look for matching characters
+        double expResult = 11.0;
+        double result = sift4.distance(s1, s2);
+        System.out.println(result + " (expected " + expResult + ")"); // assertEquals is JUnit-only, so print instead
+    }
+}
+```
+
+
+
 ## Users
 * [StringSimilarity.NET](https://github.com/feature23/StringSimilarity.NET) a .NET port of java-string-similarity

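Reviewer note on the SIFT4 paragraph added in this commit: the idea it describes can be pictured as walking both strings with two cursors, accumulating the length of matching runs, and, on a mismatch, looking ahead up to `maxOffset` characters to resynchronise. The sketch below is a simplified variant written only for illustration. The class `Sift4Sketch` and its `distance` method are hypothetical names, this is not the library's `Sift4` implementation, and its results need not agree with `sift4.distance(s1, s2)`.

```java
/**
 * Illustrative sketch of a simplified SIFT4-style distance.
 * Assumption: this is NOT the library's Sift4 class; names are hypothetical.
 */
final class Sift4Sketch {

    static int distance(String s1, String s2, int maxOffset) {
        if (s1 == null || s1.isEmpty()) {
            return s2 == null ? 0 : s2.length();
        }
        if (s2 == null || s2.isEmpty()) {
            return s1.length();
        }

        int l1 = s1.length();
        int l2 = s2.length();
        int c1 = 0;       // cursor in s1
        int c2 = 0;       // cursor in s2
        int lcss = 0;     // total length of the common runs found so far
        int localCs = 0;  // length of the run currently being matched

        while (c1 < l1 && c2 < l2) {
            if (s1.charAt(c1) == s2.charAt(c2)) {
                localCs++;
            } else {
                // The current run ends: bank it and realign both cursors.
                lcss += localCs;
                localCs = 0;
                if (c1 != c2) {
                    c1 = c2 = Math.max(c1, c2);
                }
                // Look ahead up to maxOffset characters in either string
                // for a position where the two strings match again.
                for (int i = 0; i < maxOffset && (c1 + i < l1 || c2 + i < l2); i++) {
                    if (c1 + i < l1 && c2 < l2 && s1.charAt(c1 + i) == s2.charAt(c2)) {
                        c1 += i;
                        localCs++;
                        break;
                    }
                    if (c2 + i < l2 && c1 < l1 && s1.charAt(c1) == s2.charAt(c2 + i)) {
                        c2 += i;
                        localCs++;
                        break;
                    }
                }
            }
            c1++;
            c2++;
        }
        lcss += localCs;
        // Distance: length of the longer string minus the matched characters.
        return Math.max(l1, l2) - lcss;
    }
}
```

Calling `Sift4Sketch.distance(s1, s2, 5)` mirrors the `setMaxOffset(5)` call in the README example. A larger offset tolerates bigger insertions or deletions at the cost of more comparisons per mismatch.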