Commit b7cec62
Rename package
1 parent 1512c33 commit b7cec62

31 files changed: +13 additions, -21 deletions

README.md
Lines changed: 13 additions & 21 deletions
@@ -32,15 +32,7 @@ A library implementing different string similarity and distance measures. A doze
 From pypi:
 
 ```bash
-pip install strsim
-```
-
-or clone this repository:
-
-```bash
-git clone https://github.com/luozhouyang/python-string-similarity
-cd python-string-similarity
-pip install -r requirements.txt
+pip install strsimpy
 ```
 
 ## Overview
@@ -103,7 +95,7 @@ The Levenshtein distance between two words is the minimum number of single-chara
 It is a metric string distance. This implementation uses dynamic programming (Wagner–Fischer algorithm), with only 2 rows of data. The space requirement is thus O(m) and the algorithm runs in O(m.n).
 
 ```python
-from strsim.levenshtein import Levenshtein
+from strsimpy.levenshtein import Levenshtein
 
 levenshtein = Levenshtein()
 print(levenshtein.distance('My string', 'My $string'))
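For reference, the two-row Wagner–Fischer scheme the README context mentions can be sketched in plain Python. The `lev` helper below is illustrative only, not part of strsimpy:

```python
def lev(s, t):
    """Levenshtein distance with two rolling rows: O(m) space, O(m*n) time."""
    prev = list(range(len(t) + 1))  # distances from '' to each prefix of t
    for i, cs in enumerate(s, 1):
        curr = [i]  # deleting the first i characters of s
        for j, ct in enumerate(t, 1):
            curr.append(min(prev[j] + 1,                    # deletion
                            curr[j - 1] + 1,                # insertion
                            prev[j - 1] + (cs != ct)))      # substitution
        prev = curr
    return prev[-1]

print(lev('My string', 'My $string'))  # 1 (a single insertion)
```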
@@ -119,7 +111,7 @@ This distance is computed as levenshtein distance divided by the length of the l
 The similarity is computed as 1 - normalized distance.
 
 ```python
-from strsim.normalized_levenshtein import NormalizedLevenshtein
+from strsimpy.normalized_levenshtein import NormalizedLevenshtein
 
 normalized_levenshtein = NormalizedLevenshtein()
 print(normalized_levenshtein.distance('My string', 'My $string'))
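The normalization described in the context line is just the edit distance divided by the longer string's length; a standalone sketch (hypothetical helper, not the strsimpy code):

```python
def normalized_lev_distance(s, t):
    """Levenshtein distance divided by the length of the longer string."""
    if not s and not t:
        return 0.0
    # plain two-row Levenshtein
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        curr = [i]
        for j, ct in enumerate(t, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (cs != ct)))
        prev = curr
    return prev[-1] / max(len(s), len(t))

d = normalized_lev_distance('My string', 'My $string')
print(d, 1 - d)  # distance 0.1, similarity 0.9
```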
@@ -140,8 +132,8 @@ This algorithm is usually used for optical character recognition (OCR) applicati
 It can also be used for keyboard typing auto-correction. Here the cost of substituting E and R is lower for example because these are located next to each other on an AZERTY or QWERTY keyboard. Hence the probability that the user mistyped the characters is higher.
 
 ```python
-from strsim.weighted_levenshtein import WeightedLevenshtein
-from strsim.weighted_levenshtein import CharacterSubstitutionInterface
+from strsimpy.weighted_levenshtein import WeightedLevenshtein
+from strsimpy.weighted_levenshtein import CharacterSubstitutionInterface
 
 class CharacterSubstitution(CharacterSubstitutionInterface):
     def cost(self, c0, c1):
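The keyboard-adjacency idea from the context lines can be sketched without the library API: substitution cost comes from a user-supplied function, while insertions and deletions keep unit cost. Both `weighted_lev` and `keyboard_cost` are hypothetical helpers, not the strsimpy `CharacterSubstitutionInterface`:

```python
def weighted_lev(s, t, sub_cost):
    """Levenshtein DP where substitutions are priced by sub_cost(c0, c1)."""
    prev = [float(j) for j in range(len(t) + 1)]
    for i, cs in enumerate(s, 1):
        curr = [float(i)]
        for j, ct in enumerate(t, 1):
            curr.append(min(prev[j] + 1.0,          # deletion
                            curr[j - 1] + 1.0,      # insertion
                            prev[j - 1] + (0.0 if cs == ct else sub_cost(cs, ct))))
        prev = curr
    return prev[-1]

def keyboard_cost(c0, c1):
    # 'e' and 'r' sit next to each other on a QWERTY keyboard,
    # so that substitution is made cheap (assumed weights)
    return 0.1 if {c0, c1} == {'e', 'r'} else 1.0

print(weighted_lev('tear', 'teae', keyboard_cost))  # 0.1
```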
@@ -162,7 +154,7 @@ It does respect triangle inequality, and is thus a metric distance.
 This is not to be confused with the optimal string alignment distance, which is an extension where no substring can be edited more than once.
 
 ```python
-from strsim.damerau import Damerau
+from strsimpy.damerau import Damerau
 
 damerau = Damerau()
 print(damerau.distance('ABCDEF', 'ABDCEF'))
@@ -192,7 +184,7 @@ The difference from the algorithm for Levenshtein distance is the addition of on
 Note that for the optimal string alignment distance, the triangle inequality does not hold and so it is not a true metric.
 
 ```python
-from strsim.optimal_string_alignment import OptimalStringAlignment
+from strsimpy.optimal_string_alignment import OptimalStringAlignment
 
 optimal_string_alignment = OptimalStringAlignment()
 print(optimal_string_alignment.distance('CA', 'ABC'))
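The "addition of one recurrence" mentioned in the hunk header is the adjacent-transposition case added on top of the Levenshtein recurrence. A standalone sketch (the `osa` helper is illustrative, not the strsimpy code):

```python
def osa(s, t):
    """Optimal string alignment: Levenshtein plus adjacent transpositions,
    with no substring edited more than once."""
    d = [[0] * (len(t) + 1) for _ in range(len(s) + 1)]
    for i in range(len(s) + 1):
        d[i][0] = i
    for j in range(len(t) + 1):
        d[0][j] = j
    for i in range(1, len(s) + 1):
        for j in range(1, len(t) + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
            if i > 1 and j > 1 and s[i - 1] == t[j - 2] and s[i - 2] == t[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[len(s)][len(t)]

print(osa('CA', 'ABC'))  # 3 (unrestricted Damerau-Levenshtein would give 2)
```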
@@ -214,7 +206,7 @@ It is (roughly) a variation of Damerau-Levenshtein, where the substitution of 2
 The distance is computed as 1 - Jaro-Winkler similarity.
 
 ```python
-from strsim.jaro_winkler import JaroWinkler
+from strsimpy.jaro_winkler import JaroWinkler
 
 jarowinkler = JaroWinkler()
 print(jarowinkler.similarity('My string', 'My tsring'))
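As background: the Jaro part counts characters that match within a sliding window and penalizes matched characters that are out of order; Winkler then boosts the score for a shared prefix of up to four characters. A self-contained sketch with hypothetical helpers (not the strsimpy implementation):

```python
def jaro(s, t):
    if s == t:
        return 1.0
    window = max(len(s), len(t)) // 2 - 1
    s_flags, t_flags = [False] * len(s), [False] * len(t)
    matches = 0
    for i, c in enumerate(s):
        lo, hi = max(0, i - window), min(len(t), i + window + 1)
        for j in range(lo, hi):
            if not t_flags[j] and t[j] == c:
                s_flags[i] = t_flags[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    s_matched = [s[i] for i in range(len(s)) if s_flags[i]]
    t_matched = [t[j] for j in range(len(t)) if t_flags[j]]
    half = sum(a != b for a, b in zip(s_matched, t_matched))
    m = matches
    return (m / len(s) + m / len(t) + (m - half / 2) / m) / 3

def jaro_winkler(s, t, p=0.1):
    j = jaro(s, t)
    prefix = 0
    for a, b in zip(s, t):
        if a != b or prefix == 4:
            break
        prefix += 1
    return j + prefix * p * (1 - j)

print(round(jaro_winkler('MARTHA', 'MARHTA'), 4))  # 0.9611
```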
@@ -246,7 +238,7 @@ This class implements the dynamic programming approach, which has a space requir
 In "Length of Maximal Common Subsequences", K.S. Larsen proposed an algorithm that computes the length of LCS in time O(log(m).log(n)). But the algorithm has a memory requirement O(m.n²) and was thus not implemented here.
 
 ```python
-from strsim.longest_common_subsequence import LongestCommonSubsequence
+from strsimpy.longest_common_subsequence import LongestCommonSubsequence
 
 lcs = LongestCommonSubsequence()
 # Will produce 4.0
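The dynamic-programming approach referenced in the hunk header can be sketched with two rolling rows; the classic LCS distance is |s| + |t| - 2·|LCS(s, t)|. The helpers below are illustrative, not the strsimpy code:

```python
def lcs_len(s, t):
    """Length of the longest common subsequence, O(n) space."""
    prev = [0] * (len(t) + 1)
    for cs in s:
        curr = [0]
        for j, ct in enumerate(t, 1):
            curr.append(prev[j - 1] + 1 if cs == ct
                        else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def lcs_distance(s, t):
    # characters not covered by the LCS must be deleted from s or inserted into t
    return len(s) + len(t) - 2 * lcs_len(s, t)

print(lcs_distance('AGCAT', 'GAC'))  # 4
```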
@@ -263,7 +255,7 @@ http://heim.ifi.uio.no/~danielry/StringMetric.pdf
 The distance is computed as 1 - |LCS(s1, s2)| / max(|s1|, |s2|)
 
 ```python
-from strsim.metric_lcs import MetricLCS
+from strsimpy.metric_lcs import MetricLCS
 
 metric_lcs = MetricLCS()
 s1 = 'ABCDEFG'
@@ -300,7 +292,7 @@ The algorithm uses affixing with special character '\n' to increase the weight o
 In the paper, Kondrak also defines a similarity measure, which is not implemented (yet).
 
 ```python
-from strsim.ngram import NGram
+from strsimpy.ngram import NGram
 
 twogram = NGram(2)
 print(twogram.distance('ABCD', 'ABTUIO'))
@@ -320,7 +312,7 @@ The cost for computing these similarities and distances is mainly dominated by
 Directly compute the distance between strings:
 
 ```python
-from strsim.qgram import QGram
+from strsimpy.qgram import QGram
 
 qgram = QGram(2)
 print(qgram.distance('ABCD', 'ABCE'))
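The q-gram distance is the L1 distance between the strings' q-gram count profiles (Ukkonen's definition). A standalone sketch with hypothetical helpers, not the strsimpy code:

```python
from collections import Counter

def qgram_profile(s, k=2):
    """Count every substring of length k."""
    return Counter(s[i:i + k] for i in range(len(s) - k + 1))

def qgram_distance(s, t, k=2):
    """L1 distance between the two q-gram profiles."""
    p, q = qgram_profile(s, k), qgram_profile(t, k)
    return sum(abs(p[g] - q[g]) for g in set(p) | set(q))

# 'ABCD' -> {AB, BC, CD}, 'ABCE' -> {AB, BC, CE}: CD and CE each differ by 1
print(qgram_distance('ABCD', 'ABCE'))  # 2
```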
@@ -330,7 +322,7 @@ print(qgram.distance('ABCD', 'ABCE'))
 Or, for large datasets, pre-compute the profile of all strings. The similarity can then be computed between profiles:
 
 ```python
-from strsim.cosine import Cosine
+from strsimpy.cosine import Cosine
 
 cosine = Cosine(2)
 s0 = 'My first string'
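The profile-reuse pattern from the context lines can be sketched directly: shingle each string once into a count profile, then take the cosine of the two count vectors. Helpers and the second string are illustrative assumptions, not the strsimpy API:

```python
import math
from collections import Counter

def shingle_profile(s, k=2):
    """Precompute the k-shingle count profile once per string."""
    return Counter(s[i:i + k] for i in range(len(s) - k + 1))

def cosine_similarity(p, q):
    """Cosine between two precomputed profiles: dot product over norms."""
    dot = sum(p[g] * q[g] for g in p if g in q)
    norm = (math.sqrt(sum(v * v for v in p.values()))
            * math.sqrt(sum(v * v for v in q.values())))
    return dot / norm if norm else 0.0

p0 = shingle_profile('My first string')
p1 = shingle_profile('My other string')  # assumed second string
print(cosine_similarity(p0, p0))  # ≈ 1.0 for identical profiles
print(cosine_similarity(p0, p1))
```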
File renamed without changes.
