+Gensim is billed as a natural language processing package that does ‘Topic Modeling for Humans’, but in practice it is much more than that.
+
+If you are unfamiliar with topic modeling, it is a technique for extracting the underlying topics from large volumes of text. Gensim provides algorithms like LDA and LSI (which we will see later in this post) and the necessary sophistication to build high-quality topic models.
+
+You may argue that topic models and word embeddings are available in other packages, such as scikit-learn or R. But the breadth and scope of gensim's facilities for building and evaluating topic models are unparalleled, and it offers many more convenient facilities for text processing.
+
+It is a great package for processing texts, working with word vector models (such as Word2Vec and FastText) and building topic models.
+
+Another significant advantage of gensim is that it lets you handle large text files without having to load the entire file into memory.
+
+Gensim Tutorial – A Complete Beginners Guide: https://www.machinelearningplus.com/nlp/gensim-tutorial/
+
## Shingle (n-gram) based algorithms
A few algorithms work by converting strings into sets of n-grams (sequences of n characters, also sometimes called k-shingles). The similarity or distance between the strings is then the similarity or distance between the sets.
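
As a rough illustration of the idea above, the following Python sketch turns each string into its set of character n-grams and compares the two sets with the Jaccard index. The helper names `shingles` and `jaccard_similarity` are hypothetical and do not come from any particular library, and the choice to treat two empty sets as identical is an assumption.

```python
def shingles(text, n=2):
    """Return the set of character n-grams (k-shingles) of `text`.

    Strings shorter than `n` produce an empty set.
    """
    if n <= 0:
        raise ValueError("n must be a positive integer")
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def jaccard_similarity(a, b, n=2):
    """Jaccard index of the two shingle sets: |A & B| / |A | B|."""
    set_a, set_b = shingles(a, n), shingles(b, n)
    if not set_a and not set_b:
        # Assumption: two strings with no shingles count as identical.
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


if __name__ == "__main__":
    # "night" -> {ni, ig, gh, ht}; "nacht" -> {na, ac, ch, ht}
    print(sorted(shingles("night")))
    print(jaccard_similarity("night", "nacht"))  # 1 shared shingle out of 7
```

Working on sets means repeated n-grams are counted once; a variant that counts occurrences would need a multiset such as `collections.Counter` instead.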
@@ -367,19 +380,6 @@ Similar to Jaccard index, but this time the similarity is computed as 2 * |V1 in
Distance is computed as 1 - similarity.
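
The formula in the hunk header above is cut off, so the sketch below assumes the standard Sørensen-Dice coefficient, 2 * |V1 ∩ V2| / (|V1| + |V2|), computed over character n-gram sets, with the distance taken as 1 - similarity. The function names are hypothetical and chosen only for this example.

```python
def shingles(text, n=2):
    """Return the set of character n-grams (k-shingles) of `text`."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def sorensen_dice_similarity(a, b, n=2):
    """Assumed standard form: 2 * |V1 & V2| / (|V1| + |V2|)."""
    v1, v2 = shingles(a, n), shingles(b, n)
    if not v1 and not v2:
        # Assumption: two strings with no shingles count as identical.
        return 1.0
    return 2 * len(v1 & v2) / (len(v1) + len(v2))


def sorensen_dice_distance(a, b, n=2):
    """Distance is 1 - similarity."""
    return 1.0 - sorensen_dice_similarity(a, b, n)


if __name__ == "__main__":
    # "night" and "nacht" each have 4 bigrams and share only "ht".
    print(sorensen_dice_similarity("night", "nacht"))  # 2 * 1 / (4 + 4)
    print(sorensen_dice_distance("night", "nacht"))
```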

-### Gensim
-Gensim is billed as a natural language processing package that does ‘Topic Modeling for Humans’, but in practice it is much more than that.
-
-If you are unfamiliar with topic modeling, it is a technique for extracting the underlying topics from large volumes of text. Gensim provides algorithms like LDA and LSI (which we will see later in this post) and the necessary sophistication to build high-quality topic models.
-
-You may argue that topic models and word embeddings are available in other packages, such as scikit-learn or R. But the breadth and scope of gensim's facilities for building and evaluating topic models are unparalleled, and it offers many more convenient facilities for text processing.
-
-It is a great package for processing texts, working with word vector models (such as Word2Vec and FastText) and building topic models.
-
-Another significant advantage of gensim is that it lets you handle large text files without having to load the entire file into memory.
-
-Gensim Tutorial – A Complete Beginners Guide: https://www.machinelearningplus.com/nlp/gensim-tutorial/