Commit 5871657

fix(Tokenizer): bpe tokenizing
1 parent cb3167c commit 5871657

1 file changed: +2 -2 lines changed

neuralnetlib/preprocessing.py

Lines changed: 2 additions & 2 deletions
@@ -470,10 +470,10 @@ def learn_bpe(self, texts: list[str], cache: bool = True) -> dict:
 word = word.lower()
 chars = list(word)
 vocab[tuple(chars)] += 1
-
+
 word_pairs = self.get_pairs(chars)
 for pair in word_pairs:
-    pairs[pair] += vocab[tuple(chars)]
+    pairs[pair] += 1
 
 merges = {}
 for i in range(self.bpe_merges):
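
The effect of the one-line change can be sketched outside the library. In the old line, `vocab[tuple(chars)]` has already been incremented for the current occurrence, so each repeat of a word adds its running count to every pair instead of adding 1. The snippet below is a minimal illustration only: `get_pairs` and `count_pairs` are hypothetical stand-ins, not the `neuralnetlib` implementations, and it assumes `get_pairs` returns adjacent symbol pairs, which the diff does not show.

```python
from collections import defaultdict


def get_pairs(chars):
    """Hypothetical stand-in: adjacent symbol pairs of one word."""
    return list(zip(chars, chars[1:]))


def count_pairs(words, fixed=True):
    """Count symbol pairs over a word list, mimicking the loop in the diff.

    fixed=False reproduces the old line (adds the running word count),
    fixed=True reproduces the new line (adds 1 per occurrence).
    """
    vocab = defaultdict(int)
    pairs = defaultdict(int)
    for word in words:
        chars = list(word.lower())
        vocab[tuple(chars)] += 1
        for pair in get_pairs(chars):
            if fixed:
                pairs[pair] += 1
            else:
                pairs[pair] += vocab[tuple(chars)]
    return dict(pairs)


words = ["low", "low", "low"]
old_counts = count_pairs(words, fixed=False)  # each pair gets 1 + 2 + 3 = 6
new_counts = count_pairs(words, fixed=True)   # each pair gets 1 + 1 + 1 = 3
print(old_counts)
print(new_counts)
```

Under these assumptions, a word seen n times contributes n(n+1)/2 to each of its pairs with the old line and n with the new one, so the old counts skew merge selection toward pairs from frequent words.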
