This repository was archived by the owner on Jul 7, 2023. It is now read-only.

Commit 2951d57

Matt Jones committed
Fix SubwordTextEncoder binary search for small vocab sizes
1 parent 2f03f0c commit 2951d57

1 file changed (+1, -1)

tensor2tensor/data_generators/text_encoder.py

Lines changed: 1 addition & 1 deletion
@@ -285,7 +285,7 @@ def build_to_target_size(cls,
     subtokenizer.build_from_token_counts(token_counts, store_filename,
                                          present_count, num_iterations)

-    if min_val == max_val or subtokenizer.vocab_size == target_size:
+    if min_val >= max_val or subtokenizer.vocab_size == target_size:
       return subtokenizer
     elif subtokenizer.vocab_size > target_size:
       other_subtokenizer = cls.build_to_target_size(
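
Why `>=` matters here can be illustrated with a minimal sketch. It is not the code from `text_encoder.py`: `search`, `size_for`, `strict_equality`, and the depth guard are hypothetical names introduced for illustration. The sketch assumes the recursion narrows the range to `present_count + 1` or `present_count - 1`, which this hunk does not show. Under that assumption the bounds can cross (`min_val > max_val`) without ever being equal, so an `==` check never fires.

```python
def search(target, size_for, lo, hi, strict_equality=False,
           depth=0, max_depth=50):
    """Recursive binary search for the threshold whose size is closest to target.

    size_for(threshold) stands in for building a subtokenizer and reading
    its vocab_size (hypothetical helper, not part of tensor2tensor).
    """
    if depth > max_depth:
        # Guard so the non-terminating variant fails fast in this sketch.
        raise RecursionError("bounds crossed without terminating")

    mid = (lo + hi) // 2
    size = size_for(mid)

    # strict_equality=True mimics the old `min_val == max_val` check;
    # the default mimics the patched `min_val >= max_val` check.
    done = (lo == hi) if strict_equality else (lo >= hi)
    if done or size == target:
        return mid

    if size > target:
        other = search(target, size_for, mid + 1, hi,
                       strict_equality, depth + 1, max_depth)
    else:
        other = search(target, size_for, lo, mid - 1,
                       strict_equality, depth + 1, max_depth)

    # Keep whichever candidate lands closer to the target size.
    if abs(size_for(other) - target) < abs(size - target):
        return other
    return mid


def tiny_vocab_size(threshold):
    """A vocabulary that stays small no matter which threshold is tried."""
    return 10


# Patched check: the search stops once the bounds meet or cross.
best = search(100, tiny_vocab_size, 1, 1000)

# Old check: the bounds go from (1, 2) to (1, 0) and never become equal.
try:
    search(100, tiny_vocab_size, 1, 1000, strict_equality=True)
except RecursionError as err:
    print("old check:", err)
```

In this sketch the target is never reachable, so the range keeps shrinking from above until it becomes `(1, 0)`. The patched condition treats that crossed range as exhausted, while the equality test keeps recursing on it.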
