Open
Description
mypy reports several typing issues in pythainlp/benchmarks/word_tokenization.py.
Expected results
- All functions have explicit type hinting information
- No type incompatibility errors
Current results
For example, in the two lines below, ref_sample is seen as str and so should not have a shape attribute.
pythainlp/pythainlp/benchmarks/word_tokenization.py
Lines 164 to 165 in 9a1274b
c_pos_pred = c_pos_pred[c_pos_pred < ref_sample.shape[0]]
c_neg_pred = c_neg_pred[c_neg_pred < ref_sample.shape[0]]
But judging from the _binary_representation function, it may actually be a NumPy ndarray.
However, the type hints and docstring of _binary_representation say str:
pythainlp/pythainlp/benchmarks/word_tokenization.py
Lines 208 to 221 in 9a1274b
def _binary_representation(txt: str, verbose: bool = False):
    """
    Transform text into {0, 1} sequence.
    where (1) indicates that the corresponding character is the beginning of
    a word. For example, ผม|ไม่|ชอบ|กิน|ผัก -> 10100...
    :param str txt: input text that we want to transform
    :param bool verbose: for debugging purposes
    :return: {0, 1} sequence
    :rtype: str
    """
    chars = np.array(list(txt))
So the annotations, the docstring, and the actual usage are inconsistent, and this needs to be fixed.
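To make the mismatch concrete, here is a minimal sketch of what a consistently annotated helper could look like. It is not the actual PyThaiNLP implementation: the name binary_representation_sketch, the sep parameter, and the NDArray[np.int_] return annotation are all assumptions made for illustration, and the body is a simplified stand-in based only on the docstring's description (1 marks the first character of a word, 0 any other character).

```python
from typing import List

import numpy as np
import numpy.typing as npt


def binary_representation_sketch(txt: str, sep: str = "|") -> npt.NDArray[np.int_]:
    """Hypothetical stand-in: map separator-delimited text to a {0, 1} array.

    1 marks the first character of each word, 0 marks every other character.
    The separator itself is not part of the output.
    """
    flags: List[int] = []
    at_word_start = True
    for ch in txt:
        if ch == sep:
            # The next non-separator character begins a new word.
            at_word_start = True
            continue
        flags.append(1 if at_word_start else 0)
        at_word_start = False
    return np.array(flags, dtype=np.int_)


# Because the return type is declared as an ndarray, a type checker
# has no reason to complain about the .shape access below.
ref_sample = binary_representation_sketch("ab|cde|f")
c_pos_pred = np.array([0, 2, 7])
c_pos_pred = c_pos_pred[c_pos_pred < ref_sample.shape[0]]
```

With a return annotation like this, the value that reaches the lines quoted above is declared as an array rather than str, so the docstring's :rtype: would need to be updated to match.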
Steps to reproduce
Use MyPy to check the code
PyThaiNLP version
5
Python version
any
Operating system and version
any