Incompatible types in benchmarks.word_tokenization #1030

### Description

MyPy reports several type errors in [pythainlp/benchmarks/word_tokenization.py](https://github.com/PyThaiNLP/pythainlp/blob/dev/pythainlp/benchmarks/word_tokenization.py).

### Expected results

- All functions have explicit type hints
- No incompatible-type errors

### Current results

For example, `ref_sample` in these two lines is seen as `str`, which has no `shape` attribute:
https://github.com/PyThaiNLP/pythainlp/blob/9a1274bf692b8f2aa87e8e21ac646e7c625a998b/pythainlp/benchmarks/word_tokenization.py#L164-L165

But judging from the `_binary_representation` function, it may actually be an ND array.

However, the type hints and docstring of `_binary_representation` say `str`:

https://github.com/PyThaiNLP/pythainlp/blob/9a1274bf692b8f2aa87e8e21ac646e7c625a998b/pythainlp/benchmarks/word_tokenization.py#L208-L221

So there is some confusion here that needs to be fixed.
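To make the mismatch concrete, here is a minimal sketch of what a consistent annotation could look like. It is not the PyThaiNLP implementation: `binary_representation` and `SEPARATOR` are hypothetical names, the `|` separator is assumed from the docstring example, and the body is a simplified stand-in that only follows the behaviour the docstring describes.

```python
import numpy as np

SEPARATOR = "|"  # assumption: word separator, as in the docstring example


def binary_representation(txt: str) -> np.ndarray:
    """Return a {0, 1} array; 1 marks the first character of each word.

    Hypothetical stand-in, written only to show a return annotation
    that matches an array-valued result.
    """
    flags = []
    start_of_word = True
    for ch in txt:
        if ch == SEPARATOR:
            # Separators are dropped; the next character starts a word.
            start_of_word = True
            continue
        flags.append(1 if start_of_word else 0)
        start_of_word = False
    return np.array(flags, dtype=int)


ref_sample = binary_representation("ab|cde|f")
# The result is an array, so accessing .shape is valid for a type checker.
n_chars = ref_sample.shape[0]
```

With the return type declared as an array, callers that read `ref_sample.shape[0]` would agree with the annotation, and the docstring's `:rtype:` could be updated to match.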

### Steps to reproduce

Use MyPy to check the code
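For a smaller reproduction outside the library, the following sketch shows the same pattern in isolation. `count_in_range` is a hypothetical function, not part of PyThaiNLP. The parameter is annotated as `str` but used as an array, so MyPy is expected to report that `str` has no `shape` attribute, even though the code runs fine when an array is passed.

```python
import numpy as np


def count_in_range(positions: np.ndarray, ref_sample: str) -> int:
    # ref_sample is annotated as str, but the body treats it as an array.
    # A static checker is expected to flag the .shape access below.
    return int((positions < ref_sample.shape[0]).sum())


# At runtime this works, because an array is actually passed in.
result = count_in_range(np.array([0, 2, 5, 9]), np.zeros(6))
```

Saving this to a file and running MyPy on it should surface the same kind of error as the two lines linked under "Current results".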

### PyThaiNLP version

5

### Python version

any

### Operating system and version

any

### More info

_No response_

### Possible solution

_No response_

### Files

_No response_

Referenced snippet (`_binary_representation`, lines 208-221 of the file linked above):

	def _binary_representation(txt: str, verbose: bool = False):
	    """
	    Transform text into {0, 1} sequence.

	    where (1) indicates that the corresponding character is the beginning of
	    a word. For example, ผม|ไม่|ชอบ|กิน|ผัก -> 10100...

	    :param str txt: input text that we want to transform
	    :param bool verbose: for debugging purposes

	    :return: {0, 1} sequence
	    :rtype: str
	    """
	    chars = np.array(list(txt))

Referenced snippet (lines 164-165 of the file linked above):

	c_pos_pred = c_pos_pred[c_pos_pred < ref_sample.shape[0]]
	c_neg_pred = c_neg_pred[c_neg_pred < ref_sample.shape[0]]
