You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
feat(tokenization): add encode_message to tokenize messages one by one (#39507)
* feat(tokenization): add encode_message to tokenize messages one by one
* Fix the `encode_message` method, remove the `add_generation_prompt` parameter and add the corresponding error handling. Update the document to reflect this change and verify the error handling in the test.
* Optimize the `encode_message` method, improve the processing logic of the empty dialogue history, and ensure that the chat template can be applied correctly when the dialogue history is empty. Update the document to reflect these changes.
* The `_encode_message` method is deleted, the message coding logic is simplified, and the functional integrity of the `encode_message` method is ensured. Update the document to reflect these changes.
* Docs fix
* Revert changes in docstring of pad()
* Revert changes in docstring
* Update src/transformers/tokenization_utils_base.py
Co-authored-by: Arthur <[email protected]>
* Repair the call of the `encode_message` method, update it to `encode_message_with_chat_template` to support the chat template, and adjust the relevant test cases to reflect this change.
* Optimize the call format of the `apply_chat_template` method, and merge multi-line calls into a single line to improve code readability.
---------
Co-authored-by: pco111 <[email protected]>
Co-authored-by: Arthur <[email protected]>
0 commit comments