
vasqu commented Jul 22, 2025

Continuation of #39228 for the Ernie 4.5 VL (vision-language) models.

Current inference script for testing:

```python
from transformers import (
    AutoTokenizer,
    Ernie4_5_VLForConditionalGeneration,
    Ernie4_5_VLImageProcessor,
    Ernie4_5_VLProcessor,
    Ernie4_5_VLVideoProcessor,
)


# conversions happened locally based on the conversion script
model_path = "/raid/anton/code/forks/transformers/src/transformers/models/ernie4_5_vl/AntonV/ErnieVL"

model = Ernie4_5_VLForConditionalGeneration.from_pretrained(
    model_path,
    device_map="auto",
    dtype="auto",
)

conversation = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Only use English during your responses and describe the following image."},
            {"type": "image", "url": "https://paddlenlp.bj.bcebos.com/datasets/paddlemix/demo_images/example1.jpg"},
        ],
    },
]

# the processor is constructed on the fly for now; TODO: move this into the conversion script
tokenizer = AutoTokenizer.from_pretrained(model_path)
processor = Ernie4_5_VLProcessor(
    image_processor=Ernie4_5_VLImageProcessor(),
    tokenizer=tokenizer,
    video_processor=Ernie4_5_VLVideoProcessor(),
    chat_template=tokenizer.chat_template,
)
inputs = processor.apply_chat_template(
    conversation,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

# greedy decoding, capped at 64 new tokens
generated_ids = model.generate(
    **inputs,
    max_new_tokens=64,
    do_sample=False,
)
# generate returns prompt + continuation; keep only the newly generated tokens
output_text = processor.decode(generated_ids[0][len(inputs["input_ids"][0]):])
print(output_text)
```
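For decoder-only generation in `transformers`, `model.generate` returns the prompt tokens followed by the newly generated ones, which is why the script slices off `len(inputs["input_ids"][0])` tokens before decoding. A minimal sketch of that step in plain Python, using a hypothetical helper `strip_prompt` and made-up token IDs rather than the real Ernie 4.5 VL vocabulary:

```python
def strip_prompt(generated_row, prompt_len):
    """Return only the tokens produced after the prompt (hypothetical helper).

    Assumes `generated_row` is the prompt followed by the new tokens,
    which is how decoder-only `generate` outputs are laid out.
    """
    if prompt_len > len(generated_row):
        raise ValueError("prompt is longer than the generated sequence")
    return generated_row[prompt_len:]


# Made-up token IDs for illustration only.
prompt_ids = [101, 7, 42, 9]
generated = prompt_ids + [300, 301, 302]

new_tokens = strip_prompt(generated, len(prompt_ids))
print(new_tokens)  # [300, 301, 302]
```

With a batch of left-padded prompts, every row shares the same padded input length, so the same slice can be applied row by row.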

Output (cut off by the `max_new_tokens=64` limit, hence the unfinished last sentence):
The image features a person sitting on a hilltop, gazing out at a vast mountain range. The person is wrapped in a colorful, striped blanket, and their head is covered with a red headscarf. The foreground includes vibrant pink flowers, adding a pop of color to the scene. The background show
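A reply that ends mid-word like this one is consistent with generation stopping at the token budget rather than at an end-of-sequence token. A small sketch for telling the two cases apart, assuming a hypothetical helper `stop_reason` that receives the new token IDs (prompt already stripped) and the EOS ID(s); the IDs below are invented:

```python
def stop_reason(new_tokens, eos_token_id, max_new_tokens):
    """Guess why one unbatched generation stopped (hypothetical helper).

    `new_tokens` must already exclude the prompt. `eos_token_id` may be a
    single int or a list of ints.
    """
    eos_ids = {eos_token_id} if isinstance(eos_token_id, int) else set(eos_token_id)
    if new_tokens and new_tokens[-1] in eos_ids:
        return "eos"
    if len(new_tokens) >= max_new_tokens:
        return "max_new_tokens"
    return "other"  # e.g. a custom stopping criterion


EOS = 2  # invented ID; in practice read it from the tokenizer or generation config
print(stop_reason([11, 12, EOS], EOS, max_new_tokens=64))  # eos
print(stop_reason(list(range(100, 164)), EOS, max_new_tokens=64))  # max_new_tokens
```

The EOS check comes first so that a sequence which happens to end with EOS exactly at the budget is still reported as a natural stop.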


[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, ernie4_5_moe, ernie4_5_vl
