Conversation

Contributor
@altxtech commented Sep 11, 2025

What this does

Adds video file support to RubyLLM.
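A minimal sketch of the new attachment API: videos are passed via `with: { video: ... }`, mirroring the spec snippet reviewed below. The model name, fixture path, and `StubChat` class here are illustrative stand-ins for a real `RubyLLM.chat` call, which needs a configured Gemini API key; the provider call is stubbed so the snippet runs offline.

```ruby
require 'ostruct'

# Stand-in for a RubyLLM chat; a real one would be created with something
# like RubyLLM.chat(model: 'gemini-2.5-flash').
class StubChat
  def ask(prompt, with: {})
    # A real chat would upload with[:video] to the provider here.
    OpenStruct.new(content: "stubbed description of #{with[:video]}")
  end
end

chat = StubChat.new
response = chat.ask('What do you see in this video?',
                    with: { video: 'spec/fixtures/video.mp4' })
puts response.content
```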

Supersedes #260, originally authored by @arnodirlam. Thank you for the groundwork.

Maintainers: happy to close this if you prefer waiting for the original PR.

What changed vs #260

  • Rebased on current main
  • Resolved conflicts in README/docs and Gemini capabilities
  • Addressed PR comment reviews
  • Included gemini-2.5-flash as a video model for tests

Type of change

  • Bug fix
  • New feature
  • Breaking change
  • Documentation
  • Performance improvement

Scope check

  • I read the Contributing Guide
  • This aligns with RubyLLM's focus on LLM communication
  • This isn't application-specific logic that belongs in user code
  • This benefits most users, not just my specific use case

Quality check

  • I ran overcommit --install and all hooks pass
  • I tested my changes thoroughly
    • For provider changes: Re-recorded VCR cassettes with bundle exec rake vcr:record[provider_name]
    • All tests pass: bundle exec rspec
  • I updated documentation if needed
  • I didn't modify auto-generated files manually (models.json, aliases.json)

API changes

  • Breaking change
  • New public methods/classes
  • Changed method signatures
  • No API changes

Related issues

Closes #259

@arnodirlam arnodirlam mentioned this pull request Sep 12, 2025
@altxtech altxtech marked this pull request as ready for review September 12, 2025 11:08

Review thread on the new video spec:

response = chat.ask('What do you see in this video?', with: { video: video_path })

expect(response.content).to be_present
expect(response.content).not_to include('RubyLLM::Content')
Contributor
Can we add an expectation that it recognizes the actual content of the video, like at least includes the words "woman" and "beach"?

Contributor Author
@altxtech Sep 12, 2025

I don't think so, for two reasons:

  1. It's a bit out of scope. No other test in this spec does this; if we were to make that change, it would be better to do it for all tests for consistency, probably in a separate PR.

  2. My understanding is that these are boundary interface tests. In other words, we are testing whether the lib interacts correctly with the providers (sends valid requests) and obtains responses; we are not testing the capabilities of the models themselves.

But I'll wait for more comments. If more people think it makes sense, I can add the assertions.

Contributor

You're right about it not existing in this spec file. I'm a bit surprised, though, as it does exist in the spec for the text models.

expect(response.content).to include('4')

expect(first.content).to include('Matz')

expect(followup.content).to include('199')

I do like how this makes the specs ensure that the models being used actually accomplish the user's intended purpose. But not a showstopper for me.

Contributor Author

You have a good point, but there is also the common wisdom that a project does not need to test the functionality of its external dependencies.

This depends a little on the testing philosophy of the project, so I'd prefer to wait for maintainer feedback before making this change.

Owner

This is a good question to ponder. There's value in adding the checks for content, as they test whether the LLM actually received the file. The problem is that they also test understanding, which we don't want to test. Since I don't think we can separate the two, I'm leaning towards checking the content too. We should probably add similar content checks to the rest of the spec, but perhaps in another PR.
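The stricter assertion under discussion could look like the sketch below. The response text is stubbed here so the snippet runs standalone (a real spec would use the VCR-recorded chat response), and the "woman"/"beach" keywords come from the reviewer's suggestion above.

```ruby
# Stand-in for the provider's reply; a real spec would use the
# VCR-recorded response from chat.ask.
response_content = 'The video shows a woman walking along a beach at sunset.'

# The content check discussed above: assert the model actually described
# the video, not just that it returned something.
%w[woman beach].each do |keyword|
  unless response_content.downcase.include?(keyword)
    raise "expected response to mention #{keyword.inspect}"
  end
end

puts 'content assertions passed'
```

In RSpec terms this would be `expect(response.content.downcase).to include('woman', 'beach')`, alongside the existing presence check.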

Owner
@crmne commented Sep 13, 2025

What a great PR! Clean, focused, following the spirit of the project.

I would have loved support in other providers as well, if you have the API keys.

codecov bot commented Sep 13, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 84.37%. Comparing base (078ef25) to head (56ece26).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #405      +/-   ##
==========================================
+ Coverage   84.29%   84.37%   +0.08%     
==========================================
  Files          36       36              
  Lines        1897     1907      +10     
  Branches      493      495       +2     
==========================================
+ Hits         1599     1609      +10     
  Misses        298      298              


Contributor Author
@altxtech

I'm glad you enjoyed it! I love when my PRs are well received.

I don't have other API keys, unfortunately. I think my contributions will be limited to Google/Gemini, at least for now.

@gobijan commented Sep 14, 2025

> I'm glad you enjoyed it! I love when my PRs are well received.
>
> I don't have other API keys, unfortunately. I think my contributions will be limited to Google/Gemini, at least for now.

Thx @arnodirlam for doing the hard work for video support. Thx @altxtech for polishing the last remaining bits.
Great to get this in now :)

Owner
@crmne commented Sep 14, 2025

I added VCRs for VertexAI and couldn't find any other major provider supporting video input directly, so let's ship this!

@crmne crmne merged commit 4ff2231 into crmne:main Sep 14, 2025
Contributor
@arnodirlam

> I added VCRs for VertexAI and couldn't find any other major provider supporting video input directly, so let's ship this!

Nice! 🎉 Yeah, I also tried with all the providers we have, and Gemini was the only one supporting video input.

Linked issue: [FEATURE] video file support