Add RubyLLM.transcribe method. #97
base: main
Changes from all commits: 736d337, b0bf924, 7598745, 096ffd3
@@ -0,0 +1,74 @@
---
layout: default
title: Audio Transcription
parent: Guides
nav_order: 8
permalink: /guides/audio-transcription
---

# Audio Transcription

RubyLLM makes it easy to transcribe audio content using AI models. This guide covers how to convert speech to text with the `transcribe` method.

## Basic Transcription

The simplest way to convert speech to text is to pass a local audio file path to the global `transcribe` method:

```ruby
# Transcribe an audio file
text = RubyLLM.transcribe("meeting.wav")

# Print the transcribed text
puts text
```

This method automatically uses the default transcription model (`whisper-1`) to convert the audio file to text.

## Specifying a Language

If you know the language spoken in the audio, you can provide a hint to improve transcription accuracy:

```ruby
# Transcribe Spanish audio
spanish_text = RubyLLM.transcribe("entrevista.mp3", language: "Spanish")
```

## Choosing Models

You can specify which model to use for transcription:

```ruby
# Use a specific model
text = RubyLLM.transcribe(
  "interview.mp3",
  model: "whisper-1"
)
```

You can also configure the default transcription model globally:

```ruby
RubyLLM.configure do |config|
  config.default_transcription_model = "whisper-1"
end
```

## Working with Large Files

For longer audio files, be aware of potential timeout issues. You can set a global timeout in your application configuration:

```ruby
RubyLLM.configure do |config|
  # Set a longer timeout for large files (in seconds)
  config.request_timeout = 300
end
```

Currently, RubyLLM doesn't support per-request timeout configuration. For handling very large files, you may need to increase the global timeout or break the audio into smaller segments, as sketched below.
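For illustration only, here is one way you might split a long recording before transcribing it. This sketch assumes `ffmpeg` is installed and uses made-up file names and an arbitrary 10-minute chunk length; the splitting step is not part of RubyLLM itself.

```ruby
# Illustrative sketch (not part of RubyLLM): split a long recording into
# 10-minute chunks with ffmpeg, then transcribe each chunk in order.
require "ruby_llm"

segment_dir = "segments"
Dir.mkdir(segment_dir) unless Dir.exist?(segment_dir)

# -f segment splits the input into fixed-length files without re-encoding
system(
  "ffmpeg", "-i", "long_meeting.wav",
  "-f", "segment", "-segment_time", "600", "-c", "copy",
  File.join(segment_dir, "part_%03d.wav")
)

transcript = Dir[File.join(segment_dir, "part_*.wav")].sort.map do |segment|
  RubyLLM.transcribe(segment)
end.join("\n")

puts transcript
```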

## Next Steps

Now that you understand audio transcription, you might want to explore:

- [Error Handling]({% link guides/error-handling.md %}) for robust applications
- [Tools]({% link guides/tools.md %}) to extend AI capabilities
@@ -34,6 +34,35 @@ def format
      to_a
    end

    # Determine the MIME type based on file extension
    def self.mime_type_for(path) # rubocop:disable Metrics/CyclomaticComplexity, Metrics/MethodLength
      ext = File.extname(path).delete('.').downcase

      case ext
      when 'jpeg', 'jpg'
        'image/jpeg'
      when 'png'
        'image/png'
      when 'gif'
        'image/gif'
      when 'webp'
        'image/webp'
      when 'mpga', 'mp3', 'mpeg'
        'audio/mpeg'
      when 'm4a', 'mp4'
        'audio/mp4'
      when 'wav'
        'audio/wav'
      when 'ogg'
        'audio/ogg'
      when 'webm'
        'audio/webm'
      else
        # Default to the extension as the subtype
        "application/#{ext}"
      end
    end
Comment on lines +37 to +64:
We now have a
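For context, here is how the new class-level helper behaves for a few extensions, based on the case statement above. These calls are illustrative only and assume the helper is reachable as `RubyLLM::Content.mime_type_for`:

```ruby
# Illustrative only — expected results given the case statement above
RubyLLM::Content.mime_type_for('meeting.wav') # => 'audio/wav'
RubyLLM::Content.mime_type_for('photo.JPG')   # => 'image/jpeg' (extension is downcased)
RubyLLM::Content.mime_type_for('notes.txt')   # => 'application/txt' (fallback branch)
```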

    private

    def attach_image(source) # rubocop:disable Metrics/MethodLength

@@ -97,8 +126,7 @@ def encode_file(source)
    end

    def mime_type_for(path)
-     ext = File.extname(path).delete('.')
-     "image/#{ext}"
+     self.class.mime_type_for(path)
    end
  end
end
@@ -15,7 +15,8 @@ module RubyLLM
  class ModelInfo
    attr_reader :id, :created_at, :display_name, :provider, :metadata,
                :context_window, :max_tokens, :supports_vision, :supports_functions,
-               :supports_json_mode, :input_price_per_million, :output_price_per_million, :type, :family
+               :supports_json_mode, :input_price_per_million, :output_price_per_million,
+               :type, :family
Comment on lines -18 to +19:
Why did this change?

    def initialize(data) # rubocop:disable Metrics/AbcSize,Metrics/MethodLength
      @id = data[:id]
@@ -0,0 +1,74 @@
# frozen_string_literal: true

module RubyLLM
  module Providers
    module OpenAI
      # Handles audio transcription functionality for the OpenAI API
      module Transcription
        # Helper methods as module_function
Review comment: Remove this comment.

        module_function

        def self.extended(base)
          # module_function causes the 'transcribe' method to be private, but we need it to be public
          base.public_class_method :transcribe
        end
Comment on lines +12 to +15:
no need for this. simply move your transcribe above
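To make the suggestion concrete, here is a minimal, self-contained sketch of the visibility behavior the reviewer appears to be pointing at. The module and method names below are hypothetical, not from this PR: a method defined before `module_function` stays public on the extending module, so the `extended` hook and `public_class_method` call become unnecessary.

```ruby
# Hypothetical illustration of the suggested ordering (not code from this PR).
module Transcribing
  # Defined BEFORE module_function, so it remains a public method
  # on any module that extends Transcribing.
  def transcribe(path)
    "transcribed #{File.basename(path)} via #{transcription_url}"
  end

  module_function

  # Defined AFTER module_function: becomes a private helper on the
  # extending module, still callable from transcribe.
  def transcription_url
    'https://api.openai.com/v1/audio/transcriptions'
  end
end

module Provider
  extend Transcribing
end

puts Provider.transcribe('meeting.wav') # public, works
# Provider.transcription_url            # NoMethodError: private method
```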

        def self.transcribe(audio_file, model: nil, language: nil)
          model ||= RubyLLM.config.default_transcription_model
          payload = render_transcription_payload(audio_file, model: model, language: language)

          response = post_multipart(transcription_url, payload)
          parse_transcription_response(response)
        end

        def transcription_url
          "#{api_base}/audio/transcriptions"
        end
Comment on lines +25 to +27:
Please remove

        def api_base
          'https://api.openai.com/v1'
        end
Comment on lines +29 to +31:
Please remove this. It's at the top level of the provider module.

        def headers
          {
            'Authorization' => "Bearer #{RubyLLM.config.openai_api_key}"
          }
        end
Comment on lines +33 to +37:
Why is this here?

        def post_multipart(url, payload)
          connection = Faraday.new(url: api_base) do |f|
            f.request :multipart
            f.request :url_encoded
            f.adapter Faraday.default_adapter
          end

          response = connection.post(url) do |req|
            req.headers.merge!(headers)
            req.body = payload
          end

          JSON.parse(response.body)
        end
Comment on lines +39 to +52:
This really should not be in the Transcription module of the OpenAI provider. This is a generic method that should go in

        def render_transcription_payload(audio_file, model:, language: nil)
          file_part = Faraday::Multipart::FilePart.new(audio_file, Content.mime_type_for(audio_file))

          payload = {
            model: model,
            file: file_part
          }

          # Add language if provided
          payload[:language] = language if language

          payload
        end

        def parse_transcription_response(response)
          response['text']
        end
      end
    end
  end
end
@@ -0,0 +1,53 @@
# frozen_string_literal: true

require 'spec_helper'

RSpec.describe RubyLLM do
  include_context 'with configured RubyLLM'

  let(:audio_path) { File.expand_path('../fixtures/ruby.wav', __dir__) }
  let(:default_model) { 'whisper-1' }

  before do
    allow(described_class.config).to receive(:default_transcription_model).and_return(default_model)
    allow(described_class::Models).to receive(:find).with(default_model)
    allow(described_class::Provider).to receive(:for).with(default_model).and_return(described_class::Providers::OpenAI)
    allow(described_class::Providers::OpenAI).to receive(:transcribe)
  end

  describe '.transcribe' do
    it 'uses the default model from config when no model is specified' do # rubocop:disable RSpec/MultipleExpectations
      described_class.transcribe(audio_path)

      expect(described_class::Provider).to have_received(:for).with(default_model)
      expect(described_class::Providers::OpenAI).to have_received(:transcribe).with(
        audio_path, model: default_model, language: nil
      )
    end

    it 'validates and uses a custom model when specified' do # rubocop:disable RSpec/MultipleExpectations, RSpec/ExampleLength
      custom_model = 'whisper-large'
      allow(described_class::Models).to receive(:find).with(custom_model)
      allow(described_class::Provider).to receive(:for).with(custom_model)
        .and_return(described_class::Providers::OpenAI)

      described_class.transcribe(audio_path, model: custom_model)

      expect(described_class::Models).to have_received(:find).with(custom_model)
      expect(described_class::Provider).to have_received(:for).with(custom_model)
      expect(described_class::Providers::OpenAI).to have_received(:transcribe).with(
        audio_path, model: custom_model, language: nil
      )
    end

    it 'passes language parameter to the provider' do
      language = 'en'

      described_class.transcribe(audio_path, language: language)

      expect(described_class::Providers::OpenAI).to have_received(:transcribe).with(
        audio_path, model: default_model, language: language
      )
    end
  end
end
Review comment: This is in a very awkward spot.