From ab3850a4dc0dfac113e9aac51b22c0bcf72f555e Mon Sep 17 00:00:00 2001
From: Kieran Klaassen
Date: Wed, 18 Feb 2026 21:16:01 -0800
Subject: [PATCH 1/4] feat: Add with_fallback for model-level failover

When a model is overloaded or unavailable after retries are exhausted,
automatically switch to a fallback model. Triggers on transient errors
only (429, 500, 502-503, 529). Restores original model if fallback also
fails. Logs when fallback activates.

Closes #621

Co-Authored-By: Claude Opus 4.6
---
 README.md                                  |   8 +
 docs/_advanced/error-handling.md           |  31 +++-
 docs/_core_features/chat.md                |  10 +
 lib/ruby_llm/active_record/chat_methods.rb |   5 +
 lib/ruby_llm/chat.rb                       |  72 ++++++--
 spec/ruby_llm/chat_fallback_spec.rb        | 202 +++++++++++++++++++++
 6 files changed, 310 insertions(+), 18 deletions(-)
 create mode 100644 spec/ruby_llm/chat_fallback_spec.rb

diff --git a/README.md b/README.md
index 52aeab436..ebf96d41e 100644
--- a/README.md
+++ b/README.md
@@ -59,6 +59,13 @@ chat.ask "Tell me a story about Ruby" do |chunk|
 end
 ```
 
+```ruby
+# Automatic failover when a model is overloaded
+chat = RubyLLM.chat(model: "gemini-2.5-flash-lite")
+  .with_fallback("gemini-2.5-flash")
+chat.ask("Classify this email")
+```
+
 ```ruby
 # Generate images
 RubyLLM.paint "a sunset over mountains in watercolor style"
@@ -131,6 +138,7 @@ response = chat.with_schema(ProductSchema).ask "Analyze this product", with: "pr
 * **Tools:** Let AI call your Ruby methods
 * **Agents:** Reusable assistants with `RubyLLM::Agent`
 * **Structured output:** JSON schemas that just work
+* **Fallback:** Automatic model failover with `with_fallback`
 * **Streaming:** Real-time responses with blocks
 * **Rails:** ActiveRecord integration with `acts_as_chat`
 * **Async:** Fiber-based concurrency
diff --git a/docs/_advanced/error-handling.md b/docs/_advanced/error-handling.md
index a1dd7962d..3aa19bc82 100644
--- a/docs/_advanced/error-handling.md
+++ b/docs/_advanced/error-handling.md
@@ -226,9 +226,38 @@ This will cause RubyLLM to log detailed information about API requests and respo
 * **Be Specific:** Rescue specific error classes whenever possible for tailored recovery logic.
 * **Log Errors:** Always log errors, including relevant context (model used, input data if safe) for debugging. Consider using the `response` attribute on `RubyLLM::Error` for more details.
 * **User Feedback:** Provide clear, user-friendly feedback when an AI operation fails. Avoid exposing raw API error messages directly.
-* **Fallbacks:** Consider fallback mechanisms (e.g., trying a different model, using cached data, providing a default response) if the AI service is critical to your application's function.
+* **Fallbacks:** Use `with_fallback` to automatically try an alternative model when the primary is unavailable (see below).
 * **Monitor:** Track the frequency of different error types in production to identify recurring issues with providers or your implementation.
 
+## Model Fallback
+
+When a model is overloaded or unavailable, `with_fallback` automatically switches to an alternative model after retries are exhausted.
+
+```ruby
+chat = RubyLLM.chat(model: "gemini-2.5-flash-lite")
+  .with_fallback("gemini-2.5-flash")
+chat.ask("Classify this email")
+```
+
+Fallback triggers on transient errors only: `RateLimitError` (429), `ServerError` (500), `ServiceUnavailableError` (502-503), and `OverloadedError` (529). Auth and input errors like `BadRequestError` or `UnauthorizedError` are raised immediately.
+
+```ruby
+# Cross-provider fallback
+chat = RubyLLM.chat(model: "gemini-2.5-flash-lite")
+  .with_fallback("claude-haiku-4-5-20251001")
+
+# Works with streaming
+chat.ask("Summarize this") { |chunk| print chunk.content }
+```
+
+If the fallback model also fails, the original error is re-raised and the chat is restored to its original model. Message history is preserved across fallback attempts.
+
+When fallback triggers, RubyLLM logs a warning:
+
+```
+RubyLLM: RubyLLM::ServiceUnavailableError on gemini-2.5-flash-lite, falling back to gemini-2.5-flash
+```
+
 ## Next Steps
 
 * [Using Tools]({% link _core_features/tools.md %})
diff --git a/docs/_core_features/chat.md b/docs/_core_features/chat.md
index fd2fa1412..856ac9e4c 100644
--- a/docs/_core_features/chat.md
+++ b/docs/_core_features/chat.md
@@ -126,6 +126,16 @@ chat.with_model('{{ site.models.anthropic_latest }}')
 response2 = chat.ask "Follow-up question..."
 ```
 
+You can also set a fallback model that kicks in automatically when the primary model is unavailable:
+
+```ruby
+chat = RubyLLM.chat(model: "gemini-2.5-flash-lite")
+  .with_fallback("gemini-2.5-flash")
+chat.ask("Classify this email")
+```
+
+See [Error Handling]({% link _advanced/error-handling.md %}#model-fallback) for details on which errors trigger fallback.
+
 For detailed information about model selection, capabilities, aliases, and working with custom models, see the [Working with Models Guide]({% link _advanced/models.md %}).
 
 ## Multi-modal Conversations
diff --git a/lib/ruby_llm/active_record/chat_methods.rb b/lib/ruby_llm/active_record/chat_methods.rb
index dde2f2b92..99e9566f7 100644
--- a/lib/ruby_llm/active_record/chat_methods.rb
+++ b/lib/ruby_llm/active_record/chat_methods.rb
@@ -119,6 +119,11 @@ def with_model(model_name, provider: nil, assume_exists: false)
         self
       end
 
+      def with_fallback(...)
+        to_llm.with_fallback(...)
+        self
+      end
+
       def with_temperature(...)
         to_llm.with_temperature(...)
         self
diff --git a/lib/ruby_llm/chat.rb b/lib/ruby_llm/chat.rb
index 79eedd931..938c03425 100644
--- a/lib/ruby_llm/chat.rb
+++ b/lib/ruby_llm/chat.rb
@@ -62,6 +62,11 @@ def with_tools(*tools, replace: false)
       self
     end
 
+    def with_fallback(model_id, provider: nil)
+      @fallback = { model: model_id, provider: provider }
+      self
+    end
+
     def with_model(model_id, provider: nil, assume_exists: false)
       @model, @provider = Models.resolve(model_id, provider:, assume_exists:, config: @config)
       @connection = @provider.connection
@@ -134,7 +139,56 @@ def each(&)
       messages.each(&)
     end
 
-    def complete(&) # rubocop:disable Metrics/PerceivedComplexity
+    def complete(&)
+      complete_with_fallback(&)
+    end
+
+    def add_message(message_or_attributes)
+      message = message_or_attributes.is_a?(Message) ? message_or_attributes : Message.new(message_or_attributes)
+      messages << message
+      message
+    end
+
+    def reset_messages!
+      @messages.clear
+    end
+
+    def instance_variables
+      super - %i[@connection @config]
+    end
+
+    private
+
+    FALLBACK_ERRORS = [
+      RateLimitError,
+      ServerError,
+      ServiceUnavailableError,
+      OverloadedError
+    ].freeze
+
+    def complete_with_fallback(&)
+      call_provider(&)
+    rescue *FALLBACK_ERRORS => e
+      raise unless @fallback
+
+      original_model = @model
+      original_provider = @provider
+      original_connection = @connection
+
+      RubyLLM.logger.warn "RubyLLM: #{e.class} on #{original_model.id}, falling back to #{@fallback[:model]}"
+
+      begin
+        with_model(@fallback[:model], provider: @fallback[:provider])
+        call_provider(&)
+      rescue *FALLBACK_ERRORS
+        @model = original_model
+        @provider = original_provider
+        @connection = original_connection
+        raise e
+      end
+    end
+
+    def call_provider(&) # rubocop:disable Metrics/PerceivedComplexity
       response = @provider.complete(
         messages,
         tools: @tools,
@@ -167,22 +221,6 @@ def complete(&) # rubocop:disable Metrics/PerceivedComplexity
       end
     end
 
-    def add_message(message_or_attributes)
-      message = message_or_attributes.is_a?(Message) ? message_or_attributes : Message.new(message_or_attributes)
-      messages << message
-      message
-    end
-
-    def reset_messages!
-      @messages.clear
-    end
-
-    def instance_variables
-      super - %i[@connection @config]
-    end
-
-    private
-
     def wrap_streaming_block(&block)
       return nil unless block_given?
diff --git a/spec/ruby_llm/chat_fallback_spec.rb b/spec/ruby_llm/chat_fallback_spec.rb
new file mode 100644
index 000000000..5cdb0a54c
--- /dev/null
+++ b/spec/ruby_llm/chat_fallback_spec.rb
@@ -0,0 +1,202 @@
+# frozen_string_literal: true
+
+require 'spec_helper'
+
+RSpec.describe RubyLLM::Chat do
+  include_context 'with configured RubyLLM'
+
+  describe '#with_fallback' do
+    let(:chat) { described_class.new(model: 'gpt-4.1-nano') }
+
+    it 'returns self for chaining' do
+      expect(chat.with_fallback('claude-haiku-4-5-20251001')).to eq(chat)
+    end
+
+    it 'tries fallback model on transient errors' do
+      chat.with_fallback('claude-haiku-4-5-20251001')
+
+      call_count = 0
+      allow(chat.instance_variable_get(:@provider)).to receive(:complete) do
+        call_count += 1
+        raise RubyLLM::ServiceUnavailableError.new(nil, 'model experiencing high demand')
+      end
+
+      fallback_provider = instance_double(RubyLLM::Provider)
+      fallback_response = RubyLLM::Message.new(role: :assistant, content: 'Hello from fallback!')
+
+      allow(RubyLLM::Models).to receive(:resolve).and_call_original
+      allow(RubyLLM::Models).to receive(:resolve)
+        .with('claude-haiku-4-5-20251001', provider: nil, assume_exists: false, config: anything)
+        .and_return([RubyLLM::Models.find('claude-haiku-4-5-20251001'), fallback_provider])
+
+      allow(fallback_provider).to receive(:connection).and_return(double)
+      allow(fallback_provider).to receive(:complete).and_return(fallback_response)
+
+      chat.add_message(role: :user, content: 'Hello')
+      response = chat.complete
+
+      expect(response.content).to eq('Hello from fallback!')
+      expect(call_count).to eq(1)
+    end
+
+    it 'raises original error when fallback also fails' do
+      chat.with_fallback('claude-haiku-4-5-20251001')
+
+      original_error = RubyLLM::ServiceUnavailableError.new(nil, 'primary down')
+      allow(chat.instance_variable_get(:@provider)).to receive(:complete)
+        .and_raise(original_error)
+
+      fallback_provider = instance_double(RubyLLM::Provider)
+      allow(RubyLLM::Models).to receive(:resolve).and_call_original
+      allow(RubyLLM::Models).to receive(:resolve)
+        .with('claude-haiku-4-5-20251001', provider: nil, assume_exists: false, config: anything)
+        .and_return([RubyLLM::Models.find('claude-haiku-4-5-20251001'), fallback_provider])
+      allow(fallback_provider).to receive(:connection).and_return(double)
+      allow(fallback_provider).to receive(:complete)
+        .and_raise(RubyLLM::ServerError.new(nil, 'fallback also down'))
+
+      chat.add_message(role: :user, content: 'Hello')
+
+      expect { chat.complete }.to raise_error(RubyLLM::ServiceUnavailableError, 'primary down')
+    end
+
+    it 'restores original model when fallback fails' do
+      chat.with_fallback('claude-haiku-4-5-20251001')
+
+      original_model = chat.model
+      original_provider = chat.instance_variable_get(:@provider)
+
+      allow(original_provider).to receive(:complete)
+        .and_raise(RubyLLM::OverloadedError.new(nil, 'overloaded'))
+
+      fallback_provider = instance_double(RubyLLM::Provider)
+      allow(RubyLLM::Models).to receive(:resolve).and_call_original
+      allow(RubyLLM::Models).to receive(:resolve)
+        .with('claude-haiku-4-5-20251001', provider: nil, assume_exists: false, config: anything)
+        .and_return([RubyLLM::Models.find('claude-haiku-4-5-20251001'), fallback_provider])
+      allow(fallback_provider).to receive(:connection).and_return(double)
+      allow(fallback_provider).to receive(:complete)
+        .and_raise(RubyLLM::ServerError.new(nil, 'fallback down'))
+
+      chat.add_message(role: :user, content: 'Hello')
+
+      expect { chat.complete }.to raise_error(RubyLLM::OverloadedError)
+      expect(chat.model).to eq(original_model)
+      expect(chat.instance_variable_get(:@provider)).to eq(original_provider)
+    end
+
+    it 'does not trigger fallback on non-transient errors' do
+      chat.with_fallback('claude-haiku-4-5-20251001')
+
+      allow(chat.instance_variable_get(:@provider)).to receive(:complete)
+        .and_raise(RubyLLM::BadRequestError.new(nil, 'invalid request'))
+
+      chat.add_message(role: :user, content: 'Hello')
+
+      expect { chat.complete }.to raise_error(RubyLLM::BadRequestError, 'invalid request')
+    end
+
+    it 'does not trigger fallback on auth errors' do
+      chat.with_fallback('claude-haiku-4-5-20251001')
+
+      allow(chat.instance_variable_get(:@provider)).to receive(:complete)
+        .and_raise(RubyLLM::UnauthorizedError.new(nil, 'bad key'))
+
+      chat.add_message(role: :user, content: 'Hello')
+
+      expect { chat.complete }.to raise_error(RubyLLM::UnauthorizedError)
+    end
+
+    it 'preserves message history across fallback' do
+      chat.with_fallback('claude-haiku-4-5-20251001')
+      chat.add_message(role: :user, content: 'First message')
+      chat.add_message(role: :assistant, content: 'First reply')
+      chat.add_message(role: :user, content: 'Second message')
+
+      allow(chat.instance_variable_get(:@provider)).to receive(:complete)
+        .and_raise(RubyLLM::RateLimitError.new(nil, 'rate limited'))
+
+      captured_messages = nil
+      fallback_provider = instance_double(RubyLLM::Provider)
+      allow(RubyLLM::Models).to receive(:resolve).and_call_original
+      allow(RubyLLM::Models).to receive(:resolve)
+        .with('claude-haiku-4-5-20251001', provider: nil, assume_exists: false, config: anything)
+        .and_return([RubyLLM::Models.find('claude-haiku-4-5-20251001'), fallback_provider])
+      allow(fallback_provider).to receive(:connection).and_return(double)
+      allow(fallback_provider).to receive(:complete) do |messages, **_kwargs|
+        captured_messages = messages.dup
+        RubyLLM::Message.new(role: :assistant, content: 'Fallback reply')
+      end
+
+      chat.complete
+
+      expect(captured_messages.length).to eq(3)
+      expect(captured_messages[0].content).to eq('First message')
+      expect(captured_messages[1].content).to eq('First reply')
+      expect(captured_messages[2].content).to eq('Second message')
+    end
+
+    it 'works with streaming' do
+      chat.with_fallback('claude-haiku-4-5-20251001')
+
+      allow(chat.instance_variable_get(:@provider)).to receive(:complete)
+        .and_raise(RubyLLM::ServiceUnavailableError.new(nil, 'unavailable'))
+
+      fallback_provider = instance_double(RubyLLM::Provider)
+      allow(RubyLLM::Models).to receive(:resolve).and_call_original
+      allow(RubyLLM::Models).to receive(:resolve)
+        .with('claude-haiku-4-5-20251001', provider: nil, assume_exists: false, config: anything)
+        .and_return([RubyLLM::Models.find('claude-haiku-4-5-20251001'), fallback_provider])
+      allow(fallback_provider).to receive(:connection).and_return(double)
+      allow(fallback_provider).to receive(:complete) do |_messages, **_kwargs, &block|
+        block&.call(RubyLLM::Chunk.new(role: :assistant, content: 'chunk'))
+        RubyLLM::Message.new(role: :assistant, content: 'streamed reply')
+      end
+
+      chat.add_message(role: :user, content: 'Hello')
+
+      chunks = []
+      response = chat.complete { |chunk| chunks << chunk }
+
+      expect(response.content).to eq('streamed reply')
+      expect(chunks).not_to be_empty
+    end
+
+    it 'does not fallback when no fallback is configured' do
+      allow(chat.instance_variable_get(:@provider)).to receive(:complete)
+        .and_raise(RubyLLM::ServiceUnavailableError.new(nil, 'unavailable'))
+
+      chat.add_message(role: :user, content: 'Hello')
+
+      expect { chat.complete }.to raise_error(RubyLLM::ServiceUnavailableError)
+    end
+
+    [
+      RubyLLM::RateLimitError,
+      RubyLLM::ServerError,
+      RubyLLM::ServiceUnavailableError,
+      RubyLLM::OverloadedError
+    ].each do |error_class|
+      it "triggers fallback on #{error_class.name.split('::').last}" do
+        chat.with_fallback('claude-haiku-4-5-20251001')
+
+        allow(chat.instance_variable_get(:@provider)).to receive(:complete)
+          .and_raise(error_class.new(nil, 'error'))
+
+        fallback_provider = instance_double(RubyLLM::Provider)
+        allow(RubyLLM::Models).to receive(:resolve).and_call_original
+        allow(RubyLLM::Models).to receive(:resolve)
+          .with('claude-haiku-4-5-20251001', provider: nil, assume_exists: false, config: anything)
+          .and_return([RubyLLM::Models.find('claude-haiku-4-5-20251001'), fallback_provider])
+        allow(fallback_provider).to receive(:connection).and_return(double)
+        allow(fallback_provider).to receive(:complete)
+          .and_return(RubyLLM::Message.new(role: :assistant, content: 'ok'))
+
+        chat.add_message(role: :user, content: 'Hello')
+        response = chat.complete
+
+        expect(response.content).to eq('ok')
+      end
+    end
+  end
+end

From cf0a3ef5a6df9a0ebe936adf904bf0b1ae7a5555 Mon Sep 17 00:00:00 2001
From: Kieran Klaassen
Date: Wed, 18 Feb 2026 21:22:23 -0800
Subject: [PATCH 2/4] fix: Always restore model after fallback, prevent
 recursive fallback

- Use ensure block to restore original model/provider/connection after
  fallback attempt, regardless of success or failure
- Add @in_fallback guard to prevent double-fallback during tool call
  recursion
- Move FALLBACK_ERRORS constant to top of class per codebase convention
- Initialize @fallback and @in_fallback in constructor for consistency
- Inline complete_with_fallback into complete
- Extract shared test setup, remove duplicate test

Co-Authored-By: Claude Opus 4.6
---
 lib/ruby_llm/chat.rb                | 59 ++++++++++++-----------
 spec/ruby_llm/chat_fallback_spec.rb | 74 ++++++++++++-----------------
 2 files changed, 60 insertions(+), 73 deletions(-)

diff --git a/lib/ruby_llm/chat.rb b/lib/ruby_llm/chat.rb
index 938c03425..7cae0b4c1 100644
--- a/lib/ruby_llm/chat.rb
+++ b/lib/ruby_llm/chat.rb
@@ -7,6 +7,13 @@ class Chat
 
     attr_reader :model, :messages, :tools, :params, :headers, :schema
 
+    FALLBACK_ERRORS = [
+      RateLimitError,
+      ServerError,
+      ServiceUnavailableError,
+      OverloadedError
+    ].freeze
+
     def initialize(model: nil, provider: nil, assume_model_exists: false, context: nil)
       if assume_model_exists && !provider
         raise ArgumentError, 'Provider must be specified if assume_model_exists is true'
@@ -23,6 +30,8 @@ def initialize(model: nil, provider: nil, assume_model_exists: false, context: n
       @headers = {}
       @schema = nil
       @thinking = nil
+      @fallback = nil
+      @in_fallback = false
       @on = {
         new_message: nil,
         end_message: nil,
@@ -140,36 +149,9 @@ def each(&)
     end
 
     def complete(&)
-      complete_with_fallback(&)
-    end
-
-    def add_message(message_or_attributes)
-      message = message_or_attributes.is_a?(Message) ? message_or_attributes : Message.new(message_or_attributes)
-      messages << message
-      message
-    end
-
-    def reset_messages!
-      @messages.clear
-    end
-
-    def instance_variables
-      super - %i[@connection @config]
-    end
-
-    private
-
-    FALLBACK_ERRORS = [
-      RateLimitError,
-      ServerError,
-      ServiceUnavailableError,
-      OverloadedError
-    ].freeze
-
-    def complete_with_fallback(&)
       call_provider(&)
     rescue *FALLBACK_ERRORS => e
-      raise unless @fallback
+      raise unless @fallback && !@in_fallback
 
       original_model = @model
       original_provider = @provider
      original_connection = @connection
@@ -178,16 +160,35 @@ def complete_with_fallback(&)
       RubyLLM.logger.warn "RubyLLM: #{e.class} on #{original_model.id}, falling back to #{@fallback[:model]}"
 
       begin
+        @in_fallback = true
         with_model(@fallback[:model], provider: @fallback[:provider])
         call_provider(&)
       rescue *FALLBACK_ERRORS
+        raise e
+      ensure
+        @in_fallback = false
         @model = original_model
         @provider = original_provider
         @connection = original_connection
-        raise e
       end
     end
 
+    def add_message(message_or_attributes)
+      message = message_or_attributes.is_a?(Message) ? message_or_attributes : Message.new(message_or_attributes)
+      messages << message
+      message
+    end
+
+    def reset_messages!
+      @messages.clear
+    end
+
+    def instance_variables
+      super - %i[@connection @config]
+    end
+
+    private
+
     def call_provider(&) # rubocop:disable Metrics/PerceivedComplexity
       response = @provider.complete(
         messages,
diff --git a/spec/ruby_llm/chat_fallback_spec.rb b/spec/ruby_llm/chat_fallback_spec.rb
index 5cdb0a54c..6edd15a4b 100644
--- a/spec/ruby_llm/chat_fallback_spec.rb
+++ b/spec/ruby_llm/chat_fallback_spec.rb
@@ -6,7 +6,16 @@
   include_context 'with configured RubyLLM'
 
   describe '#with_fallback' do
-    let(:chat) { described_class.new(model: 'gpt-4.1-nano') }
+    let(:chat) { RubyLLM.chat(model: 'gpt-4.1-nano') }
+    let(:fallback_provider) { instance_double(RubyLLM::Provider) }
+
+    before do
+      allow(RubyLLM::Models).to receive(:resolve).and_call_original
+      allow(RubyLLM::Models).to receive(:resolve)
+        .with('claude-haiku-4-5-20251001', provider: nil, assume_exists: false, config: anything)
+        .and_return([RubyLLM::Models.find('claude-haiku-4-5-20251001'), fallback_provider])
+      allow(fallback_provider).to receive(:connection).and_return(double)
+    end
 
     it 'returns self for chaining' do
       expect(chat.with_fallback('claude-haiku-4-5-20251001')).to eq(chat)
@@ -21,16 +30,8 @@
         raise RubyLLM::ServiceUnavailableError.new(nil, 'model experiencing high demand')
       end
 
-      fallback_provider = instance_double(RubyLLM::Provider)
-      fallback_response = RubyLLM::Message.new(role: :assistant, content: 'Hello from fallback!')
-
-      allow(RubyLLM::Models).to receive(:resolve).and_call_original
-      allow(RubyLLM::Models).to receive(:resolve)
-        .with('claude-haiku-4-5-20251001', provider: nil, assume_exists: false, config: anything)
-        .and_return([RubyLLM::Models.find('claude-haiku-4-5-20251001'), fallback_provider])
-
-      allow(fallback_provider).to receive(:connection).and_return(double)
-      allow(fallback_provider).to receive(:complete).and_return(fallback_response)
+      allow(fallback_provider).to receive(:complete)
+        .and_return(RubyLLM::Message.new(role: :assistant, content: 'Hello from fallback!'))
 
       chat.add_message(role: :user, content: 'Hello')
       response = chat.complete
@@ -45,13 +46,6 @@
       original_error = RubyLLM::ServiceUnavailableError.new(nil, 'primary down')
       allow(chat.instance_variable_get(:@provider)).to receive(:complete)
         .and_raise(original_error)
-
-      fallback_provider = instance_double(RubyLLM::Provider)
-      allow(RubyLLM::Models).to receive(:resolve).and_call_original
-      allow(RubyLLM::Models).to receive(:resolve)
-        .with('claude-haiku-4-5-20251001', provider: nil, assume_exists: false, config: anything)
-        .and_return([RubyLLM::Models.find('claude-haiku-4-5-20251001'), fallback_provider])
-      allow(fallback_provider).to receive(:connection).and_return(double)
       allow(fallback_provider).to receive(:complete)
         .and_raise(RubyLLM::ServerError.new(nil, 'fallback also down'))
 
@@ -60,7 +54,7 @@
       expect { chat.complete }.to raise_error(RubyLLM::ServiceUnavailableError, 'primary down')
     end
 
-    it 'restores original model when fallback fails' do
+    it 'restores original model after successful fallback' do
       chat.with_fallback('claude-haiku-4-5-20251001')
 
       original_model = chat.model
@@ -68,13 +62,24 @@
       allow(original_provider).to receive(:complete)
         .and_raise(RubyLLM::OverloadedError.new(nil, 'overloaded'))
+      allow(fallback_provider).to receive(:complete)
+        .and_return(RubyLLM::Message.new(role: :assistant, content: 'ok'))
-
-      fallback_provider = instance_double(RubyLLM::Provider)
-      allow(RubyLLM::Models).to receive(:resolve).and_call_original
-      allow(RubyLLM::Models).to receive(:resolve)
-        .with('claude-haiku-4-5-20251001', provider: nil, assume_exists: false, config: anything)
-        .and_return([RubyLLM::Models.find('claude-haiku-4-5-20251001'), fallback_provider])
-      allow(fallback_provider).to receive(:connection).and_return(double)
+      chat.add_message(role: :user, content: 'Hello')
+      chat.complete
+
+      expect(chat.model).to eq(original_model)
+      expect(chat.instance_variable_get(:@provider)).to eq(original_provider)
+    end
+
+    it 'restores original model when fallback fails' do
+      chat.with_fallback('claude-haiku-4-5-20251001')
+
+      original_model = chat.model
+      original_provider = chat.instance_variable_get(:@provider)
+
+      allow(original_provider).to receive(:complete)
+        .and_raise(RubyLLM::OverloadedError.new(nil, 'overloaded'))
       allow(fallback_provider).to receive(:complete)
         .and_raise(RubyLLM::ServerError.new(nil, 'fallback down'))
 
@@ -117,12 +122,6 @@
         .and_raise(RubyLLM::RateLimitError.new(nil, 'rate limited'))
 
       captured_messages = nil
-      fallback_provider = instance_double(RubyLLM::Provider)
-      allow(RubyLLM::Models).to receive(:resolve).and_call_original
-      allow(RubyLLM::Models).to receive(:resolve)
-        .with('claude-haiku-4-5-20251001', provider: nil, assume_exists: false, config: anything)
-        .and_return([RubyLLM::Models.find('claude-haiku-4-5-20251001'), fallback_provider])
-      allow(fallback_provider).to receive(:connection).and_return(double)
       allow(fallback_provider).to receive(:complete) do |messages, **_kwargs|
         captured_messages = messages.dup
         RubyLLM::Message.new(role: :assistant, content: 'Fallback reply')
@@ -142,12 +141,6 @@
       allow(chat.instance_variable_get(:@provider)).to receive(:complete)
         .and_raise(RubyLLM::ServiceUnavailableError.new(nil, 'unavailable'))
 
-      fallback_provider = instance_double(RubyLLM::Provider)
-      allow(RubyLLM::Models).to receive(:resolve).and_call_original
-      allow(RubyLLM::Models).to receive(:resolve)
-        .with('claude-haiku-4-5-20251001', provider: nil, assume_exists: false, config: anything)
-        .and_return([RubyLLM::Models.find('claude-haiku-4-5-20251001'), fallback_provider])
-      allow(fallback_provider).to receive(:connection).and_return(double)
       allow(fallback_provider).to receive(:complete) do |_messages, **_kwargs, &block|
         block&.call(RubyLLM::Chunk.new(role: :assistant, content: 'chunk'))
         RubyLLM::Message.new(role: :assistant, content: 'streamed reply')
@@ -182,13 +175,6 @@
 
         allow(chat.instance_variable_get(:@provider)).to receive(:complete)
           .and_raise(error_class.new(nil, 'error'))
-
-        fallback_provider = instance_double(RubyLLM::Provider)
-        allow(RubyLLM::Models).to receive(:resolve).and_call_original
-        allow(RubyLLM::Models).to receive(:resolve)
-          .with('claude-haiku-4-5-20251001', provider: nil, assume_exists: false, config: anything)
-          .and_return([RubyLLM::Models.find('claude-haiku-4-5-20251001'), fallback_provider])
-        allow(fallback_provider).to receive(:connection).and_return(double)
         allow(fallback_provider).to receive(:complete)
           .and_return(RubyLLM::Message.new(role: :assistant, content: 'ok'))

From a94feaf9224c5877a90225d0bcec9d50055021ba Mon Sep 17 00:00:00 2001
From: Kieran Klaassen
Date: Wed, 18 Feb 2026 22:17:30 -0800
Subject: [PATCH 3/4] fix: Harden fallback with transport errors, Agent DSL,
 AR safety, and legacy support

- Add Faraday::TimeoutError and Faraday::ConnectionFailed to FALLBACK_ERRORS
- Map HTTP 504 to ServiceUnavailableError for fallback coverage
- Log fallback error details when both primary and fallback fail
- Sanitize all dynamic values in fallback log lines
- Add fallback macro to Agent DSL for feature parity
- Add with_fallback to legacy acts_as integration
- Guard persist_new_message against destroying valid messages (tool
  calls, content_raw/structured output)
- Widen AR cleanup rescue to include Faraday transport errors
- Refactor fallback specs to eliminate instance_variable_get coupling

Co-Authored-By: Claude Opus 4.6
---
 lib/ruby_llm/active_record/acts_as_legacy.rb |   5 +
 lib/ruby_llm/active_record/chat_methods.rb   |   7 +-
 lib/ruby_llm/agent.rb                        |  12 ++
 lib/ruby_llm/chat.rb                         |  13 +-
 lib/ruby_llm/error.rb                        |   2 +-
 spec/ruby_llm/active_record/acts_as_spec.rb  |  81 +++++++++++++
 spec/ruby_llm/agent_spec.rb                  |  79 ++++++++++++
 spec/ruby_llm/chat_fallback_spec.rb          | 121 ++++++++++++++++---
 8 files changed, 297 insertions(+), 23 deletions(-)

diff --git a/lib/ruby_llm/active_record/acts_as_legacy.rb b/lib/ruby_llm/active_record/acts_as_legacy.rb
index 988156e2c..0317eba22 100644
--- a/lib/ruby_llm/active_record/acts_as_legacy.rb
+++ b/lib/ruby_llm/active_record/acts_as_legacy.rb
@@ -121,6 +121,11 @@ def with_tools(...)
         self
       end
 
+      def with_fallback(...)
+        to_llm.with_fallback(...)
+        self
+      end
+
       def with_model(...)
         update(model_id: to_llm.with_model(...).model.id)
         self
diff --git a/lib/ruby_llm/active_record/chat_methods.rb b/lib/ruby_llm/active_record/chat_methods.rb
index 99e9566f7..cd1871ee7 100644
--- a/lib/ruby_llm/active_record/chat_methods.rb
+++ b/lib/ruby_llm/active_record/chat_methods.rb
@@ -206,7 +206,7 @@ def ask(message, with: nil, &)
 
       def complete(...)
         to_llm.complete(...)
-      rescue RubyLLM::Error => e
+      rescue RubyLLM::Error, Faraday::TimeoutError, Faraday::ConnectionFailed => e
         cleanup_failed_messages if @message&.persisted? && @message.content.blank?
         cleanup_orphaned_tool_results
         raise e
@@ -284,6 +284,11 @@ def order_messages_for_llm(messages)
       end
 
       def persist_new_message
+        if @message&.persisted? && @message.content.blank? &&
+           !@message.tool_calls_association.exists? &&
+           (!@message.respond_to?(:content_raw) || @message.content_raw.blank?)
+          @message.destroy
+        end
         @message = messages_association.create!(role: :assistant, content: '')
       end
diff --git a/lib/ruby_llm/agent.rb b/lib/ruby_llm/agent.rb
index e5123e705..7f8f78818 100644
--- a/lib/ruby_llm/agent.rb
+++ b/lib/ruby_llm/agent.rb
@@ -22,6 +22,7 @@ def inherited(subclass)
         subclass.instance_variable_set(:@context, @context)
         subclass.instance_variable_set(:@chat_model, @chat_model)
         subclass.instance_variable_set(:@input_names, (@input_names || []).dup)
+        subclass.instance_variable_set(:@fallback, @fallback&.dup)
       end
 
       def model(model_id = nil, **options)
@@ -74,6 +75,12 @@ def schema(value = nil, &block)
         @schema = block_given? ? block : value
       end
 
+      def fallback(model_id = nil, provider: nil)
+        return @fallback if model_id.nil?
+
+        @fallback = { model: model_id, provider: provider }
+      end
+
       def context(value = nil)
         return @context if value.nil?
@@ -165,6 +172,7 @@ def apply_configuration(chat_object, input_values:, persist_instructions:)
         apply_params(llm_chat, runtime)
         apply_headers(llm_chat, runtime)
         apply_schema(llm_chat, runtime)
+        apply_fallback(llm_chat)
       end
 
       def apply_context(llm_chat)
@@ -206,6 +214,10 @@ def apply_schema(llm_chat, runtime)
         llm_chat.with_schema(value) if value
       end
 
+      def apply_fallback(llm_chat)
+        llm_chat.with_fallback(fallback[:model], provider: fallback[:provider]) if fallback
+      end
+
       def llm_chat_for(chat_object)
         chat_object.respond_to?(:to_llm) ? chat_object.to_llm : chat_object
       end
diff --git a/lib/ruby_llm/chat.rb b/lib/ruby_llm/chat.rb
index 7cae0b4c1..95ba911a3 100644
--- a/lib/ruby_llm/chat.rb
+++ b/lib/ruby_llm/chat.rb
@@ -11,7 +11,9 @@ class Chat
       RateLimitError,
       ServerError,
       ServiceUnavailableError,
-      OverloadedError
+      OverloadedError,
+      Faraday::TimeoutError,
+      Faraday::ConnectionFailed
     ].freeze
 
     def initialize(model: nil, provider: nil, assume_model_exists: false, context: nil)
@@ -157,13 +159,14 @@ def complete(&)
       original_provider = @provider
       original_connection = @connection
 
-      RubyLLM.logger.warn "RubyLLM: #{e.class} on #{original_model.id}, falling back to #{@fallback[:model]}"
+      RubyLLM.logger.warn "RubyLLM: #{e.class} on #{sanitize_for_log(original_model.id)}, falling back to #{sanitize_for_log(@fallback[:model])}"
 
       begin
         @in_fallback = true
         with_model(@fallback[:model], provider: @fallback[:provider])
         call_provider(&)
-      rescue *FALLBACK_ERRORS
+      rescue *FALLBACK_ERRORS => fallback_error
+        RubyLLM.logger.warn "RubyLLM: Fallback to #{sanitize_for_log(@fallback[:model])} also failed: #{fallback_error.class} - #{sanitize_for_log(fallback_error.message)}"
         raise e
       ensure
         @in_fallback = false
@@ -285,5 +288,9 @@ def replace_system_instruction(instructions)
 
       @messages = system_messages + non_system_messages
     end
+
+    def sanitize_for_log(value)
+      value.to_s.gsub(/[\x00-\x1f\x7f]/, '')
+    end
   end
 end
diff --git a/lib/ruby_llm/error.rb b/lib/ruby_llm/error.rb
index 3908bee27..b22dd3a67 100644
--- a/lib/ruby_llm/error.rb
+++ b/lib/ruby_llm/error.rb
@@ -61,7 +61,7 @@ def parse_error(provider:, response:) # rubocop:disable Metrics/PerceivedComplex
         raise RateLimitError.new(response, message || 'Rate limit exceeded - please wait a moment')
       when 500
         raise ServerError.new(response, message || 'API server error - please try again')
-      when 502..503
+      when 502..504
         raise ServiceUnavailableError.new(response, message || 'API server unavailable - please try again later')
       when 529
         raise OverloadedError.new(response, message || 'Service overloaded - please try again later')
diff --git a/spec/ruby_llm/active_record/acts_as_spec.rb b/spec/ruby_llm/active_record/acts_as_spec.rb
index 2bc839f81..b880cda44 100644
--- a/spec/ruby_llm/active_record/acts_as_spec.rb
+++ b/spec/ruby_llm/active_record/acts_as_spec.rb
@@ -309,6 +309,87 @@ def execute(input:)
     end
   end
 
+  describe 'streaming fallback phantom message cleanup' do
+    it 'destroys blank assistant message when persist_new_message is called again' do
+      chat = Chat.create!(model: model)
+      chat.to_llm # initialize @chat and persistence callbacks
+
+      # Simulate first streaming attempt: on_new_message creates a blank assistant row
+      chat.send(:persist_new_message)
+      orphaned_message = chat.instance_variable_get(:@message)
+      expect(orphaned_message).to be_persisted
+      expect(orphaned_message.content).to eq('')
+
+      orphaned_id = orphaned_message.id
+
+      # Simulate fallback streaming attempt: on_new_message fires again
+      chat.send(:persist_new_message)
+      new_message = chat.instance_variable_get(:@message)
+
+      # The orphaned blank message should be destroyed
+      expect(Message.exists?(orphaned_id)).to be false
+
+      # A new blank assistant message should exist for the fallback attempt
+      expect(new_message).to be_persisted
+      expect(new_message.content).to eq('')
+      expect(new_message.id).not_to eq(orphaned_id)
+    end
+
+    it 'does not destroy a blank assistant message that has tool calls' do
+      chat = Chat.create!(model: model)
+      chat.to_llm
+
+      chat.send(:persist_new_message)
+      tool_call_message = chat.instance_variable_get(:@message)
+      # Simulate a tool call response: blank content but has tool_call records
+      tool_call_message.tool_calls.create!(
+        tool_call_id: 'call_123',
+        name: 'test_tool',
+        arguments: { foo: 'bar' }
+      )
+      tool_call_id = tool_call_message.id
+
+      # Next on_new_message should NOT destroy this message
+      chat.send(:persist_new_message)
+
+      expect(Message.exists?(tool_call_id)).to be true
+    end
+
+    it 'does not destroy a blank assistant message that has content_raw' do
+      chat = Chat.create!(model: model)
+      chat.to_llm
+
+      chat.send(:persist_new_message)
+      structured_message = chat.instance_variable_get(:@message)
+      # Simulate structured output: blank content but content_raw is set
+      structured_message.update!(content: nil, content_raw: { 'name' => 'Alice', 'age' => 25 })
+      structured_id = structured_message.id
+
+      # Next on_new_message should NOT destroy this message
+      chat.send(:persist_new_message)
+
+      expect(Message.exists?(structured_id)).to be true
+    end
+
+    it 'does not destroy a populated assistant message when persist_new_message is called' do
+      chat = Chat.create!(model: model)
+      chat.to_llm
+
+      # Simulate normal flow: on_new_message creates row, on_end_message populates it
+      chat.send(:persist_new_message)
+      populated_message = chat.instance_variable_get(:@message)
+      populated_message.update!(content: 'Hello, I am the assistant response')
+
+      populated_id = populated_message.id
+
+      # Next on_new_message (e.g., for a tool call follow-up) should NOT destroy the populated message
+      chat.send(:persist_new_message)
+
+      expect(Message.exists?(populated_id)).to be true
+      expect(chat.instance_variable_get(:@message).id).not_to eq(populated_id)
+    end
+  end
+
   # Custom configuration tests with inline models
   describe 'custom configurations' do
     before(:all) do # rubocop:disable RSpec/BeforeAfterAll
diff --git a/spec/ruby_llm/agent_spec.rb b/spec/ruby_llm/agent_spec.rb
index b61a6eec3..1395e3c44 100644
--- a/spec/ruby_llm/agent_spec.rb
+++ b/spec/ruby_llm/agent_spec.rb
@@ -136,4 +136,83 @@ def each(&block)
     agent = Class.new(described_class).new(chat: fake_chat)
     expect(agent.map(&:upcase)).to eq(%w[FIRST SECOND])
   end
+
+  describe 'fallback' do
+    it 'stores and retrieves fallback config via class macro' do
+      agent_class = Class.new(RubyLLM::Agent) do
+        model 'gpt-4.1-nano'
+        fallback 'claude-haiku-4-5-20251001', provider: :anthropic
+      end
+
+      expect(agent_class.fallback).to eq({ model: 'claude-haiku-4-5-20251001', provider: :anthropic })
+    end
+
+    it 'returns nil when no fallback is configured' do
+      agent_class = Class.new(RubyLLM::Agent) do
+        model 'gpt-4.1-nano'
+      end
+
+      expect(agent_class.fallback).to be_nil
+    end
+
+    it 'inherits fallback config to subclasses' do
+      parent_class = Class.new(RubyLLM::Agent) do
+        model 'gpt-4.1-nano'
+        fallback 'claude-haiku-4-5-20251001', provider: :anthropic
+      end
+
+      child_class = Class.new(parent_class)
+
+      expect(child_class.fallback).to eq({ model: 'claude-haiku-4-5-20251001', provider: :anthropic })
+    end
+
+    it 'does not affect parent when child overrides fallback' do
+      parent_class = Class.new(RubyLLM::Agent) do
+        model 'gpt-4.1-nano'
+        fallback 'claude-haiku-4-5-20251001', provider: :anthropic
+      end
+
+      child_class = Class.new(parent_class) do
+        fallback 'gpt-4.1-mini'
+      end
+
+      expect(parent_class.fallback).to eq({ model: 'claude-haiku-4-5-20251001', provider: :anthropic })
+      expect(child_class.fallback).to eq({ model: 'gpt-4.1-mini', provider: nil })
+    end
+
+    it 'applies fallback to the underlying chat via .chat' do
+      agent_class = Class.new(RubyLLM::Agent) do
+        model 'gpt-4.1-nano'
+        fallback 'claude-haiku-4-5-20251001', provider: :anthropic
+      end
+
+      chat = agent_class.chat
+      fallback_config = chat.instance_variable_get(:@fallback)
+
+      expect(fallback_config).to eq({ model: 'claude-haiku-4-5-20251001', provider: :anthropic })
+    end
+
+    it 'applies
fallback to the underlying chat via .new' do + agent_class = Class.new(RubyLLM::Agent) do + model 'gpt-4.1-nano' + fallback 'claude-haiku-4-5-20251001', provider: :anthropic + end + + agent = agent_class.new + fallback_config = agent.chat.instance_variable_get(:@fallback) + + expect(fallback_config).to eq({ model: 'claude-haiku-4-5-20251001', provider: :anthropic }) + end + + it 'does not apply fallback when none is configured' do + agent_class = Class.new(RubyLLM::Agent) do + model 'gpt-4.1-nano' + end + + chat = agent_class.chat + fallback_config = chat.instance_variable_get(:@fallback) + + expect(fallback_config).to be_nil + end + end end diff --git a/spec/ruby_llm/chat_fallback_spec.rb b/spec/ruby_llm/chat_fallback_spec.rb index 6edd15a4b..96d57d35a 100644 --- a/spec/ruby_llm/chat_fallback_spec.rb +++ b/spec/ruby_llm/chat_fallback_spec.rb @@ -6,11 +6,16 @@ include_context 'with configured RubyLLM' describe '#with_fallback' do - let(:chat) { RubyLLM.chat(model: 'gpt-4.1-nano') } + let(:primary_provider) { instance_double(RubyLLM::Provider) } let(:fallback_provider) { instance_double(RubyLLM::Provider) } + let(:chat) { RubyLLM.chat(model: 'gpt-4.1-nano') } before do allow(RubyLLM::Models).to receive(:resolve).and_call_original + allow(RubyLLM::Models).to receive(:resolve) + .with('gpt-4.1-nano', provider: nil, assume_exists: false, config: anything) + .and_return([RubyLLM::Models.find('gpt-4.1-nano'), primary_provider]) + allow(primary_provider).to receive(:connection).and_return(double) allow(RubyLLM::Models).to receive(:resolve) .with('claude-haiku-4-5-20251001', provider: nil, assume_exists: false, config: anything) .and_return([RubyLLM::Models.find('claude-haiku-4-5-20251001'), fallback_provider]) @@ -25,7 +30,7 @@ chat.with_fallback('claude-haiku-4-5-20251001') call_count = 0 - allow(chat.instance_variable_get(:@provider)).to receive(:complete) do + allow(primary_provider).to receive(:complete) do call_count += 1 raise 
RubyLLM::ServiceUnavailableError.new(nil, 'model experiencing high demand') end @@ -44,8 +49,7 @@ chat.with_fallback('claude-haiku-4-5-20251001') original_error = RubyLLM::ServiceUnavailableError.new(nil, 'primary down') - allow(chat.instance_variable_get(:@provider)).to receive(:complete) - .and_raise(original_error) + allow(primary_provider).to receive(:complete).and_raise(original_error) allow(fallback_provider).to receive(:complete) .and_raise(RubyLLM::ServerError.new(nil, 'fallback also down')) @@ -58,10 +62,16 @@ chat.with_fallback('claude-haiku-4-5-20251001') original_model = chat.model - original_provider = chat.instance_variable_get(:@provider) - allow(original_provider).to receive(:complete) - .and_raise(RubyLLM::OverloadedError.new(nil, 'overloaded')) + primary_calls = 0 + allow(primary_provider).to receive(:complete) do + primary_calls += 1 + if primary_calls == 1 + raise RubyLLM::OverloadedError.new(nil, 'overloaded') + else + RubyLLM::Message.new(role: :assistant, content: 'primary restored') + end + end allow(fallback_provider).to receive(:complete) .and_return(RubyLLM::Message.new(role: :assistant, content: 'ok')) @@ -69,17 +79,27 @@ chat.complete expect(chat.model).to eq(original_model) - expect(chat.instance_variable_get(:@provider)).to eq(original_provider) + + # Verify provider restoration: next call routes to primary + chat.add_message(role: :user, content: 'Hello again') + response = chat.complete + expect(response.content).to eq('primary restored') end it 'restores original model when fallback fails' do chat.with_fallback('claude-haiku-4-5-20251001') original_model = chat.model - original_provider = chat.instance_variable_get(:@provider) - allow(original_provider).to receive(:complete) - .and_raise(RubyLLM::OverloadedError.new(nil, 'overloaded')) + primary_calls = 0 + allow(primary_provider).to receive(:complete) do + primary_calls += 1 + if primary_calls == 1 + raise RubyLLM::OverloadedError.new(nil, 'overloaded') + else + 
RubyLLM::Message.new(role: :assistant, content: 'primary restored') + end + end allow(fallback_provider).to receive(:complete) .and_raise(RubyLLM::ServerError.new(nil, 'fallback down')) @@ -87,13 +107,17 @@ expect { chat.complete }.to raise_error(RubyLLM::OverloadedError) expect(chat.model).to eq(original_model) - expect(chat.instance_variable_get(:@provider)).to eq(original_provider) + + # Verify provider restoration: next call routes to primary + chat.add_message(role: :user, content: 'Hello again') + response = chat.complete + expect(response.content).to eq('primary restored') end it 'does not trigger fallback on non-transient errors' do chat.with_fallback('claude-haiku-4-5-20251001') - allow(chat.instance_variable_get(:@provider)).to receive(:complete) + allow(primary_provider).to receive(:complete) .and_raise(RubyLLM::BadRequestError.new(nil, 'invalid request')) chat.add_message(role: :user, content: 'Hello') @@ -104,7 +128,7 @@ it 'does not trigger fallback on auth errors' do chat.with_fallback('claude-haiku-4-5-20251001') - allow(chat.instance_variable_get(:@provider)).to receive(:complete) + allow(primary_provider).to receive(:complete) .and_raise(RubyLLM::UnauthorizedError.new(nil, 'bad key')) chat.add_message(role: :user, content: 'Hello') @@ -118,7 +142,7 @@ chat.add_message(role: :assistant, content: 'First reply') chat.add_message(role: :user, content: 'Second message') - allow(chat.instance_variable_get(:@provider)).to receive(:complete) + allow(primary_provider).to receive(:complete) .and_raise(RubyLLM::RateLimitError.new(nil, 'rate limited')) captured_messages = nil @@ -138,7 +162,7 @@ it 'works with streaming' do chat.with_fallback('claude-haiku-4-5-20251001') - allow(chat.instance_variable_get(:@provider)).to receive(:complete) + allow(primary_provider).to receive(:complete) .and_raise(RubyLLM::ServiceUnavailableError.new(nil, 'unavailable')) allow(fallback_provider).to receive(:complete) do |_messages, **_kwargs, &block| @@ -156,7 +180,7 @@ end 
it 'does not fallback when no fallback is configured' do - allow(chat.instance_variable_get(:@provider)).to receive(:complete) + allow(primary_provider).to receive(:complete) .and_raise(RubyLLM::ServiceUnavailableError.new(nil, 'unavailable')) chat.add_message(role: :user, content: 'Hello') @@ -173,7 +197,7 @@ it "triggers fallback on #{error_class.name.split('::').last}" do chat.with_fallback('claude-haiku-4-5-20251001') - allow(chat.instance_variable_get(:@provider)).to receive(:complete) + allow(primary_provider).to receive(:complete) .and_raise(error_class.new(nil, 'error')) allow(fallback_provider).to receive(:complete) .and_return(RubyLLM::Message.new(role: :assistant, content: 'ok')) @@ -184,5 +208,66 @@ expect(response.content).to eq('ok') end end + + it 'triggers fallback on Faraday::TimeoutError' do + chat.with_fallback('claude-haiku-4-5-20251001') + + allow(primary_provider).to receive(:complete) + .and_raise(Faraday::TimeoutError.new('request timed out')) + allow(fallback_provider).to receive(:complete) + .and_return(RubyLLM::Message.new(role: :assistant, content: 'ok from fallback')) + + chat.add_message(role: :user, content: 'Hello') + response = chat.complete + + expect(response.content).to eq('ok from fallback') + end + + it 'triggers fallback on Faraday::ConnectionFailed' do + chat.with_fallback('claude-haiku-4-5-20251001') + + allow(primary_provider).to receive(:complete) + .and_raise(Faraday::ConnectionFailed.new('connection refused')) + allow(fallback_provider).to receive(:complete) + .and_return(RubyLLM::Message.new(role: :assistant, content: 'ok from fallback')) + + chat.add_message(role: :user, content: 'Hello') + response = chat.complete + + expect(response.content).to eq('ok from fallback') + end + + it 'logs warning with fallback error details when both primary and fallback fail' do + chat.with_fallback('claude-haiku-4-5-20251001') + + allow(primary_provider).to receive(:complete) + .and_raise(RubyLLM::ServiceUnavailableError.new(nil, 
'primary down')) + allow(fallback_provider).to receive(:complete) + .and_raise(RubyLLM::ServerError.new(nil, 'fallback also down')) + + chat.add_message(role: :user, content: 'Hello') + + expect(RubyLLM.logger).to receive(:warn).with(/falling back to/) + expect(RubyLLM.logger).to receive(:warn).with(/Fallback to claude-haiku-4-5-20251001 also failed: RubyLLM::ServerError - fallback also down/) + + expect { chat.complete }.to raise_error(RubyLLM::ServiceUnavailableError, 'primary down') + end + + it 'sanitizes model IDs with control characters in log output' do + chat.with_fallback('claude-haiku-4-5-20251001') + + allow(chat.model).to receive(:id).and_return("gpt-4\nnewline-injected") + + allow(primary_provider).to receive(:complete) + .and_raise(RubyLLM::ServiceUnavailableError.new(nil, 'unavailable')) + allow(fallback_provider).to receive(:complete) + .and_return(RubyLLM::Message.new(role: :assistant, content: 'ok')) + + chat.add_message(role: :user, content: 'Hello') + + expect(RubyLLM.logger).to receive(:warn).with(/gpt-4newline-injected/).and_call_original + + chat.complete + end end end From fd5cf17211f92da4d80464cf24b0617a3bb727d6 Mon Sep 17 00:00:00 2001 From: Kieran Klaassen Date: Wed, 18 Feb 2026 22:30:32 -0800 Subject: [PATCH 4/4] refactor: Extract Fallback module from Chat Move fallback logic into RubyLLM::Fallback module, following the same pattern as Streaming for cross-cutting concerns. Chat#complete shrinks from 24 lines of nested rescue to a single delegation. 
Co-Authored-By: Claude Opus 4.6 --- lib/ruby_llm/active_record/chat_methods.rb | 2 +- lib/ruby_llm/chat.rb | 107 +++++++-------------- lib/ruby_llm/fallback.rb | 67 +++++++++++++ 3 files changed, 101 insertions(+), 75 deletions(-) create mode 100644 lib/ruby_llm/fallback.rb diff --git a/lib/ruby_llm/active_record/chat_methods.rb b/lib/ruby_llm/active_record/chat_methods.rb index cd1871ee7..803ee4876 100644 --- a/lib/ruby_llm/active_record/chat_methods.rb +++ b/lib/ruby_llm/active_record/chat_methods.rb @@ -206,7 +206,7 @@ def ask(message, with: nil, &) def complete(...) to_llm.complete(...) - rescue RubyLLM::Error, Faraday::TimeoutError, Faraday::ConnectionFailed => e + rescue *RubyLLM::Fallback::ERRORS => e cleanup_failed_messages if @message&.persisted? && @message.content.blank? cleanup_orphaned_tool_results raise e diff --git a/lib/ruby_llm/chat.rb b/lib/ruby_llm/chat.rb index 95ba911a3..4a4a68544 100644 --- a/lib/ruby_llm/chat.rb +++ b/lib/ruby_llm/chat.rb @@ -4,18 +4,10 @@ module RubyLLM # Represents a conversation with an AI model class Chat include Enumerable + include Fallback attr_reader :model, :messages, :tools, :params, :headers, :schema - FALLBACK_ERRORS = [ - RateLimitError, - ServerError, - ServiceUnavailableError, - OverloadedError, - Faraday::TimeoutError, - Faraday::ConnectionFailed - ].freeze - def initialize(model: nil, provider: nil, assume_model_exists: false, context: nil) if assume_model_exists && !provider raise ArgumentError, 'Provider must be specified if assume_model_exists is true' @@ -73,11 +65,6 @@ def with_tools(*tools, replace: false) self end - def with_fallback(model_id, provider: nil) - @fallback = { model: model_id, provider: provider } - self - end - def with_model(model_id, provider: nil, assume_exists: false) @model, @provider = Models.resolve(model_id, provider:, assume_exists:, config: @config) @connection = @provider.connection @@ -150,29 +137,38 @@ def each(&) messages.each(&) end - def complete(&) - call_provider(&) - 
rescue *FALLBACK_ERRORS => e - raise unless @fallback && !@in_fallback - - original_model = @model - original_provider = @provider - original_connection = @connection - - RubyLLM.logger.warn "RubyLLM: #{e.class} on #{sanitize_for_log(original_model.id)}, falling back to #{sanitize_for_log(@fallback[:model])}" - - begin - @in_fallback = true - with_model(@fallback[:model], provider: @fallback[:provider]) - call_provider(&) - rescue *FALLBACK_ERRORS => fallback_error - RubyLLM.logger.warn "RubyLLM: Fallback to #{sanitize_for_log(@fallback[:model])} also failed: #{fallback_error.class} - #{sanitize_for_log(fallback_error.message)}" - raise e - ensure - @in_fallback = false - @model = original_model - @provider = original_provider - @connection = original_connection + def complete(&) # rubocop:disable Metrics/PerceivedComplexity + with_fallback_protection do + response = @provider.complete( + messages, + tools: @tools, + temperature: @temperature, + model: @model, + params: @params, + headers: @headers, + schema: @schema, + thinking: @thinking, + &wrap_streaming_block(&) + ) + + @on[:new_message]&.call unless block_given? + + if @schema && response.content.is_a?(String) + begin + response.content = JSON.parse(response.content) + rescue JSON::ParserError + # If parsing fails, keep content as string + end + end + + add_message response + @on[:end_message]&.call(response) + + if response.tool_call? + handle_tool_calls(response, &) + else + response + end end end @@ -192,39 +188,6 @@ def instance_variables private - def call_provider(&) # rubocop:disable Metrics/PerceivedComplexity - response = @provider.complete( - messages, - tools: @tools, - temperature: @temperature, - model: @model, - params: @params, - headers: @headers, - schema: @schema, - thinking: @thinking, - &wrap_streaming_block(&) - ) - - @on[:new_message]&.call unless block_given? 
- - if @schema && response.content.is_a?(String) - begin - response.content = JSON.parse(response.content) - rescue JSON::ParserError - # If parsing fails, keep content as string - end - end - - add_message response - @on[:end_message]&.call(response) - - if response.tool_call? - handle_tool_calls(response, &) - else - response - end - end - def wrap_streaming_block(&block) return nil unless block_given? @@ -288,9 +251,5 @@ def replace_system_instruction(instructions) @messages = system_messages + non_system_messages end - - def sanitize_for_log(value) - value.to_s.gsub(/[\x00-\x1f\x7f]/, '') - end end end diff --git a/lib/ruby_llm/fallback.rb b/lib/ruby_llm/fallback.rb new file mode 100644 index 000000000..30e2c2c7a --- /dev/null +++ b/lib/ruby_llm/fallback.rb @@ -0,0 +1,67 @@ +# frozen_string_literal: true + +module RubyLLM + # Handles model-level failover for transient errors. + # Included by Chat to keep fallback logic out of the main conversation flow. + module Fallback + ERRORS = [ + RateLimitError, + ServerError, + ServiceUnavailableError, + OverloadedError, + Faraday::TimeoutError, + Faraday::ConnectionFailed + ].freeze + + def with_fallback(model_id, provider: nil) + @fallback = { model: model_id, provider: provider } + self + end + + private + + def with_fallback_protection(&) + yield + rescue *ERRORS => e + attempt_fallback(e, &) + end + + def attempt_fallback(error, &) + raise error unless @fallback && !@in_fallback + + log_fallback(error) + + original_model = @model + original_provider = @provider + original_connection = @connection + + begin + @in_fallback = true + with_model(@fallback[:model], provider: @fallback[:provider]) + yield + rescue *ERRORS => fallback_error + log_fallback_failure(fallback_error) + raise error + ensure + @in_fallback = false + @model = original_model + @provider = original_provider + @connection = original_connection + end + end + + def log_fallback(error) + RubyLLM.logger.warn "RubyLLM: #{error.class} on 
#{sanitize_for_log(@model.id)}, " \ + "falling back to #{sanitize_for_log(@fallback[:model])}" + end + + def log_fallback_failure(error) + RubyLLM.logger.warn "RubyLLM: Fallback to #{sanitize_for_log(@fallback[:model])} also failed: " \ + "#{error.class} - #{sanitize_for_log(error.message)}" + end + + def sanitize_for_log(value) + value.to_s.gsub(/[\x00-\x1f\x7f]/, '') + end + end +end
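---

For reviewers: the failover semantics the two patches converge on can be sketched standalone, outside the gem. This is a minimal illustrative sketch, not RubyLLM code — `FakeChat` and `TransientError` are hypothetical stand-ins for `Chat` and the `Fallback::ERRORS` classes. It demonstrates the three invariants the specs assert: fallback fires only on transient errors, the *original* error is re-raised when the fallback also fails, and the original model is always restored afterwards.

```ruby
# Illustrative sketch of the fallback semantics (not RubyLLM itself).
# TransientError stands in for the transient error classes in Fallback::ERRORS;
# FakeChat mirrors Chat's @model/@fallback/@in_fallback bookkeeping.
class TransientError < StandardError; end

class FakeChat
  attr_reader :model

  def initialize(model)
    @model = model
    @fallback = nil
    @in_fallback = false
  end

  def with_fallback(model)
    @fallback = model
    self
  end

  def complete(&provider_call)
    provider_call.call(@model)
  rescue TransientError => e
    # Only fall back if one is configured and we aren't already in a fallback attempt.
    raise unless @fallback && !@in_fallback

    original_model = @model
    begin
      @in_fallback = true
      @model = @fallback
      provider_call.call(@model)
    rescue TransientError
      raise e # re-raise the ORIGINAL error, as in the patch
    ensure
      @in_fallback = false
      @model = original_model # model is always restored
    end
  end
end

chat = FakeChat.new('primary').with_fallback('backup')
result = chat.complete do |model|
  raise TransientError, 'overloaded' if model == 'primary'

  "ok from #{model}"
end
# result is "ok from backup", and chat.model is back to "primary"
```

The `ensure` clause is the key design choice carried through both patches: whether the fallback succeeds or fails, the chat object ends the call pointing at its configured primary model, so a later `complete` retries the primary rather than sticking to the fallback.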