Structured Output & JSON mode response support #131


Closed · wants to merge 1 commit
5 changes: 3 additions & 2 deletions .gitignore
@@ -47,8 +47,8 @@ build-iPhoneSimulator/
# for a library or gem, you might want to ignore these files since the code is
# intended to run in multiple environments; otherwise, check them in:
Gemfile.lock
# .ruby-version
# .ruby-gemset
.ruby-version
.ruby-gemset

# unless supporting rvm < 1.11.0 or doing something fancy, ignore this:
.rvmrc
@@ -57,3 +57,4 @@ Gemfile.lock
# .rubocop-https?--*

repomix-output.*
/.idea/
1 change: 1 addition & 0 deletions Gemfile
@@ -18,6 +18,7 @@ group :development do
gem 'nokogiri'
gem 'overcommit', '>= 0.66'
gem 'pry', '>= 0.14'
gem 'pry-byebug', '>= 3.11'
gem 'rake', '>= 13.0'
gem 'rdoc'
gem 'reline'
3 changes: 3 additions & 0 deletions README.md
@@ -60,6 +60,9 @@ chat.ask "Tell me a story about a Ruby programmer" do |chunk|
print chunk.content
end

# Get structured responses easily (OpenAI only for now)
chat.with_response_format(:integer).ask("What is 2 + 2?").to_i # => 4

# Generate images
RubyLLM.paint "a sunset over mountains in watercolor style"

50 changes: 49 additions & 1 deletion docs/guides/chat.md
@@ -261,6 +261,54 @@ end
chat.ask "What is metaprogramming in Ruby?"
```

## Receiving Structured Responses

You can ensure responses follow a schema you define, like this:
```ruby
chat = RubyLLM.chat

chat.with_response_format(:integer).ask("What is 2 + 2?").to_i
# => 4

chat.with_response_format(:string).ask("Say 'Hello World' and nothing else.").content
# => "Hello World"

chat.with_response_format(:array, items: { type: :string })
chat.ask('What are the 2 largest countries? Only respond with country names.').content
# => ["Russia", "Canada"]

chat.with_response_format(:object, properties: { age: { type: :integer } })
chat.ask('Provide sample customer age between 10 and 100.').content
# => { "age" => 42 }

chat.with_response_format(
:object,
properties: { hobbies: { type: :array, items: { type: :string, enum: %w[Soccer Golf Hockey] } } }
)
chat.ask('Provide at least 1 hobby.').content
# => { "hobbies" => ["Soccer"] }
```

You can also provide the JSON schema you want directly to the method like this:
```ruby
chat.with_response_format(type: :object, properties: { age: { type: :integer } })
chat.ask('Provide sample customer age between 10 and 100.').content
# => { "age" => 31 }
```
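
Nested schemas work the same way. A quick sketch (the prompt and returned values are illustrative):

```ruby
chat.with_response_format(
  type: :object,
  properties: {
    customers: {
      type: :array,
      items: {
        type: :object,
        properties: { name: { type: :string }, age: { type: :integer } }
      }
    }
  }
)
chat.ask('Provide 2 sample customers with names and ages.').content
# => { "customers" => [{ "name" => "Ada", "age" => 36 }, { "name" => "Grace", "age" => 45 }] }
```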

In this example the code automatically switches to OpenAI's json_mode, since no object properties are requested:
```ruby
chat.with_response_format(:json) # Don't care about structure, just give me JSON

chat.ask('Provide a sample customer data object with name and email keys.').content
# => { "name" => "Tobias", "email" => "[email protected]" }

chat.ask('Provide a sample customer data object with name and email keys.').content
# => { "first_name" => "Michael", "email_address" => "[email protected]" }
```

{: .note }
**OpenAI only for now:** Only OpenAI models currently support this feature. We will add support for other providers shortly.


## Next Steps

This guide covered the core `Chat` interface. Now you might want to explore:
@@ -269,4 +317,4 @@ This guide covered the core `Chat` interface. Now you might want to explore:
* [Using Tools]({% link guides/tools.md %}): Enable the AI to call your Ruby code.
* [Streaming Responses]({% link guides/streaming.md %}): Get real-time feedback from the AI.
* [Rails Integration]({% link guides/rails.md %}): Persist your chat conversations easily.
* [Error Handling]({% link guides/error-handling.md %}): Build robust applications that handle API issues.
27 changes: 19 additions & 8 deletions lib/ruby_llm/active_record/acts_as.rb
@@ -93,6 +93,12 @@ def with_instructions(instructions, replace: false)
self
end

# @see RubyLLM::Chat#with_response_format
def with_response_format(...)
to_llm.with_response_format(...)
self
end

def with_tool(...)
to_llm.with_tool(...)
self
@@ -158,14 +164,19 @@ def persist_message_completion(message) # rubocop:disable Metrics/AbcSize,Metric
end

transaction do
@message.update!(
role: message.role,
content: message.content,
model_id: message.model_id,
tool_call_id: tool_call_id,
input_tokens: message.input_tokens,
output_tokens: message.output_tokens
)
# These are required fields:
@message.role = message.role
@message.content = message.content

# These are optional fields:
@message.try(:model_id=, message.model_id)
@message.try(:tool_call_id=, tool_call_id)
@message.try(:input_tokens=, message.input_tokens)
@message.try(:output_tokens=, message.output_tokens)
@message.try(:content_schema=, message.content_schema)

@message.save!

persist_tool_calls(message.tool_calls) if message.tool_calls.present?
end
end
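Note that `persist_message_completion` assigns the optional attributes with `try`, so persisting the schema only takes effect if your messages table actually has a `content_schema` column. A hypothetical migration for a Rails app using the acts_as integration (the column name matches the setter above; the jsonb type is an assumption):

```ruby
class AddContentSchemaToMessages < ActiveRecord::Migration[7.1]
  def change
    # Stores the JSON schema the assistant response was asked to follow.
    # If this column is absent, @message.try(:content_schema=, ...) is a no-op.
    add_column :messages, :content_schema, :jsonb, null: true
  end
end
```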
72 changes: 64 additions & 8 deletions lib/ruby_llm/chat.rb
@@ -31,6 +31,56 @@ def initialize(model: nil, provider: nil, assume_model_exists: false, context: n
}
end

##
# This method lets you ensure the responses follow a schema you define like this:
#
# chat.with_response_format(:integer).ask("What is 2 + 2?").to_i
# # => 4
# chat.with_response_format(:string).ask("Say 'Hello World' and nothing else.").content
# # => "Hello World"
# chat.with_response_format(:array, items: { type: :string })
# chat.ask('What are the 2 largest countries? Only respond with country names.').content
# # => ["Russia", "Canada"]
# chat.with_response_format(:object, properties: { age: { type: :integer } })
# chat.ask('Provide sample customer age between 10 and 100.').content
# # => { "age" => 42 }
# chat.with_response_format(
# :object,
# properties: { hobbies: { type: :array, items: { type: :string, enum: %w[Soccer Golf Hockey] } } }
# )
# chat.ask('Provide at least 1 hobby.').content
# # => { "hobbies" => ["Soccer"] }
#
# You can also provide the JSON schema you want directly to the method like this:
# chat.with_response_format(type: :object, properties: { age: { type: :integer } })
# chat.ask('Provide sample customer age between 10 and 100.').content
# # => { "age" => 31 }
Comment on lines +44 to +56

@jayelkaake @crmne Thanks for all the work put into this! I've been trying the API and found it a bit surprising; it adds complexity on top of OpenAI's nuances.

Comparing the calls:

  • chat.with_response_format(:object, properties: { age: { type: :integer } })
  • chat.with_response_format(type: :object, properties: { age: { type: :integer } })

Whether or not the type: key is included changes a lot about the way we invoke OpenAI. One relies on json_mode and the other on json_schema, but that's not very clear from the API.

In addition, support for structured output or json_mode also depends on the model used. Older models will not support json_schema, so maybe that's something we want to factor in? I.e. use one or the other based on the model.

OpenAI doesn't recommend JSON mode except for older models, and I do believe its API is a product of its time. The fact that we have to append more instructions to the original prompt and ask for JSON is a sign of that (why do it in English, btw?), so I don't think RubyLLM should default to it, relegating the json_schema response format.

#
# In this example the code automatically switches to OpenAI's json_mode, since no object
# properties are requested:
# chat.with_response_format(:json) # Don't care about structure, just give me JSON
# chat.ask('Provide a sample customer data object with name and email keys.').content
# # => { "name" => "Tobias", "email" => "[email protected]" }
# chat.ask('Provide a sample customer data object with name and email keys.').content
# # => { "first_name" => "Michael", "email_address" => "[email protected]" }
#
# @param type [Symbol, String, Hash, nil] (optional) Any JSON Schema type supported by the API (:integer, :object, etc.), :json for free-form JSON, or a full schema Hash
# @param schema [Hash] The schema for the response format. It can be a JSON schema or a simple hash.
# @return [Chat] (self)
def with_response_format(type = nil, **schema)
schema_hash = if type.is_a?(Symbol) || type.is_a?(String)
{ type: type.to_sym == :json ? :object : type }
elsif type.is_a?(Hash)
type
else
{}
end.merge(schema)

@response_schema = Schema.new(schema_hash)

self
end
alias with_structured_response with_response_format

def ask(message = nil, with: {}, &)
add_message role: :user, content: Content.new(message, with)
complete(&)
@@ -86,17 +136,23 @@ def each(&)

def complete(&) # rubocop:disable Metrics/MethodLength
@on[:new_message]&.call
response = @provider.complete(
messages,
tools: @tools,
temperature: @temperature,
model: @model.id,
connection: @connection,
&
)
response = @provider.with_response_schema(@response_schema) do
jayelkaake (Author) commented on Apr 27, 2025:

@crmne should we reset the @response_schema after the completion so it doesn't apply to subsequent messages in the chat?

Learning from your comment on my other PR about temperature, I realize now that your with_* pattern is meant to apply only to the next completion, so that's why I'm asking.

crmne (Owner) replied:

no, with_* applies to all subsequent messages

jayelkaake (Author) replied:

Oh, I actually already updated the code to reset the response schema after completion, following your comment on my other PR about temperature!

I've been battle-testing this code at my company Osello (which is also now a sponsor of this project) and I realized it does make more sense to reset the response_schema after completion, because in practice subsequent chat messages are most likely not meant to follow the same format:

chat.with_response_format(type: :string, enum: %w[Toronto Ottawa])
    .ask("What's the capital of Canada?")
    .content
# => "Ottawa"

chat.ask("How long has it been the capital?")
# => "Ottawa has been the capital of Canada since 1857."

chat.with_response_format(type: :integer).ask("How many years is that?")
# => 168

sirwolfgang commented on Apr 30, 2025:

Given the design of the with_* prefix, I think it would be best not to break the applies-to-all interface. What about splitting this out to a different prefix?

# Applies to all messages
chat = RubyLLM.chat.with_response_format(type: :string)

chat.ask("What's the capital of Canada?")
# => "Ottawa"

# Applies to current message
chat.as(type: :integer).ask("How many years is that?")
# => 168

# Resets back
chat.ask("How long has it been the capital?")
# => "Ottawa has been the capital of Canada since 1857."

jayelkaake (Author) replied:

@sirwolfgang good idea!

Although as might be a bit too ambiguous. Maybe .with_next_response_in_format(...) or something like that?

jayelkaake (Author) replied:

Another idea: I could make with_response_format take a block, which could be used to reset the format afterwards:

chat.with_response_format(type: :string, enum: %w[Toronto Ottawa]) do
  chat.ask("What's the capital of Canada?").content
end
# => "Ottawa"

chat.ask("How long has it been the capital?")
# => "Ottawa has been the capital of Canada since 1857."

chat.with_response_format(type: :integer)
chat.ask("How many years ago is that?")
# => 168

chat.ask("How many years ago will that be next year?")
# => 169

sirwolfgang replied on Apr 30, 2025:

@jayelkaake I think linguistically/ergonomically it would be better to split it off and not use the .with prefix at all, making it a shorter scan/parse.

Totally open to ideas other than .as, but also curious what you think it might collide with. I don't think it's structurally more ambiguous than .with. The only other as convention I can think of is Rails routes, so I also don't think it's mnemonically overloaded.

Could extend it to .as_format, .as_type, .as_response_type or something else if we want to preserve the root namespace.

Otherwise, if we could delay execution like ActiveRecord does, I can see the argument for making it a post-call setting; something like:

agent.ask("...?").in(type: :integer)
agent.ask("...?").as(type: :integer)
agent.ask("...?").structured_as(type: :integer)
agent.ask("...?").formatted_as(type: :integer)

jayelkaake (Author) replied on Apr 30, 2025:

I like the idea; it just doesn't read well in English ("ask someone to do something as..."), and it should also be clear that you're not modifying the query like you are with ActiveRecord; you're modifying the response.

I like the postfix format better. Maybe something like agent.ask("...?").response_as(:integer), but I think that might require some major updates to this library to get there.

Most APIs don't let you mutate the response format in real time, so LLMs are kind of introducing the need for a new pattern, maybe? I've been scratching my head about these things a lot over the last couple of weeks! 😅

sirwolfgang replied:

Yeah, I think refactoring to support more dynamic method chaining should be a different PR, but we could set up the expected syntax here and build towards that, since it should functionally work in either order:
agent.<token>.ask("...?") => agent.ask("...?").<token>

respond/response feels a little weird to me. I could see this interface also making sense for loading personas, like respond_as(:support_agent), which might be the process of chaining assistants for processing. Like:

timekeeper = RubyLLM.chat.with_tool(TIME)
groot = RubyLLM.chat.with_instructions(GROOT)

timekeeper.ask("What time is it?").respond_as(groot) # => "I am grooooooot"

@provider.complete(
messages,
tools: @tools,
temperature: @temperature,
model: @model.id,
connection: @connection,
&
)
end

@on[:end_message]&.call(response)

add_message response

@response_schema = nil # Reset the response schema after completion of this chat thread

if response.tool_call?
handle_tool_calls(response, &)
else
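Because `complete` clears `@response_schema` after each call, the format is effectively one-shot. A minimal sketch of the resulting behavior (prompts and outputs are illustrative):

```ruby
chat = RubyLLM.chat

# The schema applies to the next completion only...
chat.with_response_format(:integer).ask('What is 2 + 2?').to_i
# => 4

# ...and has been reset afterwards, so this returns ordinary prose.
chat.ask('Explain the answer in one sentence.').content
# => "Adding 2 and 2 gives 4."
```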
28 changes: 27 additions & 1 deletion lib/ruby_llm/message.rb
@@ -7,7 +7,9 @@ module RubyLLM
class Message
ROLES = %i[system user assistant tool].freeze

attr_reader :role, :content, :tool_calls, :tool_call_id, :input_tokens, :output_tokens, :model_id
attr_reader :role, :tool_calls, :tool_call_id, :input_tokens, :output_tokens, :model_id, :content_schema

delegate :to_i, :to_a, :to_s, to: :content

def initialize(options = {})
@role = options[:role].to_sym
@@ -17,10 +19,22 @@ def initialize(options = {})
@output_tokens = options[:output_tokens]
@model_id = options[:model_id]
@tool_call_id = options[:tool_call_id]
@content_schema = options[:content_schema]

ensure_valid_role
end

def content
return @content unless @content_schema.present?
return @content if @content.nil?

if @content_schema[:type].to_s == 'object' && @content_schema[:properties].to_h.empty?
json_response
else
structured_content
end
end

def tool_call?
!tool_calls.nil? && !tool_calls.empty?
end
@@ -47,6 +61,18 @@ def to_h

private

def json_response
return nil if @content.nil?

JSON.parse(@content)
end

def structured_content
return nil if @content.nil?

json_response['result']
end

def normalize_content(content)
case content
when Content then content.format
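To make the `result` envelope concrete, here is a sketch of `Message#content` unwrapping a structured response. Constructing a `Message` by hand with a plain schema hash is for illustration only; in real use the provider sets `content_schema`:

```ruby
raw = '{"result":{"age":42}}' # shape returned under OpenAI's json_schema mode

message = RubyLLM::Message.new(
  role: :assistant,
  content: raw,
  content_schema: { type: :object, properties: { age: { type: :integer } } }
)

message.content # => { "age" => 42 }, unwrapped from the "result" envelope
```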
23 changes: 23 additions & 0 deletions lib/ruby_llm/provider.rb
@@ -31,6 +31,29 @@ def list_models(connection:)
parse_list_models_response response, slug, capabilities
end

##
# @return [::RubyLLM::Schema, NilClass]
def response_schema
Thread.current['RubyLLM::Provider::Methods.response_schema']
end

##
# @param response_schema [::RubyLLM::Schema]
def with_response_schema(response_schema)
prev_response_schema = Thread.current['RubyLLM::Provider::Methods.response_schema']

result = nil
begin
Thread.current['RubyLLM::Provider::Methods.response_schema'] = response_schema

result = yield
ensure
Thread.current['RubyLLM::Provider::Methods.response_schema'] = prev_response_schema
end

result
end

Comment on lines +34 to +56

crmne (Owner) commented:

what's all this about threads? we have contexts now

jayelkaake (Author) replied:

This was a thread-safe way to set the response schema.

I'll be able to refactor it to use context instead (assuming the context system is thread-safe; I haven't looked yet).

def embed(text, model:, connection:, dimensions:)
payload = render_embedding_payload(text, model:, dimensions:)
response = connection.post(embedding_url(model:), payload)
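As the thread above suggests, a refactor could pass the schema through explicitly instead of using a thread-local. A rough sketch only; the method names and signatures below are illustrative, not the library's actual API:

```ruby
# Hypothetical sketch: thread the schema through as an explicit argument
# rather than stashing it in Thread.current.
def complete(messages, tools:, temperature:, model:, connection:, response_schema: nil, &block)
  # render_payload would accept the schema directly (illustrative signature)
  payload = render_payload(messages, tools: tools, temperature: temperature,
                           model: model, response_schema: response_schema)
  # ... POST the payload over `connection` and parse the response as before ...
end
```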
54 changes: 53 additions & 1 deletion lib/ruby_llm/providers/openai/chat.rb
@@ -22,11 +22,14 @@ def render_payload(messages, tools:, temperature:, model:, stream: false) # rubo
payload[:tools] = tools.map { |_, tool| tool_for(tool) }
payload[:tool_choice] = 'auto'
end

add_response_schema_to_payload(payload) if response_schema.present?

payload[:stream_options] = { include_usage: true } if stream
end
end

def parse_completion_response(response) # rubocop:disable Metrics/MethodLength
def parse_completion_response(response) # rubocop:disable Metrics/MethodLength, Metrics/AbcSize -- ABC is high because of the JSON parsing, which is clearer kept in one method
data = response.body
return if data.empty?

@@ -37,6 +40,7 @@

Message.new(
role: :assistant,
content_schema: response_schema,
content: message_data['content'],
tool_calls: parse_tool_calls(message_data['tool_calls']),
input_tokens: data['usage']['prompt_tokens'],
@@ -64,6 +68,54 @@ def format_role(role)
role.to_s
end
end

private

##
# @param [Hash] payload
def add_response_schema_to_payload(payload)
payload[:response_format] = gen_response_format_request

return unless payload[:response_format][:type] == :json_object

# NOTE: this is required by the OpenAI API when requesting arbitrary JSON.
payload[:messages].unshift({ role: :developer, content: <<~GUIDANCE
You must format your output as a valid JSON object.
Format your entire response as valid JSON.
Do not include explanations, markdown formatting, or any text outside the JSON.
GUIDANCE
})
end

##
# @return [Hash]
def gen_response_format_request
if response_schema[:type].to_s == 'object' && response_schema[:properties].to_h.empty?
{ type: :json_object } # Assume we just want json_mode
else
gen_json_schema_format_request
end
end

def gen_json_schema_format_request # rubocop:disable Metrics/MethodLength -- because it's mostly the standard hash
result_schema = response_schema.dup # so we don't modify the original in the thread
result_schema.add_to_each_object_type!(:additionalProperties, false)
result_schema.add_to_each_object_type!(:required, ->(schema) { schema[:properties].to_h.keys })

{
type: :json_schema,
json_schema: {
name: :response,
schema: {
type: :object,
properties: { result: result_schema.to_h },
additionalProperties: false,
required: [:result]
},
strict: true
}
}
end
end
end
end
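
For example, `with_response_format(:integer)` would make `gen_json_schema_format_request` return roughly the following hash, which is sent to OpenAI as the `response_format` field (a sketch traced from the code above):

```ruby
{
  type: :json_schema,
  json_schema: {
    name: :response,
    schema: {
      type: :object,
      properties: { result: { type: :integer } },
      additionalProperties: false,
      required: [:result]
    },
    strict: true
  }
}
```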