Performance in Ruby is won at design time, not in micro-optimization. Know the resource hierarchy, write code YJIT compiles well, and let the profiler name the hot path before you touch it.
# frozen_string_literal: true
# typed: strict

# before: p99 order-summary latency 38ms; stackprof blamed serial DB round-trips + GC pressure.
sig { params(order_ids: T::Array[Integer]).returns(T::Array[OrderSummary]) }
def summarize_orders(order_ids)
  # One bulk query instead of N per-row fetches (15.11); symbols for label keys (15.12).
  rows = Order.where(id: order_ids).includes(:line_items).to_a
  rows.map do |order|
    total = order.line_items.reduce(Money::ZERO) { |sum, li| sum + li.subtotal }
    OrderSummary.new(
      id: order.id,
      customer: order.customer_id,
      total: total,
      item_count: order.line_items.size, # size, not count (15.8)
      status: order.status,
    )
  end
end
# after: p99 38ms -> 4ms; GC time down 6x; one query replaces N+1.

sig { params(skus: T::Array[String]).returns(String) }
def sku_report(skus)
  buf = +"" # mutable string; << mutates in place (15.6)
  skus.each { |sku| buf << sku << "\n" }
  buf.freeze
end

summarize_orders wins by eliminating the N+1 fetch in one includes call: a network/disk win that dwarfs any CPU tuning (15.2). size reads the already-loaded association length without re-querying (15.8). sku_report builds with << so no intermediate String is allocated on each iteration (15.6). Both examples obey object-shape discipline: OrderSummary.new is called with the same keyword arguments in the same order every time, so YJIT's inline caches see one shape (15.4). The measurement note in the comment is the ledger entry (15.1) that earned these deviations from the straightforward form.
Reasoning, step by step:
- Intuition about Ruby performance is wrong at least half the time. YJIT, the GC, object shapes, and the C-extension boundary interact non-locally. A hypothesis is not a result.
- For CPU: stackprof with wall-clock or CPU mode names the frame that actually burns time. For memory: memory_profiler reports allocation counts and sources. For micro-comparisons: benchmark-ips (iterations-per-second) measures a function in isolation with a warm runtime. Use all three: they answer different questions.
- Capture the profile before changing any code, apply the fix, capture again. The delta is the only proof. Store a benchmark beside the code it guards: a committed bench/ file turns "we made this faster once" into a regression guard the next refactor must beat.
- Never touch a cold path. The GC's generational minor collection makes short-lived allocation cheap in code the profiler never flags; clean code there beats a saved allocation everywhere the flamegraph is flat.
Enforcement: A PR claiming a performance win attaches before/after numbers (stackprof flamegraph, benchmark-ips output, or memory_profiler totals) in the description. Optimization without a committed profile or benchmark does not merge.
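Worked example (sketch): the capture, fix, capture-again loop can be approximated with the standard library alone. This is a minimal stand-in, not a replacement for stackprof, memory_profiler, or benchmark-ips; the helper measure, the Measurement struct, and both report methods are hypothetical names, and the allocation counter assumes CRuby's GC.stat.

```ruby
# frozen_string_literal: true

# Hypothetical record of one run: wall-clock seconds and objects allocated.
Measurement = Struct.new(:seconds, :allocations, keyword_init: true)

# Runs the block once and reports elapsed time plus allocation count (CRuby only).
def measure
  GC.start # settle the heap before sampling
  objects_before = GC.stat(:total_allocated_objects)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  Measurement.new(
    seconds: elapsed,
    allocations: GC.stat(:total_allocated_objects) - objects_before,
  )
end

# "Before" candidate: String#+ builds a new string on every iteration.
def report_with_plus(words)
  out = ""
  words.each { |word| out += word + "\n" }
  out
end

# "After" candidate: << appends into one mutable buffer.
def report_with_append(words)
  out = +""
  words.each { |word| out << word << "\n" }
  out
end

words = Array.new(2_000) { |i| "sku-#{i}" }
before = measure { report_with_plus(words) }
after = measure { report_with_append(words) }
puts format("before: %.6fs, %d objects", before.seconds, before.allocations)
puts format("after:  %.6fs, %d objects", after.seconds, after.allocations)
```

A single run like this is noisy on timing; the allocation delta is the more stable of the two numbers, which is why a real ledger entry should still come from the profilers named above.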
Reasoning, step by step:
- Resource costs differ by orders of magnitude: a network round-trip at 5–50ms dwarfs a CPU operation at nanoseconds. One eliminated query beats a thousand micro-optimizations. Choose the layer to work on from the profile, not from the code.
- The order is fixed: network > disk > memory > CPU. If the profiler shows DB time, fix the query or add a batch. If it shows GC, reduce allocations. Only reach for CPU tricks when the design layer is sound and CPU is genuinely the constraint.
- This is root rule 11 restated precisely. Architecture is chosen once, at design time, and is expensive to retrofit; get the resource hierarchy right first.
Enforcement: Design review against the four-resource hierarchy. A CPU micro-fix proposed before a profile names CPU as the bottleneck is sent back; fix the slowest resource named in the profile.
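Worked example (sketch): the orders-of-magnitude argument reduces to simple arithmetic. The figures are illustrative only: 5 ms is the low end of the round-trip range quoted above, and the 1 microsecond saved per micro-optimization is an assumption chosen for the comparison.

```ruby
# frozen_string_literal: true

ROUND_TRIP_MS = 5.0      # low end of the 5-50 ms range in the text
MICRO_SAVING_MS = 0.001  # assumed saving per CPU micro-optimization

# Latency of fetching `rows` records one query at a time.
def serial_io_ms(rows)
  rows * ROUND_TRIP_MS
end

# Latency of fetching the same records in one bulk query.
def batched_io_ms
  ROUND_TRIP_MS
end

# Total saved by applying `count` micro-optimizations once each.
def micro_tuning_saved_ms(count)
  count * MICRO_SAVING_MS
end

saved_by_batching = serial_io_ms(100) - batched_io_ms
saved_by_tuning = micro_tuning_saved_ms(1_000)
puts "batching saves #{saved_by_batching} ms; tuning saves #{saved_by_tuning} ms"
```

Under these assumptions one eliminated loop of 100 queries recovers roughly 495 ms, while a thousand micro-optimizations recover about 1 ms.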
Reasoning, step by step:
- YJIT is Ruby's stable production compiler; enable it with --yjit or RUBY_YJIT_ENABLE=1. Ruby 4.0 also ships ZJIT, YJIT's successor, compiled into the binary but not enabled at runtime by default; evaluate it, but keep YJIT as the default until ZJIT matches it. A JIT compiles call sites monomorphically: a site that always dispatches to the same method with the same argument types becomes near-native code, while a megamorphic site falls back to interpretation.
- The JIT rewards predictable, simple call sites. Avoid heavy method_missing, send with dynamic names, and define_method in hot paths; each defeats the inline cache and forces a generic dispatch.
- Benchmark with the JIT you ship enabled. A micro-benchmark run with --disable-yjit measures a different runtime; the numbers do not transfer. Profile in the environment you ship.
Enforcement: Benchmarks capture the ruby --yjit baseline. Reviewers flag send/method_missing on measured hot paths and redirect to a direct call or a dispatch table.
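Worked example (sketch): a benchmark script can record whether YJIT is actually on before it reports numbers, and a dispatch table can stand in for send with a dynamic name. StatusFormatter and its handlers are hypothetical; whether the table is faster on a given path is something to measure, not assume.

```ruby
# frozen_string_literal: true

# True only when this Ruby was built with YJIT and it is currently enabled.
def yjit_enabled?
  defined?(RubyVM::YJIT) ? RubyVM::YJIT.enabled? : false
end

# Hypothetical formatter showing two ways to pick a handler.
class StatusFormatter
  HANDLERS = {
    paid: ->(id) { "order #{id}: paid" },
    shipped: ->(id) { "order #{id}: shipped" },
  }.freeze

  # Dynamic dispatch: the method name is assembled at runtime.
  def format_dynamic(status, id)
    send("format_#{status}", id)
  end

  # Dispatch table: one frozen-hash lookup, then a direct lambda call.
  def format_table(status, id)
    HANDLERS.fetch(status).call(id)
  end

  private

  def format_paid(id)
    "order #{id}: paid"
  end

  def format_shipped(id)
    "order #{id}: shipped"
  end
end

puts "YJIT enabled: #{yjit_enabled?}"
puts StatusFormatter.new.format_table(:paid, 42)
```

The table also turns an unknown status into a KeyError at the lookup instead of a NoMethodError deep inside send.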
Reasoning, step by step:
- Ruby 3.2+ tracks object shapes: the combination of an object's ivar names and their assignment order. YJIT caches ivar accesses by shape. Two objects of the same class that assigned ivars in different orders, or assigned different subsets, get different shapes; YJIT's cache misses and the access degrades to a hash lookup.
- Assign every ivar your class will ever use inside initialize, in one deterministic order. Never conditionally define an ivar outside initialize; that forks the shape into a transition chain.
- This mirrors chapter 03's T.let ordering discipline. The Sorbet annotation and the shape-stability rule agree: declare all state upfront, in order, always.
Worked example:
# frozen_string_literal: true
# typed: strict
class Inventory
  sig { params(sku: Sku, quantity: Integer, reserved: Integer).void }
  def initialize(sku, quantity, reserved)
    @sku = T.let(sku, Sku) # consistent order: same shape every construction
    @quantity = T.let(quantity, Integer)
    @reserved = T.let(reserved, Integer)
    # never: if condition then @reserved = ... end; that forks the shape
  end
end

Enforcement: RuboCop and Sorbet # typed: strict flag uninitialized ivars. Reviewers reject ivar assignments outside initialize on objects in hot paths.
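Worked example (sketch): shapes are internal to the VM, but instance_variables lists ivars in assignment order, so it can serve as a rough proxy for spotting divergent layouts. Both classes below are hypothetical, and the proxy is an approximation rather than a view of the real shape tree.

```ruby
# frozen_string_literal: true

# Hypothetical class where @reserved first appears after construction.
class ForkedInventory
  def initialize(sku, quantity)
    @sku = sku
    @quantity = quantity
  end

  def reserve(count)
    @reserved = count # new ivar added late: this instance's layout now differs
    self
  end
end

# Same data, but every ivar is assigned in initialize, in one order.
class StableInventory
  def initialize(sku, quantity, reserved = 0)
    @sku = sku
    @quantity = quantity
    @reserved = reserved
  end

  def reserve(count)
    @reserved = count # reassignment only: layout unchanged
    self
  end
end

# Distinct ivar layouts observed across a set of objects.
def ivar_layouts(objects)
  objects.map(&:instance_variables).uniq
end

forked = [ForkedInventory.new("A", 1), ForkedInventory.new("B", 2).reserve(1)]
stable = [StableInventory.new("A", 1), StableInventory.new("B", 2).reserve(1)]
puts "forked layouts: #{ivar_layouts(forked).size}"
puts "stable layouts: #{ivar_layouts(stable).size}"
```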
Reasoning, step by step:
- # frozen_string_literal: true at the top of every file interns string literals: the same literal produces the same object reference instead of a fresh allocation each time. This is mandatory per chapter 01, and its performance benefit is free.
- The GC cost is the rate, not the instance. One allocation is trivially cheap; the same allocation inside the inner loop of a hot path runs millions of times per second and accumulates pause time. Hoist allocations out of loops; build output into a pre-allocated buffer; reuse rather than reallocate.
- Mutable working strings created with +"" (an unfrozen duplicate of an empty literal) or String.new serve as reusable buffers. Freeze the final result before returning it across a boundary.
Enforcement: frozen_string_literal: true is enforced by rubocop (Style/FrozenStringLiteralComment). Reviewers flag allocation-per-iteration patterns on measured hot paths.
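Worked example (sketch): hoisting an allocation out of a loop, with the difference counted rather than assumed. The RATES table, both convert methods, and the allocations helper are hypothetical; the counter relies on CRuby's GC.stat.

```ruby
# frozen_string_literal: true

# Objects allocated while the block runs (CRuby only).
def allocations
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

RATES = { "USD" => 1.0, "EUR" => 0.9 }.freeze # built once, reused by every call

# Allocates a fresh Hash on every iteration.
def convert_allocating(amounts)
  amounts.sum do |currency, value|
    rates = { "USD" => 1.0, "EUR" => 0.9 }
    value * rates.fetch(currency)
  end
end

# Reads the hoisted, frozen table instead.
def convert_hoisted(amounts)
  amounts.sum { |currency, value| value * RATES.fetch(currency) }
end

amounts = Array.new(1_000) { |i| [i.even? ? "USD" : "EUR", i.to_f] }
puts "per-iteration hash: #{allocations { convert_allocating(amounts) }} objects"
puts "hoisted constant:   #{allocations { convert_hoisted(amounts) }} objects"
```

Both methods return the same total; only the allocation rate changes, which is the quantity this rule is about.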
Reasoning, step by step:
- String#+ allocates a new String object on every call. Inside a loop, each iteration discards the previous string and hands the GC a new object to collect. For N iterations the cost is O(N) allocations and up to O(N²) bytes copied.
- String#<< mutates the receiver in place: no new object, no copy of the accumulated result, O(1) per append. Use it whenever a string is being assembled incrementally.
- When the string must be immutable at the boundary, freeze it after assembly; do not use String#+ to avoid mutability. Build with << into a mutable buffer, return buf.freeze.
Worked example:
# frozen_string_literal: true
# typed: strict
sig { params(items: T::Array[LineItem]).returns(String) }
def csv_lines(items)
  buf = +""
  items.each do |item|
    buf << item.sku.to_s << "," << item.quantity.to_s << "\n"
  end
  buf.freeze
end

Enforcement: Review; rubocop Performance/StringConcatenationInLoop flags String#+ inside a block or loop body.
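Worked example (sketch): the difference between the two operators is visible through object identity. The two helper methods are hypothetical names used only for this illustration.

```ruby
# frozen_string_literal: true

# += rebinds the local to the new String that String#+ returned.
def append_with_plus(base, suffix)
  result = base
  result += suffix
  result
end

# << appends to the receiver and returns that same object.
def append_in_place(base, suffix)
  base << suffix
end

original = +"sku-1"
copy = append_with_plus(original, ",sku-2")
puts copy.equal?(original)   # false: a second String now exists
puts original                # still "sku-1"

buffer = +"sku-1"
same = append_in_place(buffer, ",sku-2")
puts same.equal?(buffer)     # true: one String, grown in place
```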
Reasoning, step by step:
- Chained Enumerable methods (map, select, find) are eager by default: each stage materializes a full intermediate array. For large or infinite sequences this is both a memory and a latency problem: you pay for the whole collection before consuming the first element.
- Prepend .lazy to the chain. Lazy enumerators pull one element at a time through every stage; no intermediate array is built. Call .first(n) or .take(n).to_a at the end to force evaluation of only the elements you need.
- Use lazy when: the source is a large collection, an IO/Enumerator, or infinite; or when you consume only a prefix. Do not use it on small, finite collections where eager is clearer and the GC pressure is negligible.
Worked example:
# frozen_string_literal: true
# typed: strict
sig { params(inventory: T::Enumerable[Inventory], limit: Integer).returns(T::Array[Sku]) }
def low_stock_skus(inventory, limit)
  inventory
    .lazy
    .select { |inv| inv.quantity < inv.reorder_threshold }
    .map(&:sku)
    .first(limit)
end

Enforcement: Review; rubocop Performance/Count and similar cops flag materialized chains where lazy would avoid a full traversal.
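Worked example (sketch): recording which source elements each pipeline touches makes the eager/lazy difference observable. The method names and the touched array are hypothetical instrumentation, not part of any library.

```ruby
# frozen_string_literal: true

# Records the element, then returns its square.
def square_and_record(number, touched)
  touched << number
  number * number
end

# Eager: map runs over the whole source before select or first see anything.
def first_even_squares_eager(source, limit, touched)
  source
    .map { |n| square_and_record(n, touched) }
    .select(&:even?)
    .first(limit)
end

# Lazy: each element flows through every stage; the chain stops once first is satisfied.
def first_even_squares_lazy(source, limit, touched)
  source
    .lazy
    .map { |n| square_and_record(n, touched) }
    .select(&:even?)
    .first(limit)
end

eager_touched = []
lazy_touched = []
first_even_squares_eager(1..1_000, 2, eager_touched)
first_even_squares_lazy(1..1_000, 2, lazy_touched)
puts "eager touched #{eager_touched.size} elements"
puts "lazy touched #{lazy_touched.size} elements"
```

The lazy version also terminates on an unbounded source such as 1..Float::INFINITY, where the eager version would never return.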
Reasoning, step by step:
- On Ruby Array, Hash, and String, size and length are aliases: O(1) cached-length reads. count without a block is also O(1) on Array, but Hash inherits Enumerable#count, which walks every entry; on ActiveRecord::Relation and many Enumerable sources, count executes a SELECT COUNT(*) or iterates the entire collection. The distinction matters enormously at the data layer.
- Prefer size as the default: it signals "read the cached length" and never triggers a query or a full traversal on any standard Ruby type. Reserve count { |e| predicate } for the counted-predicate form, where it is semantically distinct.
- When working with ActiveRecord, size checks whether the association is already loaded and returns the cached length if so, falling back to a COUNT only when unloaded. count always issues a query. Load the association once and use size.
Enforcement: RuboCop Performance/Size (prefer size over count for arrays and hashes without a block). Reviewers replace count on already-loaded associations with size.
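Worked example (sketch): a small Enumerable that counts how many elements it yields shows that count walks the source while size does not. CountingSource is a hypothetical class written for this illustration.

```ruby
# frozen_string_literal: true

# Hypothetical source that records how many elements `each` has yielded.
class CountingSource
  include Enumerable

  attr_reader :yielded

  def initialize(items)
    @items = items
    @yielded = 0
  end

  def each
    return enum_for(:each) unless block_given?

    @items.each do |item|
      @yielded += 1
      yield item
    end
  end

  # Answers from the backing array's stored length; no traversal.
  def size
    @items.size
  end
end

source = CountingSource.new([10, 20, 30])
source.size
puts "after size:  #{source.yielded} elements yielded"
source.count
puts "after count: #{source.yielded} elements yielded"
```

Enumerable#count has no way to know the length of an arbitrary source, so without a block it still iterates; the same reasoning explains why a relation's count goes back to the database.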
Reasoning, step by step:
- gsub is the general-purpose substitution: with a Regexp pattern it runs the regex engine on every call. When the operation is an exact-string replacement (sub), a single-character translation (tr), a character deletion (delete), or a prefix/suffix check (start_with?/end_with?), the specialized method is faster and communicates intent more precisely.
- The performance difference is not noise: regex engine startup and backtracking overhead is real, especially when the pattern could be expressed as a literal. Use the specialized form and save the regex for problems that actually require it.
- Hierarchy: start_with?/end_with? for prefix/suffix checks; delete for character-set removal; tr for single-character translation tables; sub for a single exact-string replacement; gsub only when you need global replacement of a pattern that is genuinely a regex.
Worked example:
# frozen_string_literal: true
# typed: strict
sku.gsub("-", "_") # bad: gsub for a single-character translation
sku.tr("-", "_") # good: O(n) single-pass translation
label.gsub(/^Order: /, "") # bad: regex for a literal prefix removal
label.delete_prefix("Order: ") # good: purpose-built, no regex
code.start_with?("SKU-") # good: no regex needed for a prefix check

Enforcement: RuboCop Performance/StringReplacement, Performance/StartWith, Performance/EndWith. Reviewers flag gsub with a string-literal pattern and redirect to the specialized method.
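Worked example (sketch): the specialized methods are not always drop-in equivalents of the regex form, so check behavior when swapping them. The helper names below are hypothetical.

```ruby
# frozen_string_literal: true

# Single-character translation, no pattern involved.
def normalize_sku(sku)
  sku.tr("-", "_")
end

# Removes the prefix only when it sits at the very start of the string.
def strip_order_label(label)
  label.delete_prefix("Order: ")
end

# ^ anchors at the start of every line, so this can remove more than one prefix.
def strip_order_label_regex(label)
  label.gsub(/^Order: /, "")
end

puts normalize_sku("AB-12-X")
puts strip_order_label("Order: 1001")
puts strip_order_label_regex("Order: 1\nOrder: 2").inspect
```

On single-line input the two strip helpers agree; on multi-line input the regex version strips every line while delete_prefix touches only the first, which is usually the behavior the caller intended.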
Reasoning, step by step:
- hash[key] || default recomputes default on every miss and mishandles falsy stored values: a stored false or nil triggers the default incorrectly (0 is truthy in Ruby and is not affected). hash.fetch(key) { default } evaluates its block only on miss; fetch without a block or default raises KeyError, making an unrecoverable absence explicit.
- Hash.new { |h, k| h[k] = expensive_default(k) } computes and caches the default on first access. The alternative, hash[key] ||= expensive_default(key), recomputes whenever the stored value is false or nil and has to be repeated at every access site. The factory hash pays the cost once per key.
- Memoization via ||= is appropriate for simple, truthy, idempotent defaults on instance variables (chapter 04 rule). For hash values, fetch + a block or a factory hash is the correct tool and is consistent with the chapter 03 Hash#fetch over [] discipline.
Worked example:
# frozen_string_literal: true
# typed: strict
# Grouping line items by SKU: build with a factory hash, not repeated ||= chains.
sig { params(items: T::Array[LineItem]).returns(T::Hash[Sku, T::Array[LineItem]]) }
def group_by_sku(items)
  groups = Hash.new { |h, k| h[k] = [] }
  items.each { |item| groups[item.sku] << item }
  groups
end

# Safe lookup with an explicit miss block:
price = prices.fetch(sku) { Money::ZERO }

Enforcement: RuboCop Style/FetchEnvVar (analogous pattern); review for hash[key] || default on hot paths where the value could be falsy or the computation is non-trivial.
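Worked example (sketch): the falsy-value trap and the once-per-key behavior of a factory hash, shown side by side. FLAGS, the helper methods, and the length-based price are hypothetical stand-ins.

```ruby
# frozen_string_literal: true

FLAGS = { beta: false, retries: 0 }.freeze # hypothetical settings hash

# Stored false is falsy, so || silently replaces it with the default.
def beta_with_or
  FLAGS[:beta] || true
end

# The key exists, so the block never runs and the stored false is returned.
def beta_with_fetch
  FLAGS.fetch(:beta) { true }
end

# 0 is truthy in Ruby, so || leaves a stored 0 alone.
def retries_with_or
  FLAGS[:retries] || 3
end

# Factory hash: the block runs once per missing key and stores its result.
def build_price_cache(computed_keys)
  Hash.new do |hash, sku|
    computed_keys << sku
    hash[sku] = sku.length * 100 # stand-in for an expensive lookup
  end
end

computed = []
cache = build_price_cache(computed)
cache["AB-1"]
cache["AB-1"]
puts "beta via ||: #{beta_with_or}, via fetch: #{beta_with_fetch}"
puts "default computed for: #{computed.inspect}"
```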
Reasoning, step by step:
- A loop that issues one query or HTTP request per element transforms O(1) I/O cost into O(N) I/O cost. At N = 100 rows and 5ms per round-trip, 500ms of latency is manufactured from nothing. This is the single most common performance defect in Ruby applications.
- Collect all ids or keys upfront, issue one bulk query (WHERE id IN (...)), then index the results in memory for O(1) lookup. For external APIs, batch the request if the API supports it; otherwise fan out with bounded concurrency (chapter 09) rather than serial iteration.
- ORMs make N+1 invisible: order.line_items inside an orders.each loop is a query per order. Use eager loading (includes, preload, eager_load) at the query site; confirm the plan with bullet or SQL logs.
Worked example:
# frozen_string_literal: true
# typed: strict
# bad: one query per order (N+1):
# orders.each { |order| process(order.customer) }

# good: one query for all customers, then index by id:
sig { params(orders: T::Array[Order]).returns(T::Array[Receipt]) }
def fulfill_orders(orders)
  customer_ids = orders.map(&:customer_id)
  customers = Customer.where(id: customer_ids).index_by(&:id)
  orders.map do |order|
    customer = customers.fetch(order.customer_id)
    Receipt.new(order:, customer:)
  end
end

Enforcement: bullet gem in development/test raises on N+1 queries. Reviewers reject query-inside-loop patterns. SQL logs in tests catch unintentional N+1 introduced by refactors.
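Worked example (sketch): the same batching idea without ActiveRecord, using an in-memory store that counts round-trips. CustomerRow, OrderRow, and CustomerStore are hypothetical stand-ins for a real table and ORM; the query counter plays the role that SQL logs play in a real test.

```ruby
# frozen_string_literal: true

CustomerRow = Struct.new(:id, :name)
OrderRow = Struct.new(:id, :customer_id)

# Hypothetical stand-in for a database table; every call is one "query".
class CustomerStore
  attr_reader :queries

  def initialize(customers)
    @by_id = customers.to_h { |customer| [customer.id, customer] }
    @queries = 0
  end

  # One round-trip per id.
  def find(id)
    @queries += 1
    @by_id.fetch(id)
  end

  # One round-trip for any number of ids (the WHERE id IN (...) case).
  def where_ids(ids)
    @queries += 1
    @by_id.values_at(*ids.uniq).compact
  end
end

# N+1 shape: a lookup inside the loop.
def customers_per_row(orders, store)
  orders.map { |order| store.find(order.customer_id) }
end

# Batched shape: one bulk fetch, then in-memory indexing.
def customers_batched(orders, store)
  by_id = store.where_ids(orders.map(&:customer_id)).to_h { |c| [c.id, c] }
  orders.map { |order| by_id.fetch(order.customer_id) }
end
```

A test can then assert on store.queries directly, which turns "no N+1 here" into a checked property instead of a review comment.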
Reasoning, step by step:
- A symbol is interned: :status is one object for the lifetime of the process. A string "status" is allocated fresh at every string-literal evaluation (mitigated by frozen_string_literal: true, which interns string literals; dynamic strings, string keys from JSON, and string construction still allocate). Symbols remove the allocation entirely.
- Hash keys that are symbols allow Ruby to use a faster internal key-lookup path. The shorthand hash syntax (status:, sku:) and Sorbet's T::Struct field names are already symbol-keyed; stay consistent and use symbols for all internal keys and option labels.
- The boundary exception: external data (JSON, query parameters, environment) arrives as strings. Convert at the boundary (to_sym or symbolize_keys) and work with symbols internally, or use Hash#fetch with string keys only in the adapter layer. Do not scatter to_sym across the codebase; do it once at the entry point.
Enforcement: RuboCop Performance/InefficientHashSearch and Style/HashSyntax (enforce shorthand symbol keys). Reviewers flag string keys on internal hashes and option sets.
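Worked example (sketch): converting keys once at the adapter boundary, using the standard library's JSON parser with symbolize_names. The method names and the payload are hypothetical.

```ruby
# frozen_string_literal: true

require "json"

# Adapter boundary: parse once, symbolize once.
def parse_order_payload(raw_json)
  JSON.parse(raw_json, symbolize_names: true)
end

# Internal code works with symbol keys only.
def status_label(order)
  order.fetch(:status)
end

payload = parse_order_payload('{"status":"paid","sku":"AB-1"}')
puts payload.keys.inspect
puts status_label(payload)
puts :status.equal?("status".to_sym) # true: one interned object per symbol name
```

Only symbolize payloads whose key set is known and bounded; for arbitrary user-controlled keys, keeping strings in the adapter layer is the more conservative choice.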
- Mandatory frozen_string_literal: true, rubocop-airbnb baseline, and the srb tc typecheck gate: 01-formatting-and-tooling.md.
- Sorbet T.let ordering and ivar initialization discipline that aligns with object-shape stability (15.4): 03-type-safety-and-nil-discipline.md.
- Hash#fetch over [], ||= memoization caveats, and constant freezing: 04-variables-and-declarations.md.
- Method size cap, guard clauses, and assertion density, the method-level discipline that keeps hot paths narrow: 05-methods.md.
- Data.define value objects and one-site construction, the shape-stability strategy (15.4): 06-classes-and-data-modeling.md.
- map/select/reduce/find, lazy enumerators, and &:sym, the idiomatic pipeline tools this chapter tunes: 07-ruby-idioms.md.
- Bounded concurrency, Ractors, and deadline timeouts for bounded fan-out behind rule 15.11: 09-concurrency.md.
- Resource pools, bounded caches, and deterministic teardown for allocation-reuse patterns (15.5): 13-resource-management.md.