= Horizontal Scaling

When a single server can no longer handle the volume of read queries for your application, it becomes necessary to scale out horizontally.
In TypeDB, this is achieved by creating a database cluster that distributes the read load across multiple nodes. This chapter explains how
TypeDB's replication-based architecture enables horizontal scaling for read transactions while also providing high availability and fault
tolerance.

== Scaling read throughput with replication

TypeDB's strategy for horizontal scaling is based on data replication using the Raft consensus algorithm. Because every node holds a
complete copy of the data, read-only transactions can be executed on any node in the cluster. This allows you to scale your application's
read throughput by simply adding more nodes: each node you add proportionally increases the cluster's capacity for concurrent read queries.

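As a back-of-envelope illustration of this property, the sketch below estimates aggregate read capacity from a hypothetical per-node throughput figure; the function name and the numbers are purely illustrative.

[source,python]
----
# Illustrative only: every node holds a full replica and can serve any read
# query, so aggregate read capacity grows with the node count.
def cluster_read_capacity(node_count: int, reads_per_node_per_sec: int) -> int:
    return node_count * reads_per_node_per_sec

# A hypothetical node sustaining 5,000 read queries/sec:
for nodes in (1, 3, 5):
    print(nodes, "nodes ->", cluster_read_capacity(nodes, 5_000), "reads/sec")
----
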
== The leader-follower model

A TypeDB cluster operates on a leader-follower model. At any given time, the cluster elects a single leader node for each database, while
all other nodes act as followers. Followers receive a stream of committed transactions from the leader and apply them to their local copy of
the database, keeping them in sync. Follower nodes can serve read transactions, but never writes.

The leader is exclusively responsible for processing all schema and data writes. Centralizing writes on a single node simplifies consistency
and ensures that all changes are applied in a strict order. Write throughput is therefore determined by the capacity of the single leader
node and is scaled by increasing its resources (see xref:{page-version}@new_core_concepts::typedb/vertical_scaling.adoc[vertical scaling]).

If a leader node fails, the cluster automatically elects a new leader from among the followers, ensuring that the database remains available
for writes with minimal interruption.

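The degree of fault tolerance this provides follows standard Raft majority arithmetic: the cluster remains available as long as a majority of nodes is up. The sketch below shows the rule; it is generic Raft math, not TypeDB-specific code.

[source,python]
----
# Standard Raft majority arithmetic: a cluster of n nodes needs a quorum of
# floor(n/2) + 1 nodes, so it tolerates the failure of floor((n-1)/2) nodes.
def quorum_size(n: int) -> int:
    return n // 2 + 1

def tolerated_failures(n: int) -> int:
    return (n - 1) // 2

for n in (3, 5, 7):
    print(f"{n} nodes: quorum={quorum_size(n)}, tolerates {tolerated_failures(n)} failure(s)")
----
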
== Interacting with a cluster

Interacting with a cluster is very similar to interacting with a single server. The key difference is that the client driver must be
configured with the network addresses of all nodes in the cluster.

The driver uses this list to intelligently manage connections. It automatically discovers which node is the current leader for a given
database and routes all write transactions to it. For read transactions, the driver can distribute the load across all available nodes (both
leader and followers) when the `read_any_replica` option is set on the transaction, effectively using the entire cluster's capacity. This
routing is handled transparently, so your application code for opening sessions and running transactions remains the same whether you are
connecting to a single node or a full cluster.

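As a concrete illustration, here is a minimal sketch using the TypeDB Python driver. It assumes a 2.x-era driver API (`TypeDB.cloud_driver`, `TypeDBCredential`, `TypeDBOptions`); the addresses, credentials, database name, and schema are placeholders, and exact names may differ in your driver version.

[source,python]
----
# Minimal sketch, assuming a TypeDB 2.x Python driver; names may vary between
# driver versions. All addresses, credentials, and schema are placeholders.
from typedb.driver import (TypeDB, TypeDBCredential, TypeDBOptions,
                           SessionType, TransactionType)

ADDRESSES = ["node1.example.com:1729", "node2.example.com:1729", "node3.example.com:1729"]
credential = TypeDBCredential("admin", "password", tls_enabled=True)

with TypeDB.cloud_driver(ADDRESSES, credential) as driver:
    with driver.session("my-database", SessionType.DATA) as session:
        # Write transactions are routed to the current leader automatically.
        with session.transaction(TransactionType.WRITE) as tx:
            tx.query.insert('insert $u isa user, has name "alice";')
            tx.commit()

        # With read_any_replica set, a read transaction may be served by any
        # node (leader or follower), spreading load across the cluster.
        options = TypeDBOptions(read_any_replica=True)
        with session.transaction(TransactionType.READ, options) as tx:
            for row in tx.query.get("match $u isa user; get;"):
                print(row)
----
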
== Consistency and durability

TypeDB's replication model provides strong consistency guarantees, even in a distributed read environment. When a write transaction is sent
to the leader, it is not confirmed until a majority of nodes in the cluster have durably stored the transaction in their logs. This process
guarantees that once a transaction is committed, it is safely replicated and will not be lost. It also ensures that when followers serve
read queries, they provide access to a consistent and up-to-date state of the database.

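Conceptually, the commit path follows the quorum rule sketched below. This is an illustration of the rule just described, not TypeDB's internal implementation; the function name and values are hypothetical.

[source,python]
----
# Conceptual model of quorum commit: the leader appends an entry to its own
# log, replicates it to followers, and confirms the commit only once a
# majority of the cluster has durably stored it. Illustration only.
def commit_confirmed(follower_acks: list[bool], cluster_size: int) -> bool:
    durable_copies = 1 + sum(follower_acks)  # the leader's own log counts
    majority = cluster_size // 2 + 1
    return durable_copies >= majority

# 5-node cluster: the leader plus two of four followers form a majority.
print(commit_confirmed([True, True, False, False], cluster_size=5))   # True
print(commit_confirmed([True, False, False, False], cluster_size=5))  # False
----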