Commit cd6a39a

Update horizontal scaling
1 parent 10cf370 commit cd6a39a

1 file changed: 28 additions & 51 deletions

@@ -1,66 +1,43 @@
 = Horizontal Scaling
 
-When a single server is insufficient to meet the uptime and reliability requirements of a production application, it becomes necessary to
-scale out. In TypeDB, this is achieved by creating a database cluster that provides high availability and fault tolerance through data
-replication. This chapter explains the architecture and mechanics of a TypeDB cluster, with a focus on how replication enables fault
-tolerance and how transactions are processed across different nodes.
+When a single server can no longer handle the volume of read queries for your application, it becomes necessary to scale out horizontally.
+In TypeDB, this is achieved by creating a database cluster that distributes the read load across multiple nodes. This chapter explains how
+TypeDB's replication-based architecture enables horizontal scaling for read transactions while also providing high availability and fault
+tolerance.
 
-== Introduction to High Availability
+== Scaling read throughput with replication
 
-For many applications, continuous availability is critical. A single server represents a single point of failure; if that server goes down
-due to a hardware failure or network issue, or is overwhelmed by the volume of requests, the application becomes unavailable. A TypeDB
-cluster mitigates this risk by deploying the database across multiple servers, or nodes.
+TypeDB's strategy for horizontal scaling is based on data replication using the Raft consensus algorithm. Because every node has access to
+all the data, read-only transactions can be executed on any node in the cluster. This allows you to scale your application's read throughput
+linearly by simply adding more nodes. As you add nodes, the cluster's capacity to handle concurrent read queries increases proportionally.
 
-TypeDB achieves high availability and fault tolerance through data replication. Replication ensures that every node in the cluster maintains
-a complete copy of the entire database. This redundancy means that if one node fails, the other nodes can continue to serve requests without
-interruption or data loss, ensuring the database remains online.
+== The leader-follower model
 
-A TypeDB cluster operates on a leader-follower model, which is managed by the RAFT consensus algorithm. This is the core technology that
-makes the cluster fault-tolerant and consistent.
+A TypeDB cluster operates on a leader-follower model. At any given time, the cluster elects a single leader node for each database, while
+all other nodes act as followers. Followers receive a stream of committed transactions from the leader and apply them to their local copy of
+the database, keeping them in sync. Followers can only process read transactions.
 
-- *Leader Node:* at any given time, the cluster elects a single node as the leader _separately for each database_. The leader is exclusively
-responsible for processing all schema and date writes. This design centralizes writes, which simplifies consistency and eliminates the
-complexities of distributed transactions that would arise if data were partitioned.
+The leader is exclusively responsible for processing all schema and data writes. Centralizing writes on a single node simplifies consistency
+and ensures that all changes are applied in a strict order. Write throughput is therefore determined by the capacity of the single leader
+node and is scaled by increasing its resources (see xref:{page-version}@new_core_concepts::typedb/vertical_scaling.adoc[vertical scaling]).
 
-- *Follower Nodes:* all other nodes in the cluster act as followers. They passively receive a stream of committed transactions from the
-leader's log and apply them to their own local copy of the database. This keeps them in sync with the leader.
+If a leader node fails, the cluster automatically elects a new leader from among the followers, ensuring that the database remains available
+for writes with minimal interruption.
 
-- *Leader Election:* if the leader node fails or becomes unreachable, the RAFT algorithm automatically initiates a new election among the
-remaining follower nodes. A new leader is chosen from the followers that have the most up-to-date log, and the cluster can resume write
-operations with minimal downtime, typically within seconds.
-
-Since all write operations must go through a single leader, the write throughput of the cluster is equivalent to the write throughput of a
-single node. To scale write performance, you must scale the leader node vertically (i.e., provide it with more powerful hardware).
-
-Read performance, however, can be scaled horizontally. Because every node in the cluster holds a complete copy of the data, read-only
-transactions can be executed on any node, whether it's the leader or a follower. By directing read queries to follower nodes, you can
-distribute the read load across the entire cluster. This allows the system to handle a much higher volume of concurrent read requests than a
-single server could, significantly improving read scalability.
-
-== Interacting with a Cluster
+== Interacting with a cluster
 
 Interacting with a cluster is very similar to interacting with a single server. The key difference is that the client driver must be
 configured with the network addresses of all nodes in the cluster.
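
As a minimal sketch of such a configuration with the Python driver (class and method names follow the 2.x driver API and may differ in your version; the node addresses, credentials, and database are placeholders):

[source,python]
----
# Sketch: connect a driver to every node of the cluster (2.x-era Python driver;
# addresses and credentials below are placeholders).
from typedb.driver import TypeDB, TypeDBCredential

ADDRESSES = [
    "node1.example.com:1729",
    "node2.example.com:1729",
    "node3.example.com:1729",
]

credential = TypeDBCredential("admin", "password", tls_enabled=True)

# The driver uses the full address list to discover the current leader and to
# keep working if one of the nodes becomes unreachable.
with TypeDB.cloud_driver(ADDRESSES, credential) as driver:
    print([db.name for db in driver.databases.all()])
----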
 
-The driver uses this list to intelligently manage connections. It automatically discovers which node is the current leader for the database
-and routes all write transactions to it. For read transactions, the driver can be configured to distribute the load across all available
-nodes (both leader and followers), effectively using the entire cluster's capacity for reads. This routing is handled internally, so your
-application code for opening sessions and running transactions remains the same whether you are connecting to a single node or a full
-cluster.
-
-== Consistency and Durability in a Cluster
-
-TypeDB's replication model, managed by RAFT, provides strong consistency guarantees. When a client sends a write transaction to the leader,
-the following steps ensure its durability and consistency:
-
-- The leader appends the transaction to its internal, on-disk log.
-
-- The leader sends this new log entry to all follower nodes.
-
-- The leader waits until a quorum (a majority of the nodes in the cluster, including itself) has acknowledged that they have successfully
-written the entry to their own logs.
+The driver uses this list to intelligently manage connections. It automatically discovers which node is the current leader for a given
+database and routes all write transactions to it. For read transactions, the driver can distribute the load across all available nodes (both
+leader and followers) by setting the `read_any_replica` option when opening the transaction, effectively using the entire cluster's
+capacity. This routing is handled transparently, so your application code for opening sessions and running transactions remains the same
+whether you are connecting to a single node or a full cluster.
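
For example, a read that is allowed to be served by any replica could be opened roughly as follows, continuing from the connection sketch above; the database name is a placeholder and the exact option and query accessors vary between driver versions:

[source,python]
----
# Sketch: a read transaction that may be routed to any replica, not just the leader.
from typedb.driver import SessionType, TransactionType, TypeDBOptions

options = TypeDBOptions()
options.read_any_replica = True  # let the driver pick any node for this read

with driver.session("my-database", SessionType.DATA) as session:
    with session.transaction(TransactionType.READ, options) as tx:
        # Any read query works here; this one just fetches a few entities.
        answers = list(tx.query.get("match $x isa entity; get $x; limit 3;"))
        print(f"fetched {len(answers)} answers")
----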
 
-- Only after reaching this quorum does the leader apply the transaction to its state machine and confirm the commit to the client.
+== Consistency and durability
 
-This process guarantees that once a transaction is committed, it is durably stored on a majority of the cluster's nodes and will survive the
-failure of any minority of nodes. This ensures that the database remains in a consistent state, and no committed data is ever lost.
+TypeDB's replication model provides strong consistency guarantees, even in a distributed read environment. When a write transaction is sent
+to the leader, it is not confirmed until a majority of nodes in the cluster have durably stored the transaction in their logs. This process
+guarantees that once a transaction is committed, it is safely replicated and will not be lost. It also ensures that when followers serve
+read queries, they are providing access to a consistent and up-to-date state of the database.
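
The majority rule itself is simple to state concretely. As a purely illustrative sketch (not TypeDB code), the leader's commit condition amounts to:

[source,python]
----
# Illustrative only: the majority-quorum rule used for commits, not TypeDB internals.
def quorum_size(cluster_size: int) -> int:
    """A majority of the nodes, counting the leader itself."""
    return cluster_size // 2 + 1

def is_durably_committed(acknowledged_nodes: int, cluster_size: int) -> bool:
    """An entry counts as committed once a majority has written it to its on-disk log."""
    return acknowledged_nodes >= quorum_size(cluster_size)

# In a 5-node cluster, the leader plus 2 followers form a quorum, so a committed
# transaction survives the failure of any 2 nodes.
assert quorum_size(5) == 3
assert is_durably_committed(3, 5)
assert not is_durably_committed(2, 5)
----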
