= Horizontal Scaling

When a single server can no longer handle the volume of read queries for your application, it becomes necessary to scale out horizontally.
In TypeDB, this is achieved by creating a database cluster that distributes the read load across multiple nodes. This chapter explains how
TypeDB's replication-based architecture enables horizontal scaling for read transactions while also providing high availability and fault
tolerance.

== Scaling read throughput with replication

TypeDB's strategy for horizontal scaling is based on data replication using the Raft consensus algorithm. Because every node holds a
complete copy of the data, read-only transactions can be executed on any node in the cluster. This allows you to scale your application's
read throughput by simply adding more nodes: each node you add proportionally increases the cluster's capacity for concurrent read queries.

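As a back-of-envelope illustration of this property, the sketch below estimates aggregate read capacity from a hypothetical per-node throughput figure; the function name and the numbers are purely illustrative.

[source,python]
----
# Illustrative only: every node holds a full replica and can serve any read
# query, so aggregate read capacity grows with the node count.
def cluster_read_capacity(node_count: int, reads_per_node_per_sec: int) -> int:
    return node_count * reads_per_node_per_sec

# A hypothetical node sustaining 5,000 read queries/sec:
for nodes in (1, 3, 5):
    print(nodes, "nodes ->", cluster_read_capacity(nodes, 5_000), "reads/sec")
----
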
== The leader-follower model

A TypeDB cluster operates on a leader-follower model. At any given time, the cluster elects a single leader node for each database, while
all other nodes act as followers. Followers receive a stream of committed transactions from the leader and apply them to their local copy of
the database, keeping them in sync. Follower nodes can serve read transactions, but never writes.

The leader is exclusively responsible for processing all schema and data writes. Centralizing writes on a single node simplifies consistency
and ensures that all changes are applied in a strict order. Write throughput is therefore determined by the capacity of the single leader
node and is scaled by increasing its resources (see xref:{page-version}@new_core_concepts::typedb/vertical_scaling.adoc[vertical scaling]).

If a leader node fails, the cluster automatically elects a new leader from among the followers, ensuring that the database remains available
for writes with minimal interruption.

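The degree of fault tolerance this provides follows standard Raft majority arithmetic: the cluster remains available as long as a majority of nodes is up. The sketch below shows the rule; it is generic Raft math, not TypeDB-specific code.

[source,python]
----
# Standard Raft majority arithmetic: a cluster of n nodes needs a quorum of
# floor(n/2) + 1 nodes, so it tolerates the failure of floor((n-1)/2) nodes.
def quorum_size(n: int) -> int:
    return n // 2 + 1

def tolerated_failures(n: int) -> int:
    return (n - 1) // 2

for n in (3, 5, 7):
    print(f"{n} nodes: quorum={quorum_size(n)}, tolerates {tolerated_failures(n)} failure(s)")
----
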
== Interacting with a cluster

Interacting with a cluster is very similar to interacting with a single server. The key difference is that the client driver must be
configured with the network addresses of all nodes in the cluster.

The driver uses this list to intelligently manage connections. It automatically discovers which node is the current leader for a given
database and routes all write transactions to it. For read transactions, the driver can distribute the load across all available nodes (both
leader and followers) when the `read_any_replica` option is set on the transaction, effectively using the entire cluster's capacity. This
routing is handled transparently, so your application code for opening sessions and running transactions remains the same whether you are
connecting to a single node or a full cluster.

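As a concrete illustration, here is a minimal sketch using the TypeDB Python driver. It assumes a 2.x-era driver API (`TypeDB.cloud_driver`, `TypeDBCredential`, `TypeDBOptions`); the addresses, credentials, database name, and schema are placeholders, and exact names may differ in your driver version.

[source,python]
----
# Minimal sketch, assuming a TypeDB 2.x Python driver; names may vary between
# driver versions. All addresses, credentials, and schema are placeholders.
from typedb.driver import (TypeDB, TypeDBCredential, TypeDBOptions,
                           SessionType, TransactionType)

ADDRESSES = ["node1.example.com:1729", "node2.example.com:1729", "node3.example.com:1729"]
credential = TypeDBCredential("admin", "password", tls_enabled=True)

with TypeDB.cloud_driver(ADDRESSES, credential) as driver:
    with driver.session("my-database", SessionType.DATA) as session:
        # Write transactions are routed to the current leader automatically.
        with session.transaction(TransactionType.WRITE) as tx:
            tx.query.insert('insert $u isa user, has name "alice";')
            tx.commit()

        # With read_any_replica set, a read transaction may be served by any
        # node (leader or follower), spreading load across the cluster.
        options = TypeDBOptions(read_any_replica=True)
        with session.transaction(TransactionType.READ, options) as tx:
            for row in tx.query.get("match $u isa user; get;"):
                print(row)
----
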
== Consistency and durability

TypeDB's replication model provides strong consistency guarantees, even in a distributed read environment. When a write transaction is sent
to the leader, it is not confirmed until a majority of nodes in the cluster have durably stored the transaction in their logs. This process
guarantees that once a transaction is committed, it is safely replicated and will not be lost. It also ensures that when followers serve
read queries, they provide access to a consistent and up-to-date state of the database.

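Conceptually, the commit path follows the quorum rule sketched below. This is an illustration of the rule just described, not TypeDB's internal implementation; the function name and values are hypothetical.

[source,python]
----
# Conceptual model of quorum commit: the leader appends an entry to its own
# log, replicates it to followers, and confirms the commit only once a
# majority of the cluster has durably stored it. Illustration only.
def commit_confirmed(follower_acks: list[bool], cluster_size: int) -> bool:
    durable_copies = 1 + sum(follower_acks)  # the leader's own log counts
    majority = cluster_size // 2 + 1
    return durable_copies >= majority

# 5-node cluster: the leader plus two of four followers form a majority.
print(commit_confirmed([True, True, False, False], cluster_size=5))   # True
print(commit_confirmed([True, False, False, False], cluster_size=5))  # False
----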