
Commit f5637ca

Backport to branch(3.10) : Update requirements and recommendations (#1595)
Co-authored-by: Josh Wong <[email protected]>
1 parent 2ce90e0 commit f5637ca

2 files changed, +116 -24 lines changed
docs/requirements.md

Lines changed: 114 additions & 22 deletions
@@ -1,51 +1,143 @@
11
# Requirements and Recommendations for the Underlying Databases of ScalarDB
22

3-
This document explains the requirements and recommendations in the underlying databases of ScalarDB to make ScalarDB applications work correctly.
3+
This document explains the requirements and recommendations in the underlying databases of ScalarDB to make ScalarDB applications work correctly and efficiently.
44

5-
## Common requirements
5+
## Requirements
66

7-
This section describes common requirements for the underlying databases when using ScalarDB.
7+
ScalarDB requires each underlying database to provide certain capabilities to run transactions and analytics on the databases. This document explains the general requirements and how to configure each database to achieve the requirements.
88

9-
### Privileges to access the underlying databases
9+
### General requirements
1010

11-
ScalarDB operates the underlying databases not only for CRUD operations but also for performing operations like creating or altering schemas, tables, or indexes. Thus, ScalarDB basically requires a fully privileged account to access the underlying databases.
11+
#### Transactions
12+
{:.no_toc}
13+
ScalarDB requires each underlying database to provide at least the following capabilities to run transactions on the databases:
1214

13-
## Cassandra or Cassandra-compatible database requirements
15+
- Linearizable read and conditional mutations (write and delete) on a single database record.
16+
- Durability of written database records.
17+
- Ability to store arbitrary data besides application data in each database record.
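
To illustrate the first capability in the list above: a conditional mutation applies a write only if the record's current state matches an expected condition, and does so atomically. The following is a minimal in-memory sketch in Python, not ScalarDB code. `Store`, `put_if`, and the `version` field are hypothetical names chosen for illustration. The version kept beside the value also stands in for the kind of non-application data the third item refers to.

```python
import threading


class Store:
    """Toy single-node record store with a conditional (compare-and-set) write."""

    def __init__(self):
        self._lock = threading.Lock()
        self._records = {}  # key -> (version, value); version is metadata, not application data

    def get(self, key):
        # Reads take the same lock as writes, so a read never observes a partial update.
        with self._lock:
            return self._records.get(key)

    def put_if(self, key, expected_version, value):
        """Write only if the stored version equals expected_version.

        expected_version=None means "the record must not exist yet".
        Returns True if the write was applied, False otherwise.
        """
        with self._lock:
            current = self._records.get(key)
            current_version = current[0] if current else None
            if current_version != expected_version:
                return False  # condition failed; nothing is written
            new_version = 1 if current_version is None else current_version + 1
            self._records[key] = (new_version, value)
            return True


store = Store()
first = store.put_if("account-1", None, {"balance": 100})  # insert if absent
stale = store.put_if("account-1", None, {"balance": 999})  # rejected: record now exists
update = store.put_if("account-1", 1, {"balance": 80})     # applied: version matches
```

A real database has to provide the same guarantee across replicas and failures, which is why the per-database settings later in this document matter.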
1418

15-
The following are requirements to make ScalarDB on Cassandra or Cassandra-compatible databases work properly and for storage operations with `LINEARIZABLE` to provide linearizability and for transaction operations with `SERIALIZABLE` to provide strict serializability.
19+
#### Analytics
20+
{:.no_toc}
21+
ScalarDB requires each underlying database to provide the following capability to run analytics on the databases:
1622

17-
### Ensure durability in Cassandra
23+
- Ability to return only committed records.
1824

19-
In **cassandra.yaml**, you must change `commitlog_sync` from the default `periodic` to `batch` or `group` to ensure durability in Cassandra.
25+
{% capture notice--info %}
26+
**Note**
27+
28+
You need to have database accounts that have enough privileges to access the databases through ScalarDB since ScalarDB runs on the underlying databases not only for CRUD operations but also for performing operations like creating or altering schemas, tables, or indexes. ScalarDB basically requires a fully privileged account to access the underlying databases.
29+
{% endcapture %}
30+
31+
<div class="notice--info">{{ notice--info | markdownify }}</div>
32+
33+
### How to configure databases to achieve the general requirements
34+
35+
Select your database for details on how to configure it to achieve the general requirements.
36+
37+
<div id="tabset-1">
38+
<div class="tab">
39+
<button class="tablinks" onclick="openTab(event, 'JDBC_databases', 'tabset-1')" id="defaultOpen-1">JDBC databases</button>
40+
<button class="tablinks" onclick="openTab(event, 'DynamoDB', 'tabset-1')">DynamoDB</button>
41+
<button class="tablinks" onclick="openTab(event, 'Cosmos_DB_for_NoSQL', 'tabset-1')">Cosmos DB for NoSQL</button>
42+
<button class="tablinks" onclick="openTab(event, 'Cassandra', 'tabset-1')">Cassandra</button>
43+
</div>
44+
45+
<div id="JDBC_databases" class="tabcontent" markdown="1">
46+
47+
#### Transactions
48+
{:.no_toc}
49+
- Use a single primary server or synchronized multi-primary servers for all operations (no read operations on read replicas that are asynchronously replicated from a primary database).
50+
- Use read-committed or stricter isolation levels.
51+
52+
#### Analytics
53+
{:.no_toc}
54+
- Use read-committed or stricter isolation levels.
2055

21-
ScalarDB provides only the atomicity and isolation properties of ACID and requests the underlying databases to provide durability. Although you can specify `periodic`, we do not recommend doing so unless you know exactly what you are doing.
56+
</div>
2257

23-
### Confirm that the Cassandra-compatible database supports lightweight transactions (LWTs)
58+
<div id="DynamoDB" class="tabcontent" markdown="1">
2459

25-
You must use a Cassandra-compatible database that supports LWTs.
60+
#### Transactions
61+
{:.no_toc}
62+
- Use a single primary region for all operations. (No read and write operations on global tables in non-primary regions.)
63+
- There is no concept for primary regions in DynamoDB, so you must designate a primary region by yourself.
2664

27-
ScalarDB does not work on some Cassandra-compatible databases that do not support LWTs, such as [Amazon Keyspaces](https://aws.amazon.com/keyspaces/). This is because the Consensus Commit transaction manager relies on the linearizable operations of underlying databases to make transactions serializable.
65+
#### Analytics
66+
{:.no_toc}
67+
- Not applicable. DynamoDB always returns committed records, so there are no DynamoDB-specific requirements.
2868

29-
## CosmosDB database requirements
69+
</div>
3070

31-
In your Azure CosmosDB account, you must set the **default consistency level** to **Strong**.
71+
<div id="Cosmos_DB_for_NoSQL" class="tabcontent" markdown="1">
3272

33-
Consensus Commit, the ScalarDB transaction protocol, requires linearizable reads. By setting the **default consistency level** to **Strong**, CosmosDB can guarantee linearizability.
73+
#### Transactions
74+
{:.no_toc}
75+
- Use a single primary region for all operations with `Strong` or `Bounded Staleness` consistency.
3476

35-
For instructions on how to configure this setting, see the official documentation at [Configure the default consistency level](https://learn.microsoft.com/en-us/azure/cosmos-db/nosql/how-to-manage-consistency#configure-the-default-consistency-level).
77+
#### Analytics
78+
{:.no_toc}
79+
- Not applicable. Cosmos DB always returns committed records, so there are no Cosmos DB–specific requirements.
3680

37-
## JDBC database recommendations
81+
</div>
3882

39-
In ScalarDB on JDBC databases, you can't choose a consistency level (`LINEARIZABLE`, `SEQUENTIAL` or `EVENTUAL`) in your code by using the `Operation.withConsistency()` method. In addition, the consistency level depends on the setup of your JDBC database.
83+
<div id="Cassandra" class="tabcontent" markdown="1">
4084

41-
For example, if you have asynchronous read replicas in your setup and perform read operations against them, the consistency will be eventual because you can read stale data from the read replicas. On the other hand, if you perform all operations against a single master instance, the consistency will be linearizable.
85+
#### Transactions
86+
{:.no_toc}
87+
- Use a single primary cluster for all operations (no read or write operations in non-primary clusters).
88+
- Use `batch` or `group` for `commitlog_sync`.
89+
- If you're using Cassandra-compatible databases, those databases must properly support lightweight transactions (LWT).
4290

43-
With this in mind, you must perform all operations or transactions against a single master instance so that you can achieve linearizability and avoid worrying about consistency issues in your application. In other words, ScalarDB does not support read replicas.
91+
#### Analytics
92+
{:.no_toc}
93+
- Not applicable. Cassandra always returns committed records, so there are no Cassandra-specific requirements.
94+
95+
</div>
96+
</div>
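
The per-database transaction requirements in the tabs above can be summarized as a small checklist. The following Python sketch is one hypothetical way to encode them; it is not part of ScalarDB. The function name, the database labels, and every settings key except `commitlog_sync` are assumptions made for illustration.

```python
# Standard SQL isolation levels, ordered from weakest to strictest.
ISOLATION_ORDER = ["READ_UNCOMMITTED", "READ_COMMITTED", "REPEATABLE_READ", "SERIALIZABLE"]


def check_transaction_requirements(database, settings):
    """Return a list of problems found; an empty list means none were detected.

    `database` and the keys of `settings` are hypothetical labels for this sketch.
    """
    problems = []
    if database == "jdbc":
        level = settings.get("isolation_level")
        if level not in ISOLATION_ORDER:
            problems.append("unknown isolation level")
        elif ISOLATION_ORDER.index(level) < ISOLATION_ORDER.index("READ_COMMITTED"):
            problems.append("isolation level must be read-committed or stricter")
        if settings.get("reads_from_async_replicas", False):
            problems.append("do not read from asynchronously replicated read replicas")
    elif database == "dynamodb":
        # DynamoDB has no built-in primary region, so the operator designates one.
        if len(set(settings.get("regions_used", []))) != 1:
            problems.append("use a single designated primary region for all operations")
    elif database == "cosmosdb":
        if settings.get("consistency") not in ("Strong", "Bounded Staleness"):
            problems.append("consistency must be Strong or Bounded Staleness")
    elif database == "cassandra":
        if settings.get("commitlog_sync") not in ("batch", "group"):
            problems.append("commitlog_sync must be batch or group")
    else:
        problems.append("unsupported database label")
    return problems


ok_jdbc = check_transaction_requirements("jdbc", {"isolation_level": "READ_COMMITTED"})
bad_cassandra = check_transaction_requirements("cassandra", {"commitlog_sync": "periodic"})
```

The sketch checks only what the lists state explicitly. Requirements such as proper LWT support in Cassandra-compatible databases cannot be verified from a settings dictionary and are left out.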
97+
98+
## Recommendations
99+
100+
Properly configuring each underlying database of ScalarDB for high performance and high availability is recommended. The following recommendations include some knobs and configurations to update.
44101

45102
{% capture notice--info %}
46103
**Note**
47104

48-
You can still use a read replica as a backup and standby even when following this guideline.
105+
ScalarDB can be seen as an application of underlying databases, so you may want to try updating other knobs and configurations that are commonly used to improve efficiency.
49106
{% endcapture %}
107+
<div class="notice--info">{{ notice--info | markdownify }}</div>
108+
109+
<div id="tabset-2">
110+
<div class="tab">
111+
<button class="tablinks" onclick="openTab(event, 'JDBC_databases2', 'tabset-2')" id="defaultOpen-2">JDBC databases</button>
112+
<button class="tablinks" onclick="openTab(event, 'DynamoDB2', 'tabset-2')">DynamoDB</button>
113+
<button class="tablinks" onclick="openTab(event, 'Cosmos_DB_for_NoSQL2', 'tabset-2')">Cosmos DB for NoSQL</button>
114+
<button class="tablinks" onclick="openTab(event, 'Cassandra2', 'tabset-2')">Cassandra</button>
115+
</div>
116+
117+
<div id="JDBC_databases2" class="tabcontent" markdown="1">
118+
- Use read-committed isolation for better performance.
119+
- Follow the performance optimization best practices for each database. For example, increasing the buffer size (for example, `shared_buffers` in PostgreSQL) and increasing the number of connections (for example, `max_connections` in PostgreSQL) are usually recommended for better performance.
120+
</div>
50121

122+
<div id="DynamoDB2" class="tabcontent" markdown="1">
123+
- Increase the number of read capacity units (RCUs) and write capacity units (WCUs) for high throughput.
124+
- Enable point-in-time recovery (PITR).
125+
126+
{% capture notice--info %}
127+
**Note**
128+
129+
Since DynamoDB stores data in multiple availability zones by default, you don’t need to adjust any configurations to improve availability.
130+
{% endcapture %}
51131
<div class="notice--info">{{ notice--info | markdownify }}</div>
132+
</div>
133+
134+
<div id="Cosmos_DB_for_NoSQL2" class="tabcontent" markdown="1">
135+
- Increase the number of Request Units (RUs) for high throughput.
136+
- Enable point-in-time restore (PITR).
137+
- Enable availability zones.
138+
</div>
139+
140+
<div id="Cassandra2" class="tabcontent" markdown="1">
141+
- Increase `concurrent_reads` and `concurrent_writes` for high throughput. For details, see the official Cassandra documentation about [`concurrent_writes`](https://cassandra.apache.org/doc/stable/cassandra/configuration/cass_yaml_file.html#concurrent_writes).
142+
</div>
143+
</div>

docs/scalardb-supported-databases.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -18,7 +18,7 @@ ScalarDB supports the following databases and their versions.
1818
{% capture notice--info %}
1919
**Note**
2020

21-
For requirements when using Cassandra or Cassandra-compatible databases, see [Cassandra or Cassandra-compatible database requirements](requirements.md#cassandra-or-cassandra-compatible-database-requirements).
21+
For requirements when using Cassandra or Cassandra-compatible databases, see [How to configure databases to achieve the general requirements](requirements.md#how-to-configure-databases-to-achieve-the-general-requirements).
2222
{% endcapture %}
2323

2424
<div class="notice--info">{{ notice--info | markdownify }}</div>
@@ -48,7 +48,7 @@ For requirements when using Cassandra or Cassandra-compatible databases, see [Ca
4848
{% capture notice--info %}
4949
**Note**
5050

51-
For recommendations when using JDBC databases, see [JDBC database recommendations](requirements.md#jdbc-database-recommendations).
51+
For recommendations when using JDBC databases, see [Recommendations](requirements.md#recommendations).
5252
{% endcapture %}
5353

5454
<div class="notice--info">{{ notice--info | markdownify }}</div>
