## CPU Profiling

```shell
go tool pprof -http="0.0.0.0:8081" http://localhost:8080/debug/pprof/profile?seconds=30
```

Open `<your-ip>:8081` and select `Flame Graph` from the `VIEW` menu in the site header:
<img alt="CPU profiling" src="https://user-images.githubusercontent.com/172204/208336392-5b64bb9b-cce8-4562-9e05-c3d538e9d8a6.png"/>
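
If you only need a quick look at the hottest functions, pprof can also print a text summary directly in the terminal, using the same endpoint as above:

```shell
go tool pprof -top http://localhost:8080/debug/pprof/profile?seconds=30
```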

## Query Level CPU Profiling

Currently, it does not work on Mac, with either Intel or ARM.

## Memory Profiling
### Enable memory profiling

1. Build `databend-query` with the `memory-profiling` feature enabled:

```
cargo build --bin databend-query --release --features memory-profiling
```

2. Fire up `databend-query`, using the environment variable `MALLOC_CONF` to enable memory profiling:

```
MALLOC_CONF=prof:true,lg_prof_interval:30 ./target/release/databend-query
```
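
Here `prof:true` turns profiling on, and `lg_prof_interval:30` asks jemalloc to dump a heap profile roughly every 2^30 bytes (about 1 GiB) of allocation activity. The dumps should appear in the working directory under jemalloc's default `jeprof` prefix:

```shell
# Dump files are typically named jeprof.<pid>.<seq>.i<iter>.heap
ls jeprof.*.heap
```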

### Generate heap profile

Generate a call graph in `pdf` illustrating memory allocation during this interval:

```shell
jeprof --pdf ./target/release/databend-query heap.prof > heap.pdf
```

<img alt="Generate heap profile" src="https://user-images.githubusercontent.com/172204/204963954-f6eacf10-d8bd-4469-9c8d-7d30955f1a78.png" width="600"/>
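
Since jeprof inherits the usual pprof-style options, you can also get a quick text summary, or diff two dumps to see what grew in between. A sketch, assuming the dump file names jemalloc generated above:

```shell
# Top allocation sites as text
jeprof --text ./target/release/databend-query jeprof.12345.0.i0.heap | head -n 20

# Compare two consecutive dumps to isolate the growth between them
jeprof --pdf --base=jeprof.12345.0.i0.heap \
    ./target/release/databend-query jeprof.12345.1.i1.heap > growth.pdf
```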

### Fast jeprof

jeprof is very slow for large heap analysis; the bottleneck is `addr2line`. To speed it up from **30 minutes to 3 seconds**, use the Rust implementation of `addr2line`:

```shell
git clone https://github.com/gimli-rs/addr2line
cd addr2line
# Build the addr2line example binary in release mode (same as `cargo b --examples -r`)
cargo build --examples --release
# Overwrite the system addr2line (locate it first with `whereis addr2line`)
cp ./target/release/examples/addr2line <path-found-with-whereis-addr2line>
```

Databend deployment provides two modes: standalone and cluster, each with different capabilities and use cases.

In standalone mode, a standard configuration consists of a single Meta node and a single Query node. This minimal setup is suitable for testing purposes or small-scale deployments. However, it is important to note that standalone mode is not recommended for production environments due to its limited scalability and the absence of high availability features.

<img alt="Standalone Deployment" src="/img/deploy/deploy-standalone-arch.png"/>

In a Standalone Databend Deployment, it is possible to host both the Meta and Query nodes on a single server. The following topics in the documentation assist you in setting up and deploying a standalone Databend:

Cluster mode is designed for larger-scale deployments and provides enhanced capabilities.

In a Databend cluster, multiple Query nodes can be deployed, and it is possible to create a more powerful Query cluster by grouping specific Query nodes together (using Cluster IDs) for different query performance requirements. A Databend cluster has the capacity to accommodate multiple Query clusters. By default, Databend leverages computational concurrency to its maximum potential, allowing a single SQL query to utilize all available CPU cores within a single Query node. However, when utilizing a Query cluster, Databend takes advantage of concurrent scheduling and executes computations across the entire cluster. This approach maximizes system performance and provides enhanced computational capabilities.

<img alt="Cluster Deployment" src="/img/deploy/deploy-cluster-arch.png"/>

#### Query Cluster Size

---
title: Compatibility
sidebar_label: Compatibility
description: Investigate and manage the compatibility
---

This guide introduces how to investigate and manage compatibility:

- between databend-query and databend-meta.
- between different versions of databend-meta.

When handshaking:

- The client checks that the server is new enough: `S.ver >= C.min_srv_ver`.
- The server checks that the client is new enough: `C.ver >= S.min_cli_ver`.
Handshake succeeds if both of these two assertions hold.

E.g.:

- `S: (ver=3, min_cli_ver=1)` is compatible with `C: (ver=3, min_srv_ver=2)`.
- `S: (ver=4, min_cli_ver=4)` is **NOT** compatible with `C: (ver=3, min_srv_ver=2)`.
  Because although `S.ver(4) >= C.min_srv_ver(2)` holds,
  `C.ver(3) >= S.min_cli_ver(4)` does not hold.
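
In code, the two handshake assertions reduce to a simple predicate. A minimal sketch for illustration (not Databend's actual implementation):

```rust
struct Server {
    ver: u32,
    min_cli_ver: u32,
}

struct Client {
    ver: u32,
    min_srv_ver: u32,
}

/// The handshake succeeds only if each side meets the other's minimum.
fn compatible(s: &Server, c: &Client) -> bool {
    s.ver >= c.min_srv_ver && c.ver >= s.min_cli_ver
}

fn main() {
    let c = Client { ver: 3, min_srv_ver: 2 };
    assert!(compatible(&Server { ver: 3, min_cli_ver: 1 }, &c));
    // Fails: C.ver(3) < S.min_cli_ver(4).
    assert!(!compatible(&Server { ver: 4, min_cli_ver: 4 }, &c));
}
```
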
The following is an illustration of the latest query-meta compatibility:

| `Meta\Query` | [0.9.41, 1.1.34) | [1.1.34, 1.2.287) | [1.2.287, 1.2.361) | [1.2.361, +∞) |
| :----------------- | :--------------- | :---------------- | :----------------- | :------------ |
| [0.8.30, 0.8.35) | ❌ | ❌ | ❌ | ❌ |
| [0.8.35, 0.9.23) | ✅ | ❌ | ❌ | ❌ |
| [0.9.23, 0.9.42) | ✅ | ❌ | ❌ | ❌ |
| [0.9.42, 1.1.32) | ✅ | ❌ | ❌ | ❌ |
| [1.1.32, 1.2.63) | ✅ | ✅ | ❌ | ❌ |
| [1.2.63, 1.2.226) | ✅ | ✅ | ❌ | ❌ |
| [1.2.226, 1.2.258) | ✅ | ✅ | ✅ | ❌ |
| [1.2.258, +∞) | ✅ | ✅ | ✅ | ✅ |

History versions that are not included in the above chart:

- Query `[0.7.59, 0.8.80)` is compatible with Meta `[0.8.30, 0.9.23)`.
- Query `[0.8.80, 0.9.41)` is compatible with Meta `[0.8.35, 0.9.42)`.


<img alt="Compatibility status" src="/img/deploy/compatibility.excalidraw.png"/>
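
The ranges in these tables are half-open intervals: `[a, b)` includes `a` but excludes `b`. A small sketch of such a check, using the third-party `semver` crate (an assumption; not part of Databend):

```rust
use semver::Version; // semver = "1" in Cargo.toml

/// True if `v` falls in the half-open interval [lo, hi).
fn in_range(v: &str, lo: &str, hi: &str) -> bool {
    let v = Version::parse(v).unwrap();
    let lo = Version::parse(lo).unwrap();
    let hi = Version::parse(hi).unwrap();
    lo <= v && v < hi
}

fn main() {
    // Per the table above, Query [0.9.41, 1.1.34) is one compatibility column.
    assert!(in_range("0.9.50", "0.9.41", "1.1.34"));
    assert!(!in_range("1.1.34", "0.9.41", "1.1.34")); // upper bound excluded
}
```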

# Compatibility between databend-query

## Version Compatibility Matrix

| Query version | Backward compatible with | Key Changes |
| :----------------- | :----------------------- | :--------------------------------------- |
| [-∞, 1.2.307) | [-∞, 1.2.311) | Original format |
| [1.2.307, 1.2.311) | [-∞, 1.2.311) | Added Role info with PB/JSON support |
| [1.2.311, 1.2.709) | [1.2.307, +∞) | Role info serialized to PB only |
| [1.2.709, +∞) | [1.2.709, +∞) | **Important**: Fuse storage path changed |

## Important Changes & Upgrade Instructions

### Version 1.2.307

- Supports deserializing Role info from both PB and JSON
- Supports serializing Role info to JSON only
- **Upgrade to this version first** if you're on an earlier version

### Version 1.2.311

- Supports serializing Role info to PB only
- **Upgrade to this version next** after reaching 1.2.307
- Example upgrade path: `1.2.306 -> 1.2.307 -> 1.2.311 -> 1.2.312`
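
When walking through such a path, confirm what is actually running before each step (this assumes the binaries support the standard `--version` flag):

```shell
./databend-query --version
./databend-meta --version
```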

### Version 1.2.709

- **Important Change**: Fuse storage path modified
- ⚠️ Versions before 1.2.709 may not be able to read some data from versions 1.2.709+
- ⚠️ **Recommendation**: All nodes under the same tenant should be upgraded together
- Avoid mixing nodes with versions before and after 1.2.709 to prevent potential data access issues

### Version 1.2.764

- If you need to specify a different storage location for `system_history` tables, all nodes under the same tenant must be upgraded to 1.2.764+

## Compatibility between databend-meta

| Meta version | Backward compatible with |
| :----------------- | :----------------------- |
| [0.9.41, 1.2.212) | [0.9.41, 1.2.212) |
| [1.2.212, 1.2.479) | [0.9.41, 1.2.479) |
| [1.2.479, 1.2.655) | [1.2.288, 1.2.655) |
| [1.2.655, +∞) | [1.2.288, +∞) |

![Compatibility between databend-meta versions](@site/static/img/deploy/compat-meta-meta-1-2-655.svg)

History versions that are not included in the above chart:
- `1.2.655` 2024-11-11: Introduced on-disk `V004`, using WAL-based Raft log storage,
  which is compatible with `V002`. The oldest compatible version is `1.2.288` (`1.2.212~1.2.287` are removed).


## Compatibility of databend-meta on-disk data

The on-disk data of Databend-meta evolves over time while maintaining backward compatibility.

| DataVersion | Databend-version | Min Compatible with |
| :---------- | :--------------- | :------------------ |
| V004 | 1.2.655 | V002 |
| V003 | 1.2.547 | V002 |
| V002 | 1.2.53 | V001 |
| V001 | 1.1.40 | V0 |

### Identifying the versions
```shell
tokio-console # for meta console, http://127.0.0.1:6669
```

**databend-query**

<img alt="databend-query" src="/img/tracing/query-console.png"/>

**databend-meta**

<img alt="databend-meta" src="/img/tracing/meta-console.png"/>

**task in console**

<img alt="task in console" src="/img/tracing/task-in-console.png"/>

---
title: Airbyte
---

<p align="center">
<img alt="Airbyte" src="/img/integration/integration-airbyte.png"/>
</p>

## What is [Airbyte](https://airbyte.com/)?


- Airbyte is an open-source data integration platform that syncs data from applications, APIs, and databases to data warehouses, lakes, and databases.
- You can load data from any Airbyte source into Databend.

Currently, we have implemented an experimental Airbyte destination that allows you to send data from your Airbyte source to Databend.

**NOTE**:

Currently, we have only implemented the `append` mode, which means the destination will only append data to the table and will not overwrite, update, or delete any data.
In addition, we assume that your Databend destination is **S3 compatible**, since presigned URLs are used to copy data from a Databend stage to the table.
Please read [this](../../10-deploy/01-deploy/01-non-production/00-deploying-local.md) to learn how to deploy a local Databend instance.
## Create a Databend User

Connect to Databend server with MySQL client:

```shell
mysql -h127.0.0.1 -uroot -P3307
```

Create a user:

```sql
CREATE USER user1 IDENTIFIED BY 'abc123';
```

Create a Database:

```sql
CREATE DATABASE airbyte;
```

Grant privileges for the user:

```sql
GRANT ALL PRIVILEGES ON airbyte.* TO user1;
```
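
To verify the setup, list the privileges granted to the user:

```sql
SHOW GRANTS FOR user1;
```
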
To use Databend with Airbyte, you should add our customized connector to your Airbyte instance.
You can add the destination on the Settings -> Destinations -> Custom Destinations -> Add a Custom Destination page.
Our custom destination image is `datafuselabs/destination-databend:alpha`.

<p align="center">
<img alt="Configure Airbyte" src="/img/integration/integration-airbyte-plugins.png"/>
</p>

## Setup Databend destination

**Note**:

You should have a Databend instance running and accessible from your Airbyte instance.
For local Airbyte, Docker Compose cannot reach services on your host's localhost network.
You may take a look at [ngrok](https://ngrok.com/) to tunnel your service (**NEVER** expose it in your production environment).

<p align="center">
<img alt="Setup Databend destination" src="/img/integration/integration-airbyte-destinations.png"/>
</p>

## Test your integration

You can use the Faker source to test your integration. After the sync completes, run the following command to see the uploaded data:

```sql
select * from default._airbyte_raw_users limit 5;
```

```shell
bendsql --quote-style never --query="EXPLAIN PERF SELECT avg(number) FROM numbers(10000000)" > demo.html
```

Then, you can open the `demo.html` file in your browser to view the flame graphs:

<img alt="graphs" src="https://github.com/user-attachments/assets/07acfefa-a1c3-4c00-8c43-8ca1aafc3224"/>

If the query finishes very quickly, it may not collect enough data, resulting in an empty flame graph.

---
description: Funnel Analysis
---

<p align="center">
<img alt="Databend Funnel Analysis" src="https://datafuse-1253727613.cos.ap-hongkong.myqcloud.com/learn/databend-funnel.png" width="550"/>
</p>

## WINDOW_FUNNEL
Similar to `windowFunnel` in ClickHouse (they were created by the same author), `WINDOW_FUNNEL` searches for chains of events in a sliding time window and calculates the maximum number of events that occurred from the chain.

The function works according to the following algorithm:

- The function searches for data that triggers the first condition in the chain and sets the event counter to 1. This is the moment when the sliding window starts.

- If events from the chain occur sequentially within the window, the counter is incremented. If the sequence of events is disrupted, the counter isn't incremented.

- If the data has multiple event chains at varying completion points, the function will only output the size of the longest chain.

```sql
WINDOW_FUNNEL( <window> )( <timestamp>, <cond1>, <cond2>, ..., <condN> )
```

**Arguments**

- `<timestamp>` — Name of the column containing the timestamp. Data types supported: integer types and datetime types.
- `<cond>` — Conditions or data describing the chain of events. Must be `Boolean` datatype.

**Parameters**

- `<window>` — Length of the sliding window: the time interval between the first and the last condition. The unit of `window` depends on the `timestamp` column itself. The chain must satisfy `timestamp of cond1 <= timestamp of cond2 <= ... <= timestamp of condN <= timestamp of cond1 + window`.

**Returned value**

The maximum number of consecutive triggered conditions from the chain within the sliding time window.
All the chains in the selection are analyzed.

Type: `UInt8`.


**Example**

Determine if a set period of time is enough for the user to select a phone and purchase it twice in the online store.
Set the following chain of events:

1. The user logs in to their account (`event_name = 'login'`).
2. The user visits the page (`event_name = 'visit'`).
3. The user adds items to the shopping cart (`event_name = 'cart'`).
4. The user completes the purchase (`event_name = 'purchase'`).


```sql
CREATE TABLE events(user_id BIGINT, event_name VARCHAR, event_timestamp TIMESTAMP);
```
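
The sample data and the funnel query are omitted in this excerpt; a minimal reconstruction consistent with the results below might look like this (the timestamps and the one-hour window value are assumptions — with `TIMESTAMP` columns, the window is measured in microseconds):

```sql
INSERT INTO events VALUES
    (100123, 'login',    '2023-01-01 10:00:00'),
    (100123, 'visit',    '2023-01-01 10:05:00'),
    (100123, 'cart',     '2023-01-01 10:10:00'),
    (100123, 'purchase', '2023-01-01 10:15:00'),
    (100125, 'login',    '2023-01-01 11:00:00'),
    (100125, 'visit',    '2023-01-01 11:10:00'),
    (100125, 'cart',     '2023-01-01 11:20:00'),
    (100126, 'login',    '2023-01-01 12:00:00'),
    (100126, 'visit',    '2023-01-01 12:30:00');

-- 3600000000 microseconds = 1 hour
SELECT user_id,
       WINDOW_FUNNEL(3600000000)(event_timestamp,
           event_name = 'login',
           event_name = 'visit',
           event_name = 'cart',
           event_name = 'purchase') AS level
FROM events
GROUP BY user_id;
```
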
Result:

- User `100126` level is 2 (`login -> visit`).
- User `100125` level is 3 (`login -> visit -> cart`).
- User `100123` level is 4 (`login -> visit -> cart -> purchase`).