Merged

31 commits
340af56
recreate dagplanner snapshots
mbroecheler Oct 6, 2025
8bb1772
update other snapshots
mbroecheler Oct 6, 2025
2ac7a3d
enable mermaid
mbroecheler Oct 6, 2025
f456dd0
update graphiql references and add swagger, mcp
mbroecheler Oct 8, 2025
e7e7d28
updating configuration documentation.
mbroecheler Oct 8, 2025
56a690c
updating configuration documentation.
mbroecheler Oct 8, 2025
25ba00d
break out engine configuration
mbroecheler Oct 8, 2025
1d8af7b
finalize engine config
mbroecheler Oct 8, 2025
a6fd88c
update default iceberg configuration
mbroecheler Oct 9, 2025
b9fe01b
add auto-update script for default configuration.
mbroecheler Oct 9, 2025
37e8594
update documentation, add howtos.
mbroecheler Oct 10, 2025
317894a
update stream enrichment howto
mbroecheler Oct 10, 2025
62e5546
polishing connectors.md
mbroecheler Oct 10, 2025
07cb6ed
polishing connectors.md
mbroecheler Oct 10, 2025
aacf8a5
update formatting
mbroecheler Oct 10, 2025
5de49cd
update formatting
mbroecheler Oct 10, 2025
d501c0b
update intro
mbroecheler Oct 10, 2025
4c73aa9
remove outdated context generator
mbroecheler Oct 10, 2025
5441bde
updated stdlib-docs to latest main
mbroecheler Oct 10, 2025
a548a9f
function generation
mbroecheler Oct 10, 2025
241241c
updated function name generation
mbroecheler Oct 10, 2025
6451e32
moving pages
mbroecheler Oct 11, 2025
0556769
fixed broken links
mbroecheler Oct 11, 2025
273907a
update compatibility table
ferenc-csaky Oct 17, 2025
5536968
minor formatting improvements
ferenc-csaky Oct 17, 2025
3f8fac0
auth testing docs and some more JWT guidance
ferenc-csaky Oct 17, 2025
0fe897d
fix license headers
ferenc-csaky Oct 20, 2025
75990c6
fix snapshots
ferenc-csaky Oct 20, 2025
78a579b
code format
ferenc-csaky Oct 29, 2025
7151e55
add env var resolution docs
ferenc-csaky Oct 29, 2025
3bd1d0c
fix snapshots
ferenc-csaky Oct 29, 2025
5 changes: 4 additions & 1 deletion .gitignore
@@ -129,4 +129,7 @@ node_modules/

npm-debug.log*
yarn-debug.log*
yarn-error.log*
yarn-error.log*

#Generated markdown files
*-generated.md
3 changes: 2 additions & 1 deletion documentation/CLAUDE.md
@@ -77,7 +77,8 @@ npm run swizzle

### Content Structure
- **docs/**: Main documentation content in Markdown format
- Core documentation files include intro.md, getting-started.md, sqrl-language.md, etc.
- **intro/**: Introductory documentation (intro.md, getting-started.md, concepts.md, tutorials.md)
- Core documentation files include sqrl-language.md, interface.md, configuration.md, etc.
- **stdlib-docs/**: Embedded function library documentation
- **blog/**: Release notes, updates, and technical blog posts
- **src/**: React components and custom pages
@@ -76,7 +76,7 @@ We [just released](https://github.com/DataSQRL/sqrl/releases/tag/v0.1.0) the fir
Here are some ideas for how you can contribute:

* Share your thoughts: Do you have ideas on how we can improve the SQRL language or the DataSQRL compiler? Jump into [our community](/community) and let us know!
* Test the waters: Do you like playing with new technologies? Try out [DataSQRL](/docs/getting-started) and let us know if you find any bugs or missing features.
* Test the waters: Do you like playing with new technologies? Try out [DataSQRL](/docs/intro/getting-started) and let us know if you find any bugs or missing features.
* Spread the word: Think DataSQRL has potential? Share this blog post and [star](https://github.com/DataSQRL/sqrl) DataSQRL on [Github](https://github.com/DataSQRL/sqrl). Your support can help us reach more like-minded individuals.
* Code with us: Do you enjoy contributing to open-source projects? Dive into [the code](https://github.com/DataSQRL/sqrl) with us and pick up a [ticket](https://github.com/DataSQRL/sqrl/issues).

2 changes: 1 addition & 1 deletion documentation/blog/2023-07-10-temporal-join.mdx
@@ -135,7 +135,7 @@ Temporal joins help us avoid the pitfalls of time-alignment problems when joinin

And that’s why the temporal join is stream processing's secret superpower.

DataSQRL makes using temporal joins a breeze. With its simplified syntax and smart defaults, it's like having a personal tour guide leading you through the sometimes bewildering landscape of stream processing. Take a look at our [Getting Started](/docs/getting-started) to see a complete example of temporal joins in action or take a look at our [other tutorials](/docs/tutorials) for a step-by-step guide to stream processing including temporal joins.
DataSQRL makes using temporal joins a breeze. With its simplified syntax and smart defaults, it's like having a personal tour guide leading you through the sometimes bewildering landscape of stream processing. Take a look at our [Getting Started](/docs/intro/getting-started) to see a complete example of temporal joins in action or take a look at our [other tutorials](/docs/intro/tutorials) for a step-by-step guide to stream processing including temporal joins.

Happy data time-traveling, folks!

2 changes: 1 addition & 1 deletion documentation/blog/2025-05-09-flink-sql-extensions.md
@@ -95,4 +95,4 @@ In addition to breaking out the sink configuration from the main script, the `EX

[FlinkSQL](https://nightlies.apache.org/flink/flink-docs-release-1.19/docs/dev/table/sql/overview/) is a phenomenal extension of the SQL ecosystem to stream processing. With DataSQRL, we are trying to make it easier to build end-to-end data pipelines and complete data applications with FlinkSQL.

Check out the [complete example](/docs/getting-started) which also covers testing, customization, and deployment. Or read the [documentation](/docs/sqrl-language) to learn more.
Check out the [complete example](/docs/intro/getting-started) which also covers testing, customization, and deployment. Or read the [documentation](/docs/sqrl-language) to learn more.
2 changes: 1 addition & 1 deletion documentation/blog/2025-07-27-datasqrl-0.7.md
@@ -33,7 +33,7 @@ docker pull datasqrl/cmd:0.7.0

Data delivery is the final and most visible stage of any data pipeline. It's how users, applications, and AI agents actually access and consume data. Most enterprise data interactions happen through APIs, making the delivery interface a critical component. At DataSQRL, we've invested heavily in automating the upstream parts of the pipeline: from Flink-powered data processing to Postgres-backed storage. With version 0.7, we turn our focus to the serving layer: introducing support for the Model Context Protocol (MCP) and REST APIs, as well as JWT-based authentication and authorization. These additions ensure seamless integration with most authentication providers and enable secure, token-based data access, with fine-grained authorization logic enforced directly in the SQRL script. This completes our vision of end-to-end pipeline automation, where consumption patterns inform data storage and processing—closing the loop between data production and usage.

Check out the [interface documentation](../docs/interface) for more information.
Check out the [interface documentation](/docs/interface) for more information.

<!--truncate-->

20 changes: 20 additions & 0 deletions documentation/blog/tags.yml
@@ -28,3 +28,23 @@ feature:
  label: feature
  permalink: /feature
  description: Feature descriptions

DataSQRL:
  label: DataSQRL
  permalink: /datasqrl
  description: Posts about DataSQRL

community:
  label: Community
  permalink: /community
  description: Community updates and announcements

Join:
  label: Join
  permalink: /join
  description: Posts about SQL joins and temporal joins

Flink:
  label: Flink
  permalink: /flink
  description: Apache Flink related posts
2 changes: 1 addition & 1 deletion documentation/docs/compatibility.md
@@ -6,6 +6,6 @@ DataSQRL builds on top of Flink and the Flink connector ecosystem.
|----------|----------|------------------------------|----------------------|----------|----------------------------|------------------------|
| 0.6.x | 1.19.x | 1.19+ <br /> 1.19.1+2 tested | 13+ <br /> 14 tested | 0.8.0 | 1.9.0+ <br /> 1.9.0 tested | 3+ <br /> 3.4.0 tested |
| 0.7.x | 1.19.x | 1.19+ <br /> 1.19.2+3 tested | 15+ <br /> 17 tested | 0.8.0 | 1.9.0+ <br /> 1.9.0 tested | 3+ <br /> 3.4.0 tested |
| | | | | | | |
| 0.8.x | 1.19.x | 1.19+ <br /> 1.19.3 tested | 15+ <br /> 17 tested | 0.8.0 | 1.9.0+ <br /> 1.9.2 tested | 3+ <br /> 3.4.0 tested |
| | | | | | | |
| | | | | | | |
5 changes: 4 additions & 1 deletion documentation/docs/compiler.md
@@ -73,7 +73,10 @@ The run command uses the following engines:
* Postgres as the transactional database engine
* Iceberg+DuckDB as the analytic database engine
* RedPanda as the log engine: The RedPanda cluster is accessible on port 9092 (via Kafka command line tooling).
* Vertx as the server engine: The GraphQL API is accessible at [http://localhost:8888/graphiql/](http://localhost:8888/graphiql/).
* Vertx as the server engine:
  * The GraphQL API is accessible at [http://localhost:8888/v1/graphiql/](http://localhost:8888/v1/graphiql/).
  * The Swagger UI for the REST API is accessible at [http://localhost:8888/v1/swagger-ui](http://localhost:8888/v1/swagger-ui).
  * The MCP API is accessible at `http://localhost:8888/v1/mcp/`.
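
As a quick smoke test, you can hit the API from the command line. A minimal sketch, assuming the GraphQL endpoint itself is served at `/v1/graphql` next to the GraphiQL UI:

```bash
# Introspection query against the local server started by the run command;
# replace the query body with a query against one of your tables as needed.
curl -s -X POST http://localhost:8888/v1/graphql \
  -H "Content-Type: application/json" \
  -d '{"query": "{ __schema { queryType { name } } }"}'
```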

### Data Access

64 changes: 64 additions & 0 deletions documentation/docs/configuration-default-update.sh
@@ -0,0 +1,64 @@
#!/bin/bash
#
# Copyright © 2021 DataSQRL ([email protected])
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#


# Script to update configuration-default.md with the latest default-package.json content
# This script replaces everything between ```json and ``` with the contents of default-package.json

set -e

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
MD_FILE="$SCRIPT_DIR/configuration-default.md"
JSON_FILE="$SCRIPT_DIR/../../sqrl-planner/src/main/resources/default-package.json"

# Check if files exist
if [ ! -f "$MD_FILE" ]; then
echo "Error: $MD_FILE not found"
exit 1
fi

if [ ! -f "$JSON_FILE" ]; then
echo "Error: $JSON_FILE not found"
exit 1
fi

echo "Updating $MD_FILE with contents from $JSON_FILE..."

# Create temporary files
TEMP_FILE=$(mktemp)
BEFORE_JSON=$(mktemp)
AFTER_JSON=$(mktemp)

# Extract the part before ```json
sed -n '1,/^```json$/p' "$MD_FILE" > "$BEFORE_JSON"

# Extract the part after the closing ```
sed -n '/^```$/,$p' "$MD_FILE" | tail -n +2 > "$AFTER_JSON"

# Combine: before + json content + after
cat "$BEFORE_JSON" > "$TEMP_FILE"
cat "$JSON_FILE" >> "$TEMP_FILE"
echo '```' >> "$TEMP_FILE"
cat "$AFTER_JSON" >> "$TEMP_FILE"

# Replace the original file
mv "$TEMP_FILE" "$MD_FILE"

# Clean up temporary files
rm -f "$BEFORE_JSON" "$AFTER_JSON"

echo "Successfully updated $MD_FILE with the latest default configuration"
122 changes: 122 additions & 0 deletions documentation/docs/configuration-default.md
@@ -0,0 +1,122 @@
# Default DataSQRL `package.json` Configuration

The following is the [default configuration file](https://raw.githubusercontent.com/DataSQRL/sqrl/refs/heads/main/sqrl-planner/src/main/resources/default-package.json) that user-provided configuration files are merged on top of. It provides the default values for all configuration options.

```json
{
  "version": "1",
  "enabled-engines": ["vertx", "postgres", "kafka", "flink"],
  "compiler": {
    "logger": "print",
    "extended-scalar-types": true,
    "compile-flink-plan": true,
    "cost-model": "DEFAULT",
    "explain": {
      "text": true,
      "sql": false,
      "logical": true,
      "physical": false,
      "sorted": true,
      "visual": true
    },
    "api": {
      "protocols": ["GRAPHQL", "REST", "MCP"],
      "endpoints": "FULL",
      "add-prefix": true,
      "max-result-depth": 3,
      "default-limit": 10
    }
  },
  "engines": {
    "flink": {
      "config": {
        "execution.runtime-mode": "STREAMING",
        "execution.target": "local",
        "execution.attached": true,
        "rest.address": "localhost",
        "rest.port": 8081,
        "state.backend.type": "rocksdb",
        "table.exec.resource.default-parallelism": 1,
        "taskmanager.memory.network.max": "800m"
      }
    },
    "duckdb": {
      "url": "jdbc:duckdb:"
    }
  },
  "connectors": {
    "kafka-mutation": {
      "connector": "kafka",
      "format": "flexible-json",
      "flexible-json.timestamp-format.standard": "ISO-8601",
      "properties.bootstrap.servers": "${KAFKA_BOOTSTRAP_SERVERS}",
      "properties.group.id": "${KAFKA_GROUP_ID}",
      "properties.auto.offset.reset": "earliest",
      "topic": "${sqrl:table-name}"
    },
    "kafka": {
      "connector": "kafka",
      "format": "flexible-json",
      "flexible-json.timestamp-format.standard": "ISO-8601",
      "properties.bootstrap.servers": "${KAFKA_BOOTSTRAP_SERVERS}",
      "properties.group.id": "${KAFKA_GROUP_ID}",
      "topic": "${sqrl:table-name}"
    },
    "iceberg": {
      "connector": "iceberg",
      "catalog-table": "${sqrl:table-name}",
      "warehouse": "iceberg-data",
      "catalog-type": "hadoop",
      "catalog-name": "mycatalog"
    },
    "postgres": {
      "connector": "jdbc-sqrl",
      "username": "${POSTGRES_USERNAME}",
      "password": "${POSTGRES_PASSWORD}",
      "url": "jdbc:postgresql://${POSTGRES_AUTHORITY}",
      "driver": "org.postgresql.Driver",
      "table-name": "${sqrl:table-name}"
    },
    "print": {
      "connector": "print",
      "print-identifier": "${sqrl:table-name}"
    }
  },
  "test-runner": {
    "snapshot-folder": "./snapshots",
    "test-folder": "./tests",
    "delay-sec": 30,
    "mutation-delay-sec": 0,
    "required-checkpoints": 0
  }
}
```
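
A user-provided `package.json` only needs to declare the keys it wants to change; everything else falls back to the defaults above. A minimal sketch, assuming standard deep-merge semantics where user values override defaults key by key:

```json
{
  "engines": {
    "flink": {
      "config": {
        "execution.runtime-mode": "BATCH",
        "table.exec.resource.default-parallelism": 4
      }
    }
  },
  "test-runner": {
    "delay-sec": 10
  }
}
```

Only the listed keys change; all other defaults, including the connector templates, remain in effect.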

## Connector Template Variables

The connector templates configured under `connectors` can use environment variables and SQRL-specific variables for dynamic configuration.

### Environment Variables

You can reference environment variables using the `${VAR_NAME}` placeholder syntax, for example `${POSTGRES_PASSWORD}`.
At runtime, these placeholders are automatically resolved using the environment variables defined in the system or deployment environment.

This helps keep security credentials out of the configuration file and adds flexibility across different deployment environments.
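
For example, the default `postgres` connector template above references `${POSTGRES_USERNAME}`, `${POSTGRES_PASSWORD}`, and `${POSTGRES_AUTHORITY}`. A sketch with illustrative values; any mechanism that sets environment variables for the runtime (shell, Docker, Kubernetes) works:

```bash
# Illustrative credentials; supply real values in your deployment.
export POSTGRES_USERNAME="datasqrl"
export POSTGRES_PASSWORD="change-me"
export POSTGRES_AUTHORITY="localhost:5432/datasqrl"

# The template's url "jdbc:postgresql://${POSTGRES_AUTHORITY}" then
# resolves to jdbc:postgresql://localhost:5432/datasqrl at runtime.
```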

### SQRL Variables

SQRL-specific variables start with the `sqrl:` prefix and are used for templating inside connector configuration options.
The syntax is `${sqrl:<identifier>}`.

Supported identifiers include:
- `table-name`
- `original-table-name`
- `filename`
- `format`
- `kafka-key`

These are typically used within connector templates to inject table-specific or context-aware configuration values.
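
For illustration, a hypothetical user-defined connector template that mixes an environment variable with SQRL variables (the `filesystem` connector and `DATA_ROOT` variable are assumptions, not part of the defaults):

```json
{
  "connectors": {
    "my-files": {
      "connector": "filesystem",
      "path": "${DATA_ROOT}/${sqrl:filename}",
      "format": "${sqrl:format}"
    }
  }
}
```

Here `${DATA_ROOT}` is resolved from the environment once, while `${sqrl:filename}` and `${sqrl:format}` are resolved per table when the connector template is instantiated.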

:::warning
Unresolved `${sqrl:*}` placeholders raise a validation error.
:::
30 changes: 30 additions & 0 deletions documentation/docs/configuration-engine/duckdb.md
@@ -0,0 +1,30 @@
# DuckDB Engine Configuration

DuckDB is a vectorized database query engine that excels at analytical queries and can read Iceberg tables efficiently.

## Configuration Options

| Key | Type | Default | Description |
|-------|------------|------------------|---------------------------------------|
| `url` | **string** | `"jdbc:duckdb:"` | Full JDBC URL for database connection |

## Example Configuration

```json
{
  "engines": {
    "duckdb": {
      "url": "jdbc:duckdb:"
    }
  }
}
```

## Usage Notes

- Ideal for local development and testing of analytical workloads
- Excellent performance on analytical queries with vectorized execution
- Can read Iceberg tables directly without additional infrastructure
- Supports both in-memory and persistent database modes
- Perfect for prototyping before deploying to cloud query engines like Snowflake
- Lightweight alternative to larger analytical databases
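
For example, to use a persistent database file instead of the default in-memory mode, append a file path to the JDBC URL (the path is illustrative):

```json
{
  "engines": {
    "duckdb": {
      "url": "jdbc:duckdb:/var/data/analytics.duckdb"
    }
  }
}
```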
71 changes: 71 additions & 0 deletions documentation/docs/configuration-engine/flink.md
@@ -0,0 +1,71 @@
# Flink Engine Configuration

Apache Flink is a streaming and batch data processor that serves as the core data processing engine in DataSQRL pipelines.

## Configuration Options

| Key | Type | Default | Notes |
|--------------|------------|-----------|----------------------------------------------------------------------------------------------------|
| `config` | **object** | see below | Copied verbatim into the generated Flink SQL job (e.g. `"table.exec.source.idle-timeout": "5 s"`). |

Frequently configured options include:

* `execution.runtime-mode`: `BATCH` or `STREAMING`
* `table.exec.source.idle-timeout`: Timeout after which idle sources are ignored so that watermarks can advance.
* `table.exec.mini-batch.*`: More efficient execution in STREAMING mode by processing records in small batches (see the second sketch below).

Refer to the [Flink documentation](https://nightlies.apache.org/flink/flink-docs-release-1.19/docs/dev/table/config/) for all Flink configuration options.

## Example Configuration

```json
{
  "engines": {
    "flink": {
      "config": {
        "execution.runtime-mode": "STREAMING",
        "rest.port": 8081,
        "state.backend.type": "rocksdb",
        "table.exec.resource.default-parallelism": 1,
        "taskmanager.memory.network.max": "800m"
      }
    }
  }
}
```
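
As a second sketch, the idle-timeout and mini-batch options mentioned above might be set as follows (values are illustrative and workload-dependent):

```json
{
  "engines": {
    "flink": {
      "config": {
        "execution.runtime-mode": "STREAMING",
        "table.exec.source.idle-timeout": "5 s",
        "table.exec.mini-batch.enabled": true,
        "table.exec.mini-batch.allow-latency": "1 s",
        "table.exec.mini-batch.size": 1000
      }
    }
  }
}
```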

## Deployment Configuration

Flink supports deployment-specific configuration options for managing cluster resources:

| Key | Type | Default | Description |
|---------------------|-------------|---------|-----------------------------------------------------------------|
| `jobmanager-size` | **string** | - | Job manager instance size: `dev`, `small`, `medium`, `large` |
| `taskmanager-size` | **string** | - | Task manager instance size with resource variants |
| `taskmanager-count` | **integer** | - | Number of task manager instances (minimum: 1) |
| `secrets` | **array** | `null` | Array of secret names to inject, or `null` if no secrets needed |

### Task Manager Size Options

Available `taskmanager-size` options with resource variants:
- `dev` - Development/testing size
- `small`, `small.mem`, `small.cpu` - Small instances with memory or CPU optimization
- `medium`, `medium.mem`, `medium.cpu` - Medium instances with resource variants
- `large`, `large.mem`, `large.cpu` - Large instances with resource variants

### Deployment Example

```json
{
  "engines": {
    "flink": {
      "deployment": {
        "jobmanager-size": "small",
        "taskmanager-size": "medium.mem",
        "taskmanager-count": 2,
        "secrets": ["flink-secrets", "db-credentials"]
      }
    }
  }
}
```