Merged

31 commits
340af56
recreate dagplanner snapshots
mbroecheler Oct 6, 2025
8bb1772
update other snapshots
mbroecheler Oct 6, 2025
2ac7a3d
enable mermaid
mbroecheler Oct 6, 2025
f456dd0
update graphiql references and add swagger, mcp
mbroecheler Oct 8, 2025
e7e7d28
updating configuration documentation.
mbroecheler Oct 8, 2025
56a690c
updating configuration documentation.
mbroecheler Oct 8, 2025
25ba00d
break out engine configuration
mbroecheler Oct 8, 2025
1d8af7b
finalize engine config
mbroecheler Oct 8, 2025
a6fd88c
update default iceberg configuration
mbroecheler Oct 9, 2025
b9fe01b
add auto-update script for default configuration.
mbroecheler Oct 9, 2025
37e8594
update documentation, add howtos.
mbroecheler Oct 10, 2025
317894a
update stream enrichment howto
mbroecheler Oct 10, 2025
62e5546
polishing connectors.md
mbroecheler Oct 10, 2025
07cb6ed
polishing connectors.md
mbroecheler Oct 10, 2025
aacf8a5
update formatting
mbroecheler Oct 10, 2025
5de49cd
update formatting
mbroecheler Oct 10, 2025
d501c0b
update intro
mbroecheler Oct 10, 2025
4c73aa9
remove outdated context generator
mbroecheler Oct 10, 2025
5441bde
updated stdlib-docs to latest main
mbroecheler Oct 10, 2025
a548a9f
function generation
mbroecheler Oct 10, 2025
241241c
updated function name generation
mbroecheler Oct 10, 2025
6451e32
moving pages
mbroecheler Oct 11, 2025
0556769
fixed broken links
mbroecheler Oct 11, 2025
273907a
update compatibility table
ferenc-csaky Oct 17, 2025
5536968
minor formatting improvements
ferenc-csaky Oct 17, 2025
3f8fac0
auth testing docs and some more JWT guidance
ferenc-csaky Oct 17, 2025
0fe897d
fix license headers
ferenc-csaky Oct 20, 2025
75990c6
fix snapshots
ferenc-csaky Oct 20, 2025
78a579b
code format
ferenc-csaky Oct 29, 2025
7151e55
add env var resolution docs
ferenc-csaky Oct 29, 2025
3bd1d0c
fix snapshots
ferenc-csaky Oct 29, 2025
5 changes: 4 additions & 1 deletion .gitignore
@@ -129,4 +129,7 @@ node_modules/

npm-debug.log*
yarn-debug.log*
yarn-error.log*
yarn-error.log*

#Generated markdown files
*-generated.md
3 changes: 2 additions & 1 deletion documentation/CLAUDE.md
@@ -77,7 +77,8 @@ npm run swizzle

### Content Structure
- **docs/**: Main documentation content in Markdown format
- Core documentation files include intro.md, getting-started.md, sqrl-language.md, etc.
- **intro/**: Introductory documentation (intro.md, getting-started.md, concepts.md, tutorials.md)
- Core documentation files include sqrl-language.md, interface.md, configuration.md, etc.
- **stdlib-docs/**: Embedded function library documentation
- **blog/**: Release notes, updates, and technical blog posts
- **src/**: React components and custom pages
@@ -76,7 +76,7 @@ We [just released](https://github.com/DataSQRL/sqrl/releases/tag/v0.1.0) the fir
Here are some ideas for how you can contribute:

* Share your thoughts: Do you have ideas on how we can improve the SQRL language or the DataSQRL compiler? Jump into [our community](/community) and let us know!
* Test the waters: Do you like playing with new technologies? Try out [DataSQRL](/docs/getting-started) and let us know if you find any bugs or missing features.
* Test the waters: Do you like playing with new technologies? Try out [DataSQRL](/docs/intro/getting-started) and let us know if you find any bugs or missing features.
* Spread the word: Think DataSQRL has potential? Share this blog post and [star](https://github.com/DataSQRL/sqrl) DataSQRL on [Github](https://github.com/DataSQRL/sqrl). Your support can help us reach more like-minded individuals.
* Code with us: Do you enjoy contributing to open-source projects? Dive into [the code](https://github.com/DataSQRL/sqrl) with us and pick up a [ticket](https://github.com/DataSQRL/sqrl/issues).

2 changes: 1 addition & 1 deletion documentation/blog/2023-07-10-temporal-join.mdx
@@ -135,7 +135,7 @@ Temporal joins help us avoid the pitfalls of time-alignment problems when joinin

And that’s why the temporal join is stream processing's secret superpower.

DataSQRL makes using temporal joins a breeze. With its simplified syntax and smart defaults, it's like having a personal tour guide leading you through the sometimes bewildering landscape of stream processing. Take a look at our [Getting Started](/docs/getting-started) to see a complete example of temporal joins in action or take a look at our [other tutorials](/docs/tutorials) for a step-by-step guide to stream processing including temporal joins.
DataSQRL makes using temporal joins a breeze. With its simplified syntax and smart defaults, it's like having a personal tour guide leading you through the sometimes bewildering landscape of stream processing. Take a look at our [Getting Started](/docs/intro/getting-started) to see a complete example of temporal joins in action or take a look at our [other tutorials](/docs/intro/tutorials) for a step-by-step guide to stream processing including temporal joins.

Happy data time-traveling, folks!

2 changes: 1 addition & 1 deletion documentation/blog/2025-05-09-flink-sql-extensions.md
@@ -95,4 +95,4 @@ In addition to breaking out the sink configuration from the main script, the `EX

[FlinkSQL](https://nightlies.apache.org/flink/flink-docs-release-1.19/docs/dev/table/sql/overview/) is a phenomenal extension of the SQL ecosystem to stream processing. With DataSQRL, we are trying to make it easier to build end-to-end data pipelines and complete data applications with FlinkSQL.

Check out the [complete example](/docs/getting-started) which also covers testing, customization, and deployment. Or read the [documentation](/docs/sqrl-language) to learn more.
Check out the [complete example](/docs/intro/getting-started) which also covers testing, customization, and deployment. Or read the [documentation](/docs/sqrl-language) to learn more.
2 changes: 1 addition & 1 deletion documentation/blog/2025-07-27-datasqrl-0.7.md
@@ -33,7 +33,7 @@ docker pull datasqrl/cmd:0.7.0

Data delivery is the final and most visible stage of any data pipeline. It's how users, applications, and AI agents actually access and consume data. Most enterprise data interactions happen through APIs, making the delivery interface a critical component. At DataSQRL, we've invested heavily in automating the upstream parts of the pipeline: from Flink-powered data processing to Postgres-backed storage. With version 0.7, we turn our focus to the serving layer: introducing support for the Model Context Protocol (MCP) and REST APIs, as well as JWT-based authentication and authorization. These additions ensure seamless integration with most authentication providers and enable secure, token-based data access, with fine-grained authorization logic enforced directly in the SQRL script. This completes our vision of end-to-end pipeline automation, where consumption patterns inform data storage and processing—closing the loop between data production and usage.

Check out the [interface documentation](../docs/interface) for more information.
Check out the [interface documentation](/docs/interface) for more information.

<!--truncate-->

20 changes: 20 additions & 0 deletions documentation/blog/tags.yml
@@ -28,3 +28,23 @@ feature:
  label: feature
  permalink: /feature
  description: Feature descriptions

DataSQRL:
  label: DataSQRL
  permalink: /datasqrl
  description: Posts about DataSQRL

community:
  label: Community
  permalink: /community
  description: Community updates and announcements

Join:
  label: Join
  permalink: /join
  description: Posts about SQL joins and temporal joins

Flink:
  label: Flink
  permalink: /flink
  description: Apache Flink related posts
2 changes: 1 addition & 1 deletion documentation/docs/compatibility.md
@@ -6,6 +6,6 @@ DataSQRL builds on top of Flink and the Flink connector ecosystem.
|----------|----------|------------------------------|----------------------|----------|----------------------------|------------------------|
| 0.6.x | 1.19.x | 1.19+ <br /> 1.19.1+2 tested | 13+ <br /> 14 tested | 0.8.0 | 1.9.0+ <br /> 1.9.0 tested | 3+ <br /> 3.4.0 tested |
| 0.7.x | 1.19.x | 1.19+ <br /> 1.19.2+3 tested | 15+ <br /> 17 tested | 0.8.0 | 1.9.0+ <br /> 1.9.0 tested | 3+ <br /> 3.4.0 tested |
| | | | | | | |
| 0.8.x | 1.19.x | 1.19+ <br /> 1.19.3 tested | 15+ <br /> 17 tested | 0.8.0 | 1.9.0+ <br /> 1.9.2 tested | 3+ <br /> 3.4.0 tested |
| | | | | | | |
| | | | | | | |
5 changes: 4 additions & 1 deletion documentation/docs/compiler.md
@@ -73,7 +73,10 @@ The run command uses the following engines:
* Postgres as the transactional database engine
* Iceberg+DuckDB as the analytic database engine
* RedPanda as the log engine: The RedPanda cluster is accessible on port 9092 (via Kafka command line tooling).
* Vertx as the server engine: The GraphQL API is accessible at [http://localhost:8888/graphiql/](http://localhost:8888/graphiql/).
* Vertx as the server engine:
  * The GraphQL API is accessible at [http://localhost:8888/v1/graphiql/](http://localhost:8888/v1/graphiql/).
  * The Swagger UI for the REST API is accessible at [http://localhost:8888/v1/swagger-ui](http://localhost:8888/v1/swagger-ui).
  * The MCP API is accessible at `http://localhost:8888/v1/mcp/`.
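
As a quick smoke test, you can hit the API from the command line. A minimal sketch, assuming the GraphQL endpoint itself is served at `/v1/graphql` next to the GraphiQL UI:

```bash
# Introspection query against the local server started by the run command;
# replace the query body with a query against one of your tables as needed.
curl -s -X POST http://localhost:8888/v1/graphql \
  -H "Content-Type: application/json" \
  -d '{"query": "{ __schema { queryType { name } } }"}'
```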

### Data Access

64 changes: 64 additions & 0 deletions documentation/docs/configuration-default-update.sh
@@ -0,0 +1,64 @@
#!/bin/bash
#
# Copyright © 2021 DataSQRL ([email protected])
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#


# Script to update configuration-default.md with the latest default-package.json content
# This script replaces everything between ```json and ``` with the contents of default-package.json

set -e

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
MD_FILE="$SCRIPT_DIR/configuration-default.md"
JSON_FILE="$SCRIPT_DIR/../../sqrl-planner/src/main/resources/default-package.json"

# Check if files exist
if [ ! -f "$MD_FILE" ]; then
echo "Error: $MD_FILE not found"
exit 1
fi

if [ ! -f "$JSON_FILE" ]; then
echo "Error: $JSON_FILE not found"
exit 1
fi

echo "Updating $MD_FILE with contents from $JSON_FILE..."

# Create temporary files
TEMP_FILE=$(mktemp)
BEFORE_JSON=$(mktemp)
AFTER_JSON=$(mktemp)

# Extract the part before ```json
sed -n '1,/^```json$/p' "$MD_FILE" > "$BEFORE_JSON"

# Extract the part after the closing ```
sed -n '/^```$/,$p' "$MD_FILE" | tail -n +2 > "$AFTER_JSON"

# Combine: before + json content + after
cat "$BEFORE_JSON" > "$TEMP_FILE"
cat "$JSON_FILE" >> "$TEMP_FILE"
echo '```' >> "$TEMP_FILE"
cat "$AFTER_JSON" >> "$TEMP_FILE"

# Replace the original file
mv "$TEMP_FILE" "$MD_FILE"

# Clean up temporary files
rm -f "$BEFORE_JSON" "$AFTER_JSON"

echo "Successfully updated $MD_FILE with the latest default configuration"
122 changes: 122 additions & 0 deletions documentation/docs/configuration-default.md
@@ -0,0 +1,122 @@
# Default DataSQRL `package.json` Configuration

The following is the [default configuration file](https://raw.githubusercontent.com/DataSQRL/sqrl/refs/heads/main/sqrl-planner/src/main/resources/default-package.json) that user-provided configuration files are merged on top of. It provides the default values for all configuration options.

```json
{
  "version": "1",
  "enabled-engines": ["vertx", "postgres", "kafka", "flink"],
  "compiler": {
    "logger": "print",
    "extended-scalar-types": true,
    "compile-flink-plan": true,
    "cost-model": "DEFAULT",
    "explain": {
      "text": true,
      "sql": false,
      "logical": true,
      "physical": false,
      "sorted": true,
      "visual": true
    },
    "api": {
      "protocols": ["GRAPHQL", "REST", "MCP"],
      "endpoints": "FULL",
      "add-prefix": true,
      "max-result-depth": 3,
      "default-limit": 10
    }
  },
  "engines": {
    "flink": {
      "config": {
        "execution.runtime-mode": "STREAMING",
        "execution.target": "local",
        "execution.attached": true,
        "rest.address": "localhost",
        "rest.port": 8081,
        "state.backend.type": "rocksdb",
        "table.exec.resource.default-parallelism": 1,
        "taskmanager.memory.network.max": "800m"
      }
    },
    "duckdb": {
      "url": "jdbc:duckdb:"
    }
  },
  "connectors": {
    "kafka-mutation": {
      "connector": "kafka",
      "format": "flexible-json",
      "flexible-json.timestamp-format.standard": "ISO-8601",
      "properties.bootstrap.servers": "${KAFKA_BOOTSTRAP_SERVERS}",
      "properties.group.id": "${KAFKA_GROUP_ID}",
      "properties.auto.offset.reset": "earliest",
      "topic": "${sqrl:table-name}"
    },
    "kafka": {
      "connector": "kafka",
      "format": "flexible-json",
      "flexible-json.timestamp-format.standard": "ISO-8601",
      "properties.bootstrap.servers": "${KAFKA_BOOTSTRAP_SERVERS}",
      "properties.group.id": "${KAFKA_GROUP_ID}",
      "topic": "${sqrl:table-name}"
    },
    "iceberg": {
      "connector": "iceberg",
      "catalog-table": "${sqrl:table-name}",
      "warehouse": "iceberg-data",
      "catalog-type": "hadoop",
      "catalog-name": "mycatalog"
    },
    "postgres": {
      "connector": "jdbc-sqrl",
      "username": "${POSTGRES_USERNAME}",
      "password": "${POSTGRES_PASSWORD}",
      "url": "jdbc:postgresql://${POSTGRES_AUTHORITY}",
      "driver": "org.postgresql.Driver",
      "table-name": "${sqrl:table-name}"
    },
    "print": {
      "connector": "print",
      "print-identifier": "${sqrl:table-name}"
    }
  },
  "test-runner": {
    "snapshot-folder": "./snapshots",
    "test-folder": "./tests",
    "delay-sec": 30,
    "mutation-delay-sec": 0,
    "required-checkpoints": 0
  }
}
```
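
A user-provided `package.json` only needs to declare the keys it wants to change; everything else falls back to the defaults above. A minimal sketch, assuming standard deep-merge semantics where user values override defaults key by key:

```json
{
  "engines": {
    "flink": {
      "config": {
        "execution.runtime-mode": "BATCH",
        "table.exec.resource.default-parallelism": 4
      }
    }
  },
  "test-runner": {
    "delay-sec": 10
  }
}
```

Only the listed keys change; all other defaults, including the connector templates, remain in effect.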

## Connector Template Variables

The connector templates configured under `connectors` can use environment variables and SQRL-specific variables for dynamic configuration.

### Environment Variables

You can reference environment variables using the `${VAR_NAME}` placeholder syntax, for example `${POSTGRES_PASSWORD}`.
At runtime, these placeholders are automatically resolved using the environment variables defined in the system or deployment environment.

This helps keep security credentials out of the configuration file and adds flexibility across different deployment environments.
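
For example, the default `postgres` connector template above references `${POSTGRES_USERNAME}`, `${POSTGRES_PASSWORD}`, and `${POSTGRES_AUTHORITY}`. A sketch with illustrative values; any mechanism that sets environment variables for the runtime (shell, Docker, Kubernetes) works:

```bash
# Illustrative credentials; supply real values in your deployment.
export POSTGRES_USERNAME="datasqrl"
export POSTGRES_PASSWORD="change-me"
export POSTGRES_AUTHORITY="localhost:5432/datasqrl"

# The template's url "jdbc:postgresql://${POSTGRES_AUTHORITY}" then
# resolves to jdbc:postgresql://localhost:5432/datasqrl at runtime.
```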

### SQRL Variables

SQRL-specific variables start with the `sqrl:` prefix and are used for templating inside connector configuration options.
The syntax is `${sqrl:<identifier>}`.

Supported identifiers include:
- `table-name`
- `original-table-name`
- `filename`
- `format`
- `kafka-key`

These are typically used within connector templates to inject table-specific or context-aware configuration values.
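
For illustration, a hypothetical user-defined connector template that mixes an environment variable with SQRL variables (the `filesystem` connector and `DATA_ROOT` variable are assumptions, not part of the defaults):

```json
{
  "connectors": {
    "my-files": {
      "connector": "filesystem",
      "path": "${DATA_ROOT}/${sqrl:filename}",
      "format": "${sqrl:format}"
    }
  }
}
```

Here `${DATA_ROOT}` is resolved from the environment once, while `${sqrl:filename}` and `${sqrl:format}` are resolved per table when the connector template is instantiated.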

:::warning
Unresolved `${sqrl:*}` placeholders raise a validation error.
:::
30 changes: 30 additions & 0 deletions documentation/docs/configuration-engine/duckdb.md
@@ -0,0 +1,30 @@
# DuckDB Engine Configuration

DuckDB is a vectorized database query engine that excels at analytical queries and can read Iceberg tables efficiently.

## Configuration Options

| Key | Type | Default | Description |
|-------|------------|------------------|---------------------------------------|
| `url` | **string** | `"jdbc:duckdb:"` | Full JDBC URL for database connection |

## Example Configuration

```json
{
  "engines": {
    "duckdb": {
      "url": "jdbc:duckdb:"
    }
  }
}
```

## Usage Notes

- Ideal for local development and testing of analytical workloads
- Excellent performance on analytical queries with vectorized execution
- Can read Iceberg tables directly without additional infrastructure
- Supports both in-memory and persistent database modes
- Perfect for prototyping before deploying to cloud query engines like Snowflake
- Lightweight alternative to larger analytical databases
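
For example, to use a persistent database file instead of the default in-memory mode, append a file path to the JDBC URL (the path is illustrative):

```json
{
  "engines": {
    "duckdb": {
      "url": "jdbc:duckdb:/var/data/analytics.duckdb"
    }
  }
}
```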
71 changes: 71 additions & 0 deletions documentation/docs/configuration-engine/flink.md
@@ -0,0 +1,71 @@
# Flink Engine Configuration

Apache Flink is a streaming and batch data processor that serves as the core data processing engine in DataSQRL pipelines.

## Configuration Options

| Key | Type | Default | Notes |
|--------------|------------|-----------|----------------------------------------------------------------------------------------------------|
| `config` | **object** | see below | Copied verbatim into the generated Flink SQL job (e.g. `"table.exec.source.idle-timeout": "5 s"`). |

Frequently configured options include:

* `execution.runtime-mode`: `BATCH` or `STREAMING`
* `table.exec.source.idle-timeout`: Timeout after which idle sources are ignored so that watermarks can advance.
* `table.exec.mini-batch.*`: More efficient execution in STREAMING mode by processing records in small batches (see the second sketch below).

Refer to the [Flink documentation](https://nightlies.apache.org/flink/flink-docs-release-1.19/docs/dev/table/config/) for all Flink configuration options.

## Example Configuration

```json
{
  "engines": {
    "flink": {
      "config": {
        "execution.runtime-mode": "STREAMING",
        "rest.port": 8081,
        "state.backend.type": "rocksdb",
        "table.exec.resource.default-parallelism": 1,
        "taskmanager.memory.network.max": "800m"
      }
    }
  }
}
```
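
As a second sketch, the idle-timeout and mini-batch options mentioned above might be set as follows (values are illustrative and workload-dependent):

```json
{
  "engines": {
    "flink": {
      "config": {
        "execution.runtime-mode": "STREAMING",
        "table.exec.source.idle-timeout": "5 s",
        "table.exec.mini-batch.enabled": true,
        "table.exec.mini-batch.allow-latency": "1 s",
        "table.exec.mini-batch.size": 1000
      }
    }
  }
}
```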

## Deployment Configuration

Flink supports deployment-specific configuration options for managing cluster resources:

| Key | Type | Default | Description |
|---------------------|-------------|---------|-----------------------------------------------------------------|
| `jobmanager-size` | **string** | - | Job manager instance size: `dev`, `small`, `medium`, `large` |
| `taskmanager-size` | **string** | - | Task manager instance size with resource variants |
| `taskmanager-count` | **integer** | - | Number of task manager instances (minimum: 1) |
| `secrets` | **array** | `null` | Array of secret names to inject, or `null` if no secrets needed |

### Task Manager Size Options

Available `taskmanager-size` options with resource variants:
- `dev` - Development/testing size
- `small`, `small.mem`, `small.cpu` - Small instances with memory or CPU optimization
- `medium`, `medium.mem`, `medium.cpu` - Medium instances with resource variants
- `large`, `large.mem`, `large.cpu` - Large instances with resource variants

### Deployment Example

```json
{
  "engines": {
    "flink": {
      "deployment": {
        "jobmanager-size": "small",
        "taskmanager-size": "medium.mem",
        "taskmanager-count": 2,
        "secrets": ["flink-secrets", "db-credentials"]
      }
    }
  }
}
```