Data sources docs #1352

Merged 6 commits on Jul 29, 2025
19 changes: 17 additions & 2 deletions docs/StardustDocs/d.tree
@@ -187,8 +187,23 @@
<toc-element topic="jupyterRendering.md"/>
</toc-element>
</toc-element>
- <toc-element topic="Data-Sources.md" hidden="true">
- <toc-element topic="Integrations.md" hidden="true"/>
<toc-element topic="Data-Sources.md">
<toc-element topic="JSON.md">
<toc-element topic="OpenAPI.md"/>
</toc-element>
<toc-element topic="CSV-TSV.md"/>
<toc-element topic="Excel.md"/>
<toc-element topic="ApacheArrow.md"/>
<toc-element topic="SQL.md">
<toc-element topic="PostgreSQL.md"/>
<toc-element topic="MySQL.md"/>
<toc-element topic="Microsoft-SQL-Server.md"/>
<toc-element topic="SQLite.md"/>
<toc-element topic="H2.md"/>
<toc-element topic="MariaDB.md"/>
<toc-element topic="Custom-SQL-Source.md"/>
</toc-element>
<toc-element topic="Integrations.md"/>
</toc-element>
<toc-element topic="_shadow_resources.md" hidden="true"/>
<toc-element topic="Support.md"/>
1 change: 1 addition & 0 deletions docs/StardustDocs/topics/Home.topic
@@ -29,6 +29,7 @@
<title>Featured topics</title>
<a href="Kotlin-DataFrame-Features-in-Kotlin-Notebook.md"/>
<a href="Compiler-Plugin.md"/>
<a href="Data-Sources.md"/>
<a href="readSqlDatabases.md"/>
</secondary>

49 changes: 49 additions & 0 deletions docs/StardustDocs/topics/dataSources/ApacheArrow.md
@@ -0,0 +1,49 @@
# Apache Arrow

<web-summary>
Read and write Apache Arrow files in Kotlin — efficient binary format support with Kotlin DataFrame.
</web-summary>

<card-summary>
Work with Arrow files in Kotlin for fast I/O — supports both streaming and random access formats.
</card-summary>

<link-summary>
Kotlin DataFrame provides full support for reading and writing Apache Arrow files in high-performance workflows.
</link-summary>


Kotlin DataFrame supports reading from and writing to Apache Arrow files.

Requires the [`dataframe-arrow` module](Modules.md#dataframe-arrow), which is included by
default in the general [`dataframe`](Modules.md#dataframe-general) artifact
and in [`%use dataframe`](gettingStartedKotlinNotebook.md#integrate-kotlin-dataframe) for Kotlin Notebook.

> Make sure to follow the
> [Apache Arrow Java compatibility guide](https://arrow.apache.org/docs/java/install.html#java-compatibility)
> when using Java 9+.
> {style="warning"}

## Read

[`DataFrame`](DataFrame.md) supports both the
[Arrow interprocess streaming format](https://arrow.apache.org/docs/java/ipc.html#writing-and-reading-streaming-format)
and the [Arrow random access format](https://arrow.apache.org/docs/java/ipc.html#writing-and-reading-random-access-files).

You can read a `DataFrame` from Apache Arrow data sources
(via a file path, URL, or stream) using [`readArrowFeather()`](read.md#read-apache-arrow-formats)
for the random access format or `readArrowIPC()` for the interprocess streaming format:

```kotlin
val df = DataFrame.readArrowFeather("example.feather")
```

```kotlin
val df = DataFrame.readArrowFeather("https://kotlin.github.io/dataframe/resources/example.feather")
```

## Write

A [`DataFrame`](DataFrame.md) can be written to Arrow format using the interprocess streaming or random access format.
Output targets include `WritableByteChannel`, `OutputStream`, `File`, or `ByteArray`.

See [](write.md#writing-to-apache-arrow-formats) for more details.
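
As a rough sketch of the write path, the snippet below saves one small frame in both formats. It assumes the `writeArrowFeather(File)` and `writeArrowIPC(File)` extension functions from the `dataframe-arrow` module. The helper name `writeArrowExamples`, the column names, and the temporary file names are made up for illustration.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.dataFrameOf
import org.jetbrains.kotlinx.dataframe.io.writeArrowFeather
import org.jetbrains.kotlinx.dataframe.io.writeArrowIPC
import java.io.File

// Hypothetical helper: writes the same frame in both Arrow formats
// and returns the two files that were written.
fun writeArrowExamples(): Pair<File, File> {
    // Illustrative data; any DataFrame works here.
    val df = dataFrameOf("name", "age")(
        "Alice", 30,
        "Bob", 25,
    )

    // Temporary files are an assumption to keep the sketch self-contained.
    val featherFile = File.createTempFile("example", ".feather")
    val ipcFile = File.createTempFile("example", ".arrow")

    df.writeArrowFeather(featherFile) // random access format
    df.writeArrowIPC(ipcFile)         // interprocess streaming format

    return featherFile to ipcFile
}
```

Which format to choose is a trade-off: the random access format suits files that are read repeatedly or partially, while the streaming format suits data passed between processes.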
52 changes: 52 additions & 0 deletions docs/StardustDocs/topics/dataSources/CSV-TSV.md
@@ -0,0 +1,52 @@
# CSV / TSV

<web-summary>
Work with CSV and TSV files — read, analyze, and export tabular data using Kotlin DataFrame.
</web-summary>

<card-summary>
Seamlessly load and write CSV or TSV files in Kotlin — perfect for common tabular data workflows.
</card-summary>

<link-summary>
Kotlin DataFrame support for reading and writing CSV and TSV files with simple, type-safe APIs.
</link-summary>


Kotlin DataFrame supports reading from and writing to CSV and TSV files.

Requires the [`dataframe-csv` module](Modules.md#dataframe-csv),
which is included by default in the general [`dataframe`](Modules.md#dataframe-general)
artifact and in [`%use dataframe`](gettingStartedKotlinNotebook.md#integrate-kotlin-dataframe) for Kotlin Notebook.

## Read

You can read a [`DataFrame`](DataFrame.md) from a CSV or TSV file (via a file path or URL)
using the [`readCsv()`](read.md#read-from-csv) or `readTsv()` methods:

```kotlin
val df = DataFrame.readCsv("example.csv")
```

```kotlin
val df = DataFrame.readCsv("https://kotlin.github.io/dataframe/resources/example.csv")
```
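
The examples above only cover comma-separated input. The sketch below illustrates the other two cases a reader is likely to hit: a TSV file read with `readTsv()`, and a semicolon-separated file read with `readCsv()` and a custom delimiter. It assumes the `File` overloads and the `delimiter` parameter of the `dataframe-csv` module. The helper name `readDelimitedExamples` and the sample data are hypothetical.

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.io.readCsv
import org.jetbrains.kotlinx.dataframe.io.readTsv
import java.io.File

// Hypothetical helper: creates two small sample files, reads them back,
// and returns the number of rows found in each.
fun readDelimitedExamples(): Pair<Int, Int> {
    // Tab-separated sample data (made up for illustration).
    val tsvFile = File.createTempFile("example", ".tsv")
    tsvFile.writeText("name\tage\nAlice\t30\nBob\t25")

    // Semicolon-separated sample data, common in some European locales.
    val semicolonFile = File.createTempFile("example", ".csv")
    semicolonFile.writeText("name;age\nAlice;30\nBob;25")

    val fromTsv = DataFrame.readTsv(tsvFile)
    val fromSemicolon = DataFrame.readCsv(semicolonFile, delimiter = ';')

    return fromTsv.rowsCount() to fromSemicolon.rowsCount()
}
```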

## Write

You can write a [`DataFrame`](DataFrame.md) to a CSV file using the [`writeCsv()`](write.md#writing-to-csv) method:

```kotlin
df.writeCsv("example.csv")
```

## Deephaven CSV

The [`dataframe-csv`](Modules.md#dataframe-csv) module uses the high-performance
[Deephaven CSV library](https://github.com/deephaven/deephaven-csv) under the hood
for fast and efficient CSV reading.

If you're working with large CSV files, you can adjust the parser manually
by [configuring Deephaven-specific parameters](https://kotlin.github.io/dataframe/read.html#unlocking-deephaven-csv-features)
to get the best performance for your use case.

32 changes: 31 additions & 1 deletion docs/StardustDocs/topics/dataSources/Data-Sources.md
@@ -1,3 +1,33 @@
# Data Sources

- > This topic is not ready yet.
<web-summary>
Discover all the data formats Kotlin DataFrame can work with — including JSON, CSV, Excel, SQL databases, and more.
</web-summary>

<card-summary>
Explore supported data sources in Kotlin DataFrame and how to integrate them into your data processing workflow.
</card-summary>

<link-summary>
Explore supported data sources in Kotlin DataFrame and how to integrate them into your data processing workflow.
</link-summary>

One of the key aspects of working with data is being able to read from and write to various data sources.
Kotlin DataFrame provides seamless support for a wide range of formats to integrate into your data workflows.
Below you'll find a list of supported sources along with instructions on how to read and write data using them.

- [JSON](JSON.md)
  - [OpenAPI](OpenAPI.md)
- [CSV / TSV](CSV-TSV.md)
- [Excel](Excel.md)
- [Apache Arrow](ApacheArrow.md)
- [SQL](SQL.md):
  - [PostgreSQL](PostgreSQL.md)
  - [MySQL](MySQL.md)
  - [Microsoft SQL Server](Microsoft-SQL-Server.md)
  - [SQLite](SQLite.md)
  - [H2](H2.md)
  - [MariaDB](MariaDB.md)
  - [Custom SQL Source](Custom-SQL-Source.md)
- [Custom integrations with unsupported data sources](Integrations.md)

42 changes: 42 additions & 0 deletions docs/StardustDocs/topics/dataSources/Excel.md
@@ -0,0 +1,42 @@
# Excel

<web-summary>
Read from and write to Excel files in `.xls` or `.xlsx` formats with Kotlin DataFrame for seamless spreadsheet integration.
</web-summary>

<card-summary>
Kotlin DataFrame makes it easy to load and save data from Excel files — perfect for working with spreadsheet-based workflows.
</card-summary>

<link-summary>
Learn how to read and write Excel files using Kotlin DataFrame with just a single line of code.
</link-summary>


Kotlin DataFrame supports reading from and writing to Excel files in both `.xls` and `.xlsx` formats.

Requires the [`dataframe-excel` module](Modules.md#dataframe-excel),
which is included by default in the general [`dataframe`](Modules.md#dataframe-general)
artifact and in [`%use dataframe`](gettingStartedKotlinNotebook.md#integrate-kotlin-dataframe) for Kotlin Notebook.

## Read

You can read a [`DataFrame`](DataFrame.md) from an Excel file (via a file path or URL)
using the [`readExcel()`](read.md#read-from-excel) method:

```kotlin
val df = DataFrame.readExcel("example.xlsx")
```

```kotlin
val df = DataFrame.readExcel("https://kotlin.github.io/dataframe/resources/example.xlsx")
```

## Write

You can write a [`DataFrame`](DataFrame.md) to an Excel file using the
[`writeExcel()`](write.md#writing-to-excel-spreadsheets) method:

```kotlin
df.writeExcel("example.xlsx")
```
28 changes: 26 additions & 2 deletions docs/StardustDocs/topics/dataSources/Integrations.md
@@ -1,3 +1,27 @@
- # Integrations
# Custom integrations with unsupported data sources

- > This topic is not ready yet.
<web-summary>
Examples of how to integrate Kotlin DataFrame with other data frameworks like Exposed, Spark, or Multik.
</web-summary>

<card-summary>
Integrate Kotlin DataFrame with unsupported sources — see practical examples with Exposed, Spark, and more.
</card-summary>

<link-summary>
How to connect Kotlin DataFrame with data sources like Exposed, Apache Spark, or Multik.
</link-summary>

Some data sources are not officially supported in the Kotlin DataFrame API yet —
but you can still integrate them easily using custom code.

Below is a list of example integrations with other data frameworks.
These examples demonstrate how to bridge Kotlin DataFrame with external libraries or APIs.

- [Kotlin Exposed](https://github.com/Kotlin/dataframe/tree/master/examples/idea-examples/unsupported-data-sources/src/main/kotlin/org/jetbrains/kotlinx/dataframe/examples/exposed)
- [Apache Spark](https://github.com/Kotlin/dataframe/tree/master/examples/idea-examples/unsupported-data-sources/src/main/kotlin/org/jetbrains/kotlinx/dataframe/examples/spark)
- [Apache Spark (with Kotlin Spark API)](https://github.com/Kotlin/dataframe/tree/master/examples/idea-examples/unsupported-data-sources/src/main/kotlin/org/jetbrains/kotlinx/dataframe/examples/kotlinSpark)
- [Multik](https://github.com/Kotlin/dataframe/tree/master/examples/idea-examples/unsupported-data-sources/src/main/kotlin/org/jetbrains/kotlinx/dataframe/examples/multik)

You can use these examples as templates to create your own integrations
with any data processing library that produces structured tabular data.
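
One simple bridging pattern, sketched here under the assumption that the external library can hand you a list of plain Kotlin objects, is to convert that list with `toDataFrame()`. The `Person` class and the `peopleToDataFrame` helper are hypothetical and are not taken from the linked examples.

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.toDataFrame

// Hypothetical row type standing in for whatever the external library returns.
data class Person(val name: String, val age: Int)

// Hypothetical helper: each object becomes a row,
// and each property becomes a column.
fun peopleToDataFrame(): DataFrame<Person> {
    val rows = listOf(
        Person("Alice", 30),
        Person("Bob", 25),
    )
    return rows.toDataFrame()
}
```

Going through an intermediate list keeps the integration code independent of the source library, at the cost of materializing all rows in memory first.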
47 changes: 47 additions & 0 deletions docs/StardustDocs/topics/dataSources/JSON.md
@@ -0,0 +1,47 @@
# JSON

<web-summary>
Support for working with JSON data — load, explore, and save structured JSON using Kotlin DataFrame.
</web-summary>

<card-summary>
Easily handle JSON data in Kotlin — read from files or URLs, and export your data back to JSON format.
</card-summary>

<link-summary>
Kotlin DataFrame support for reading and writing JSON files in a structured and type-safe way.
</link-summary>

Kotlin DataFrame supports reading from and writing to JSON files.

Requires the [`dataframe-json` module](Modules.md#dataframe-json),
which is included by default in the general [`dataframe`](Modules.md#dataframe-general)
artifact and in [`%use dataframe`](gettingStartedKotlinNotebook.md#integrate-kotlin-dataframe)
for Kotlin Notebook.

> Kotlin DataFrame is suitable only for working with table-like structured JSON —
> a list of objects where each object represents a row and all objects share the same structure.
>
> Experimental support for [OpenAPI JSON schemas](OpenAPI.md) is also available.
> {style="note"}
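
To make "table-like" concrete, here is a minimal sketch that parses a JSON array of same-shaped objects from a string. It assumes the `readJsonStr()` function from the `dataframe-json` module. The helper name `readTableLikeJson` and the sample data are made up for illustration.

```kotlin
import org.jetbrains.kotlinx.dataframe.AnyFrame
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.io.readJsonStr

// Hypothetical helper: parses an inline JSON array into a DataFrame.
fun readTableLikeJson(): AnyFrame {
    // Each object is one row; the shared keys become the columns.
    val json = """
        [
          {"name": "Alice", "age": 30},
          {"name": "Bob", "age": 25}
        ]
    """.trimIndent()

    return DataFrame.readJsonStr(json)
}
```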

## Read

You can read a [`DataFrame`](DataFrame.md) or [`DataRow`](DataRow.md)
from a JSON file (via a file path or URL) using the [`readJson()`](read.md#read-from-json) method:

```kotlin
val df = DataFrame.readJson("example.json")
```

```kotlin
val df = DataFrame.readJson("https://kotlin.github.io/dataframe/resources/example.json")
```

## Write

You can write a [`DataFrame`](DataFrame.md) to a JSON file using the [`writeJson()`](write.md#writing-to-json) method:

```kotlin
df.writeJson("example.json")
```
34 changes: 34 additions & 0 deletions docs/StardustDocs/topics/dataSources/OpenAPI.md
@@ -0,0 +1,34 @@
# OpenAPI

<web-summary>
Work with JSON data based on OpenAPI 3.0 schemas using Kotlin DataFrame — helpful for consuming structured API responses.
</web-summary>

<card-summary>
Use Kotlin DataFrame to read and write data that conforms to OpenAPI specifications. Great for API-driven data workflows.
</card-summary>

<link-summary>
Learn how to use OpenAPI 3.0 JSON schemas with Kotlin DataFrame to load and manipulate API-defined data.
</link-summary>


> **Experimental**: Support for OpenAPI 3.0.0 schemas is currently experimental
> and may change or be removed in future releases.
> {style="warning"}

Kotlin DataFrame provides support for reading and writing JSON data
that conforms to [OpenAPI 3.0 specifications](https://www.openapis.org).
This feature is useful when working with APIs that expose structured data defined via OpenAPI schemas.

Requires the [`dataframe-openapi` module](Modules.md#dataframe-openapi),
which **is not included** in the general [`dataframe`](Modules.md#dataframe-general) artifact.

To enable it in Kotlin Notebook, use:

```kotlin
%use dataframe(enableExperimentalOpenApi=true)
```

See [the OpenAPI guide notebook](https://github.com/Kotlin/dataframe/blob/master/examples/notebooks/json/KeyValueAndOpenApi.ipynb)
for details on how to work with OpenAPI-based data.
22 changes: 22 additions & 0 deletions docs/StardustDocs/topics/dataSources/sql/Custom-SQL-Source.md
@@ -0,0 +1,22 @@
# Custom SQL Source

<web-summary>
Connect Kotlin DataFrame to any JDBC-compatible database using a custom SQL source configuration.
</web-summary>

<card-summary>
Easily integrate unsupported SQL databases in Kotlin DataFrame using a flexible custom source setup.
</card-summary>

<link-summary>
Define a custom SQL source in Kotlin DataFrame to work with any JDBC-based database.
</link-summary>


If your SQL database is not officially supported, you can either
[create an issue](https://github.com/Kotlin/dataframe/issues)
or define a simple, configurable custom SQL source.

See the [How to Extend DataFrame Library for Custom SQL Database Support guide](readSqlFromCustomDatabase.md)
for detailed instructions and an example with HSQLDB.
