[build] Introduce flink-quickstart docker file #1759
Conversation
Branch updated from 00e0555 to 14783fe (Compare)
Pull Request Overview
This PR introduces a Docker setup for Fluss Quickstart with Flink integration, enabling users to quickly test Fluss with Flink in a containerized environment. The setup includes data generation using the faker connector and pre-configured SQL tables for testing.
- Adds Docker infrastructure for Fluss-Flink quickstart environment
- Creates sample SQL scripts with faker-generated test data for orders, customers, and nations
- Provides automated build preparation script to collect necessary JARs
Reviewed Changes
Copilot reviewed 4 out of 5 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| docker/quickstart-flink/sql/sql-client.sql | Sample SQL script with faker connector table definitions for testing |
| docker/quickstart-flink/prepare_build.sh | Build preparation script that downloads dependencies and copies Fluss JARs |
| docker/quickstart-flink/README.md | Documentation for building and using the quickstart Docker image |
| docker/quickstart-flink/Dockerfile | Docker image definition based on Flink 1.20.0 with Fluss integration |
Comments suppressed due to low confidence (2)
docker/quickstart-flink/prepare_build.sh:1
- The wildcard pattern matching could fail silently if the JAR files don't exist or if multiple JARs match the pattern. Add error handling to check if the source files exist before copying and ensure only one file matches each pattern.
#!/bin/bash
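A minimal sketch of such a single-match guard, assuming a hypothetical `copy_jar` helper (names and paths below are illustrative, not the script's actual ones):

```bash
#!/usr/bin/env bash
# Hypothetical helper: copy exactly one JAR matching a glob, failing loudly otherwise.
copy_jar() {
  local pattern="$1" dest="$2"
  local matches=( $pattern )  # unquoted expansion performs the glob match
  # If nothing matches, the literal pattern remains and the -f test fails;
  # if several files match, the array has more than one element.
  if [ "${#matches[@]}" -ne 1 ] || [ ! -f "${matches[0]}" ]; then
    echo "ERROR: expected exactly one JAR matching '$pattern', found ${#matches[@]}" >&2
    exit 1
  fi
  cp "${matches[0]}" "$dest"
}

copy_jar "../some-module/target/some-module-*.jar" ./lib/  # placeholder path
```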
docker/quickstart-flink/prepare_build.sh:1
- The wget commands lack error handling. If any download fails, the script continues execution which could lead to a broken Docker image. Add error checking after each wget command or use wget with the --no-clobber and --continue flags for better reliability.
#!/bin/bash
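A minimal sketch of the fail-fast variant (the URL is just an illustrative placeholder taken from elsewhere in this thread):

```bash
#!/usr/bin/env bash
set -euo pipefail  # abort on the first failed command instead of continuing

url="https://repo1.maven.org/maven2/org/apache/avro/avro/1.12.0/avro-1.12.0.jar"
# Check the download explicitly so a network failure stops the Docker build
# instead of silently producing a broken image.
wget -q "$url" -O "./lib/$(basename "$url")" || {
  echo "ERROR: failed to download $url" >&2
  exit 1
}
```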
@luoyuxia I left some comments.
One additional thing: should we move the files for the Fluss image (`docker-entrypoint.sh`, `Dockerfile`) from the top level of the `docker` folder into a separate `fluss` folder? I.e., a directory structure like this:
```
docker
|- fluss
|- quickstart-flink
```
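If we do, a minimal sketch of the move (assuming the two files currently sit directly under `docker/`):

```bash
# Hypothetical restructuring: group the Fluss image files under docker/fluss/
mkdir -p docker/fluss
git mv docker/Dockerfile docker/docker-entrypoint.sh docker/fluss/
```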
"flink-shaded-hadoop-2-uber-2.8.3-10.0" | ||
|
||
# Download paimon-flink connector | ||
download_jar \ |
#1727: For our effort to create a quickstart guide for Iceberg, we need to add the Flink Iceberg dependencies to the fluss/quickstart-flink image as well. What do you think @luoyuxia @wuchong?
PS: I feel we can skip the AWS bundle.
```bash
# Download iceberg-flink-runtime for Flink 1.20
download_jar \
  "https://repo1.maven.org/maven2/org/apache/iceberg/iceberg-flink-runtime-1.20/1.9.1/iceberg-flink-runtime-1.20-1.9.1.jar" \
  "./lib/iceberg-flink-runtime-1.20-1.9.1.jar" \
  "" \
  "iceberg-flink-runtime-1.20-1.9.1"

# Download iceberg-aws-bundle for S3/Glue support
download_jar \
  "https://repo1.maven.org/maven2/org/apache/iceberg/iceberg-aws-bundle/1.9.1/iceberg-aws-bundle-1.9.1.jar" \
  "./lib/iceberg-aws-bundle-1.9.1.jar" \
  "" \
  "iceberg-aws-bundle-1.9.1"
```
+1
Also, while running the tiering service for Iceberg, I found that we need the Avro jar as well, otherwise we get the following error:

```
Caused by: java.lang.NoSuchMethodError: 'org.apache.avro.LogicalTypes$TimestampNanos org.apache.avro.LogicalTypes.timestampNanos()'
	at org.apache.iceberg.avro.TypeToSchema.<clinit>(TypeToSchema.java:50)
```
```bash
# Download Avro 1.12.0
download_jar \
  "https://repo1.maven.org/maven2/org/apache/avro/avro/1.12.0/avro-1.12.0.jar" \
  "./lib/avro-1.12.0.jar" \
  "" \
  "avro-1.12.0"
```
@MehulBatra should we also add this instruction to the iceberg integration documentation?
https://fluss.apache.org/docs/next/streaming-lakehouse/integrate-data-lakes/iceberg/
I suggest doing that in #1727 and keeping this PR focused on quickstart-paimon. The reasons:
- Which jars to add for the Iceberg quickstart depends on what that quickstart looks like. For example, if it doesn't use S3, we don't need `iceberg-aws-bundle`.
- There are likely class conflicts to resolve; for example, Iceberg 1.9.0 has dropped support for Hadoop 2, but quickstart-paimon uses the Hadoop 2 jar.
> Also, while running the tiering service for Iceberg, I found that we need the Avro jar as well, otherwise we get the following error:
> Caused by: java.lang.NoSuchMethodError: 'org.apache.avro.LogicalTypes$TimestampNanos org.apache.avro.LogicalTypes.timestampNanos()' at org.apache.iceberg.avro.TypeToSchema.<clinit>(TypeToSchema.java:50)
> # Download Avro 1.12.0 download_jar \ "https://repo1.maven.org/maven2/org/apache/avro/avro/1.12.0/avro-1.12.0.jar" \ "./lib/avro-1.12.0.jar" \ "" \ "avro-1.12.0"

Yes, that's the class conflict I mean. We put Hadoop 2 related classes in FLINK_HOME/lib, but Iceberg requires Hadoop 3, and the documentation also says to use Hadoop 3. I have already created a repo https://github.com/luoyuxia/fluss-iceberg/tree/main/flink/lib to verify Fluss integrating with Iceberg; you can refer to it for which jars to add.
Now I have appended a commit to use Hadoop 3, and quickstart-paimon still works. After upgrading to Hadoop 3, Iceberg can also share this Hadoop 3 lib without the error:

```
Caused by: java.lang.NoSuchMethodError: 'org.apache.avro.LogicalTypes$TimestampNanos org.apache.avro.LogicalTypes.timestampNanos()'
	at org.apache.iceberg.avro.TypeToSchema.<clinit>(TypeToSchema.java:50)
```
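For reference, a sketch of what the swap in prepare_build.sh could look like, reusing the script's `download_jar` helper; the `flink-shaded-hadoop-3-uber` coordinates below are an assumption, not confirmed by this PR:

```bash
# Assumed Hadoop 3 replacement for the hadoop2 uber jar; exact artifact
# coordinates are an unverified assumption, not taken from this PR.
download_jar \
  "https://repo1.maven.org/maven2/org/apache/flink/flink-shaded-hadoop-3-uber/3.1.1.7.2.9.0-173-9.0/flink-shaded-hadoop-3-uber-3.1.1.7.2.9.0-173-9.0.jar" \
  "./lib/flink-shaded-hadoop-3-uber-3.1.1.7.2.9.0-173-9.0.jar" \
  "" \
  "flink-shaded-hadoop-3-uber-3.1.1.7.2.9.0-173-9.0"
```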
@wuchong I have added the jars required specifically for the Iceberg quickstart, for the self-hosted Flink environment part!
@luoyuxia I was unaware that adding Hadoop 3 would resolve the Avro error without needing a separate Avro jar; I will incorporate that in the quickstart!
@wuchong Comments addressed.
@luoyuxia Looks good to me. Thanks for all the hard work and for setting up the foundation!
Purpose
Linked issue: close #1111
Brief change log
Copied from https://github.com/luoyuxia/fluss-quickstart-flink/blob/main/sql-client/Dockerfile, but the jars are downloaded from the web and Fluss is built locally.
Tests
API and Format
Documentation