
Conversation

luoyuxia
Contributor

Purpose

Linked issue: close #1111

Brief change log

Copied from https://github.com/luoyuxia/fluss-quickstart-flink/blob/main/sql-client/Dockerfile, but the jars are downloaded from the web and Fluss is taken from the local build.

Tests

API and Format

Documentation

@luoyuxia
Contributor Author

cc @michaelkoepf @wuchong

@luoyuxia requested a review from Copilot on September 25, 2025, 13:55

Copilot AI left a comment


Pull Request Overview

This PR introduces a Docker setup for Fluss Quickstart with Flink integration, enabling users to quickly test Fluss with Flink in a containerized environment. The setup includes data generation using the faker connector and pre-configured SQL tables for testing.

  • Adds Docker infrastructure for the Fluss-Flink quickstart environment
  • Creates sample SQL scripts with faker-generated test data for orders, customers, and nations
  • Provides an automated build preparation script to collect the necessary JARs (a usage sketch follows below)
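
For orientation, end-to-end usage of such a setup might look like the sketch below; the image tag and build-context path are illustrative assumptions, not necessarily the exact commands documented in this PR.

    # Collect the locally built Fluss JARs and the downloaded connector JARs
    ./prepare_build.sh

    # Build the quickstart image (the tag name here is an assumption)
    docker build -t fluss/quickstart-flink:latest docker/quickstart-flink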

Reviewed Changes

Copilot reviewed 4 out of 5 changed files in this pull request and generated 3 comments.

Reviewed files:
  • docker/quickstart-flink/sql/sql-client.sql: sample SQL script with faker connector table definitions for testing
  • docker/quickstart-flink/prepare_build.sh: build preparation script that downloads dependencies and copies Fluss JARs
  • docker/quickstart-flink/README.md: documentation for building and using the quickstart Docker image
  • docker/quickstart-flink/Dockerfile: Docker image definition based on Flink 1.20.0 with Fluss integration
Comments suppressed due to low confidence (2)

docker/quickstart-flink/prepare_build.sh:1

  • The wildcard pattern matching could fail silently if the JAR files don't exist or if multiple JARs match the pattern. Add error handling to check if the source files exist before copying and ensure only one file matches each pattern.
#!/bin/bash
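
One way to harden the copy step, as a minimal sketch (the glob path and directory layout are illustrative, not taken from the actual script):

    # Expand the wildcard into an array and fail fast on zero or multiple matches.
    # Without nullglob, an unmatched glob stays literal, which the -f test catches.
    jars=( ./target/fluss-flink-*.jar )
    if [ "${#jars[@]}" -ne 1 ] || [ ! -f "${jars[0]}" ]; then
        echo "Expected exactly one Fluss JAR, found: ${jars[*]}" >&2
        exit 1
    fi
    cp "${jars[0]}" ./lib/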

docker/quickstart-flink/prepare_build.sh:1

  • The wget commands lack error handling. If any download fails, the script continues execution which could lead to a broken Docker image. Add error checking after each wget command or use wget with the --no-clobber and --continue flags for better reliability.
#!/bin/bash
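
One common shape for such a guard, as a sketch only (the script's actual download_jar helper takes more arguments; just the error-handling idea is shown here):

    set -euo pipefail  # abort the whole build preparation on the first failure

    download_jar() {
        local url="$1" target="$2"
        # Skip files that were already fetched, so re-runs stay cheap
        if [ -f "$target" ]; then
            echo "Skipping existing $target"
            return 0
        fi
        # Fail loudly instead of silently producing a broken Docker build context
        wget -O "$target" "$url" || { echo "Failed to download $url" >&2; exit 1; }
    }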


Contributor

@michaelkoepf left a comment


@luoyuxia I left some comments.

One additional thing: Should we move the files for the Fluss image (docker-entrypoint.sh, Dockerfile) from the top-level of the docker folder to a separate fluss folder?

i.e., a directory structure like this:

docker
|- fluss
|- quickstart-flink

"flink-shaded-hadoop-2-uber-2.8.3-10.0"

# Download paimon-flink connector
download_jar \
Contributor

@MehulBatra, Oct 1, 2025


#1727: for our effort to create a quickstart guide for Iceberg, we need to add the Flink Iceberg dependencies to the fluss/quickstart-flink image as well. What do you think @luoyuxia @wuchong?
PS: I feel we can skip the AWS bundle.

# Download iceberg-flink-runtime for Flink 1.20
download_jar \
    "https://repo1.maven.org/maven2/org/apache/iceberg/iceberg-flink-runtime-1.20/1.9.1/iceberg-flink-runtime-1.20-1.9.1.jar" \
    "./lib/iceberg-flink-runtime-1.20-1.9.1.jar" \
    "" \
    "iceberg-flink-runtime-1.20-1.9.1"

# Download iceberg-aws-bundle for S3/Glue support
download_jar \
    "https://repo1.maven.org/maven2/org/apache/iceberg/iceberg-aws-bundle/1.9.1/iceberg-aws-bundle-1.9.1.jar" \
    "./lib/iceberg-aws-bundle-1.9.1.jar" \
    "" \
    "iceberg-aws-bundle-1.9.1"

Member


+1

Contributor


Also, while running the tiering service for Iceberg, I found out that we also need the Avro jar, otherwise we get the following error:

Caused by: java.lang.NoSuchMethodError: 'org.apache.avro.LogicalTypes$TimestampNanos org.apache.avro.LogicalTypes.timestampNanos()'
    at org.apache.iceberg.avro.TypeToSchema.<clinit>(TypeToSchema.java:50)
    # Download Avro 1.12.0
    download_jar \
        "https://repo1.maven.org/maven2/org/apache/avro/avro/1.12.0/avro-1.12.0.jar" \
        "./lib/avro-1.12.0.jar" \
        "" \
        "avro-1.12.0"

Member


@MehulBatra should we also add this instruction to the iceberg integration documentation?

https://fluss.apache.org/docs/next/streaming-lakehouse/integrate-data-lakes/iceberg/

Contributor Author


I suggest doing it in #1727 and keeping this PR focused on quickstart-paimon. The reasons are:

  • Which jars to add for the Iceberg quickstart depends on what that quickstart looks like. For example, if there is no S3, we don't need iceberg-aws-bundle.
  • There are some class conflicts to resolve. For example, Iceberg 1.9.0 has also dropped support for Hadoop 2, but quickstart-paimon uses the Hadoop 2 jar.

Contributor Author


> Also, while running the tiering service for Iceberg, I found out that we also need the Avro jar, otherwise we get the following error:
> Caused by: java.lang.NoSuchMethodError: 'org.apache.avro.LogicalTypes$TimestampNanos org.apache.avro.LogicalTypes.timestampNanos()' […]

Yes, that's what I mean by class conflict. We put Hadoop 2 related classes in FLINK_HOME/lib, but Iceberg requires Hadoop 3, and the documentation also says to use Hadoop 3. I have already created a repo, https://github.com/luoyuxia/fluss-iceberg/tree/main/flink/lib, to verify Fluss integrating with Iceberg. You can refer to it for which jars are added.

Contributor Author


Now I have appended a commit to use Hadoop 3, and quickstart-paimon still works. After the upgrade to Hadoop 3, Iceberg can also share this Hadoop 3 lib without the error:

Caused by: java.lang.NoSuchMethodError: 'org.apache.avro.LogicalTypes$TimestampNanos org.apache.avro.LogicalTypes.timestampNanos()'
    at org.apache.iceberg.avro.TypeToSchema.<clinit>(TypeToSchema.java:50)
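
For illustration, the swap in prepare_build.sh could look roughly like the following; the exact Hadoop 3 uber-jar artifact and version are assumptions on my part, so check the actual commit for the coordinates it uses:

    # Replace the Hadoop 2 uber jar with a Hadoop 3 one so Paimon and Iceberg
    # can share the same FLINK_HOME/lib (artifact/version assumed for illustration)
    download_jar \
        "https://repo1.maven.org/maven2/org/apache/flink/flink-shaded-hadoop-3-uber/3.1.1.7.2.9.0-173-9.0/flink-shaded-hadoop-3-uber-3.1.1.7.2.9.0-173-9.0.jar" \
        "./lib/flink-shaded-hadoop-3-uber-3.1.1.7.2.9.0-173-9.0.jar" \
        "" \
        "flink-shaded-hadoop-3-uber-3.1.1.7.2.9.0-173-9.0"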

Contributor


@wuchong I have added the jars required specifically for the Iceberg quickstart, for the self-hosted Flink environment part!

Contributor

@MehulBatra, Oct 13, 2025


@luoyuxia I was unaware that adding Hadoop 3 would resolve the Avro error without needing a separate Avro jar; I will incorporate that in the quickstart!

@luoyuxia
Contributor Author

@wuchong Comments addressed

@MehulBatra
Contributor

@luoyuxia Looks good to me. Thanks for all the hard work and for setting up the foundation!

@MehulBatra merged commit be6fc3b into apache:main on Oct 13, 2025
5 checks passed