Getting Started with Apache Spark and Apache Polaris

This getting started guide provides a docker-compose file to set up Apache Spark with Apache Polaris. Apache Polaris is configured as an Iceberg REST Catalog in Spark. A Jupyter notebook is used to run PySpark.
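As a rough illustration of what "configured as an Iceberg REST Catalog in Spark" can look like from PySpark, the sketch below builds the relevant Spark settings. The property keys are standard Iceberg Spark catalog options; the catalog name `polaris`, the URI `http://polaris:8181/api/catalog`, the warehouse name `my_catalog`, and the credential placeholder are assumptions made for illustration and may differ from what the notebook in this directory uses. It also assumes the Iceberg Spark runtime is already on the classpath.

```python
def polaris_catalog_conf(catalog_name, uri, warehouse, credential):
    """Return Spark settings that register an Iceberg REST catalog."""
    prefix = f"spark.sql.catalog.{catalog_name}"
    return {
        # Enables Iceberg SQL extensions (e.g. CALL procedures, MERGE INTO).
        "spark.sql.extensions":
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        # Register the catalog and point it at a REST endpoint.
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": "rest",
        f"{prefix}.uri": uri,
        f"{prefix}.warehouse": warehouse,
        # OAuth2 client credentials in "<client-id>:<client-secret>" form.
        f"{prefix}.credential": credential,
        # Scope value as used in Polaris examples; treat as an assumption here.
        f"{prefix}.scope": "PRINCIPAL_ROLE:ALL",
    }


def build_session(conf):
    """Create a SparkSession from the settings above (requires PySpark)."""
    # Imported lazily so polaris_catalog_conf works without PySpark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName("polaris-demo")
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


# Hypothetical values; replace them with the ones for your setup.
conf = polaris_catalog_conf(
    catalog_name="polaris",
    uri="http://polaris:8181/api/catalog",
    warehouse="my_catalog",
    credential="<client-id>:<client-secret>",
)
```

Inside the Jupyter container, `spark = build_session(conf)` would then give a session in which the catalog is addressable by the chosen name, for example `spark.sql("SHOW NAMESPACES IN polaris")`.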

Build the Polaris image

If a Polaris image is not already present locally, build one with the following command:

./gradlew \
  :polaris-server:assemble \
  :polaris-server:quarkusAppPartsBuild --rerun \
  -Dquarkus.container-image.build=true

Run the `docker-compose` file

To start the services defined in the docker-compose file along with their dependencies, run this command from the repo's root directory:

sh getting-started/spark/launch-docker.sh

This will spin up two container services:

- The polaris service, which runs Apache Polaris using an in-memory metastore
- The jupyter service, which runs Jupyter notebook with PySpark
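Containers can take a little while to start accepting connections. A minimal sketch for checking that a service is reachable before moving on is shown below; it only tests that a TCP port accepts connections, not that the service is fully initialized. The helper name `wait_for_port` is hypothetical, and the sketch assumes the Jupyter port 8888 is published on the host.

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=1.0):
    """Poll until host:port accepts a TCP connection or timeout elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out; wait and retry.
            time.sleep(interval)
    return False


# Example (assumes Jupyter is published on the host at port 8888):
# wait_for_port("127.0.0.1", 8888)
```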

Access the Jupyter notebook interface

In the Jupyter notebook container log, look for the URL to access the Jupyter notebook. The URL should have the form http://127.0.0.1:8888/lab?token=<token>.
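If the log is long, the URL can also be picked out programmatically. The sketch below searches log text for a URL of the form given above. The sample log lines and the helper name `find_notebook_url` are made up for illustration; the actual log output may be formatted differently.

```python
import re

# Matches the documented URL form; the token is taken to be any run of
# non-whitespace characters after "token=".
TOKEN_URL = re.compile(r"http://127\.0\.0\.1:8888/lab\?token=\S+")


def find_notebook_url(log_text):
    """Return the first tokenized Jupyter URL in log_text, or None."""
    match = TOKEN_URL.search(log_text)
    return match.group(0) if match else None


# Hypothetical log excerpt, for illustration only.
sample_log = """
Jupyter Server is running at:
    http://127.0.0.1:8888/lab?token=0123abcd
"""

print(find_notebook_url(sample_log))
# prints: http://127.0.0.1:8888/lab?token=0123abcd
```

The log text itself would come from the jupyter container, for example by capturing the output of `docker logs` for that container; the container name depends on how docker-compose names it on your machine.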

Open that URL in a browser, then navigate to notebooks/SparkPolaris.ipynb.

Run the Jupyter notebook

You can now run all cells in the notebook or write your own code!
