
Commit 9ecec13

Refresh of Jupyter environment

1 parent: 573d983

File tree: 27 files changed (+329, -1107 lines)

.gitignore

Lines changed: 1 addition & 5 deletions
@@ -2,14 +2,10 @@
 .env
 .pyc
 venv
+.venv
 __pycache__
 build/
 
-# Anaconda
-envs
-.conda
-.cache
-
 # OS X Artifacts
 .DS_Store
 

CLAUDE.md

Lines changed: 119 additions & 0 deletions
@@ -0,0 +1,119 @@
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+## Overview
+
+This repository contains Optimizely Labs - a collection of self-contained tutorials demonstrating how to work with Optimizely data and developer tools. Each lab is published to optimizely.com/labs via a Contentful CMS integration.
+
+## Repository Architecture
+
+### Content Structure
+- **`/labs/`**: Each subdirectory is a self-contained lab with its own README.md (tutorial content), metadata.md (Contentful metadata), and any supporting code/resources
+- **`/utils/`**: Python scripts for publishing and managing labs in Contentful CMS
+- **`/templates/`**: Templates for creating new labs (new-content and link-to-existing-content)
+- **`.github/workflows/`**: CI/CD pipelines for automatic publishing to Contentful
+
+### Lab Publication System
+The repository uses a Python-based publishing system that:
+1. Scans `/labs/` for all lab directories (slugs)
+2. Zips each lab's contents and uploads them to S3
+3. Parses `metadata.md` (YAML frontmatter) and README.md/index.md
+4. Upserts lab entries to Contentful CMS
+5. Labs appear at `optimizely.com/labs/{slug}`
+
+**Key files:**
+- `utils/publish.py`: Main publishing script - zips labs, uploads to S3, syncs to Contentful
+- `utils/delete.py`: Deletes a specific lab from Contentful
+- `utils/labs_constants.py`: Configuration constants (paths, Contentful settings, S3 credentials)
+
+### Lab Slug Requirements
+- Slugs must match `^[a-zA-Z0-9-.]{1,64}$` (alphanumeric characters, dashes, and periods only)
+- Slugs serve as both directory names and Contentful entry IDs
+- Slugs become URL paths: `optimizely.com/labs/{slug}`
+
+## Common Development Tasks
+
+### Installing Dependencies
+```bash
+pip install -r requirements.txt
+```
+
+### Publishing Labs to Contentful
+Publishing happens automatically on push to the `master` branch via GitHub Actions, but it can be run manually:
+
+```bash
+# Set required environment variables (see labs_constants.py for full list)
+export LABS_CONTENTFUL_ENVIRONMENT=master
+export LABS_CONTENTFUL_SPACE_ID=zw48pl1isxmc
+export LABS_CONTENTFUL_MANAGEMENT_API_TOKEN=<token>
+export LIBRARY_URL=https://library.optimizely.com/
+export LIBRARY_S3_BUCKET=library-optimizely-com
+export LIBRARY_S3_ACCESS_KEY=<key>
+export LIBRARY_S3_SECRET_KEY=<secret>
+
+# Run publish script
+python utils/publish.py
+```
+
+### Deleting a Lab
+```bash
+export SLUG_TO_DELETE=<lab-slug>
+export LABS_CONTENTFUL_ENVIRONMENT=master
+export LABS_CONTENTFUL_SPACE_ID=zw48pl1isxmc
+export LABS_CONTENTFUL_MANAGEMENT_API_TOKEN=<token>
+
+python utils/delete.py
+```
+
+### Creating a New Lab
+
+1. Create a directory under `/labs/` with a valid slug name
+2. Add `README.md` using the template from `/templates/new-content/README.md` or `/templates/link-to-existing-content/README.md`
+3. Add `metadata.md` using the template from `/templates/metadata.md`
+4. Set `excludeFromListing: true` in metadata.md initially
+5. Use absolute URLs for links/images (prefix images with `https://raw.githubusercontent.com/optimizely/labs/master/`)
+6. Submit a PR for review
+7. After merge, the lab appears at `optimizely.com/labs/{slug}`
+8. When ready to feature the lab, set `excludeFromListing: false` via a separate PR
+
+### Lab Content Structure
+
+**README.md/index.md** (content priority: index.md > README.md):
+- Tutorial content in Markdown
+- Should include: summary, prerequisites, numbered steps
+- Use absolute URLs for all links and images
+
+**metadata.md**:
+- YAML frontmatter with: title, summary, revisionDate, labels, author, seo, excludeFromListing
+- Controls how the lab appears in Contentful and on optimizely.com/labs
+
+## Lab Types
+
+Labs cover various Optimizely integrations and use cases:
+- **Feature Flags**: Python Flask, Ruby Sinatra examples
+- **Data Analysis**: Spark/PySpark notebooks for Enriched Event data
+- **Integrations**: Third-party integrations (ContentSquare, Crazy Egg, Zuko, Segment)
+- **Browser Extensions**: Custom Optimizely editor extensions
+- **Event Targeting**: Sequential events, custom targeting examples
+
+### Data Analysis Labs
+Some labs (computing-experiment-metrics, query-enriched-event-data-with-spark) use:
+- Docker for containerized execution
+- PySpark for processing Optimizely Enriched Event data
+- Jupyter Lab for interactive notebooks
+- A Python requirements.txt for lab-specific dependencies
+
+Run these labs using:
+```bash
+bash bin/run.sh        # Uses Docker (primary method)
+bash bin/run-docker.sh # Direct Docker execution
+```
+
+## Important Notes
+
+- Labs are self-contained - all resources should live within the lab directory
+- Publishing is automatic on master branch commits
+- Lab zips are uploaded to S3 and linked in Contentful as downloadable resources
+- The metadata field `excludeFromListing: true` hides labs from the main listing page
+- Use the GitHub Actions workflow "Contentful Delete" to manually delete labs
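The slug rule and publish scan described in CLAUDE.md above can be sketched in Python. This is a minimal illustration built from the quoted regex; `is_valid_slug` and `iter_lab_slugs` are hypothetical helper names, not functions from `utils/publish.py`:

```python
import re
from pathlib import Path

# Slug pattern quoted in CLAUDE.md: 1-64 alphanumeric characters, dashes, periods
SLUG_PATTERN = re.compile(r"^[a-zA-Z0-9-.]{1,64}$")

def is_valid_slug(slug: str) -> bool:
    """Return True if a directory name is usable as a lab slug."""
    return SLUG_PATTERN.fullmatch(slug) is not None

def iter_lab_slugs(labs_dir: str):
    """Yield each lab directory name under /labs/ that passes slug validation."""
    for entry in sorted(Path(labs_dir).iterdir()):
        if entry.is_dir() and is_valid_slug(entry.name):
            yield entry.name
```

Note that, taking the regex at face value, underscores and spaces are rejected even though the slug doubles as a Contentful entry ID and a URL path segment.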
Lines changed: 33 additions & 0 deletions

@@ -0,0 +1,33 @@
+# Use official Python image
+FROM python:3.11-slim
+
+# Install Java (required for PySpark)
+RUN apt-get update && \
+    apt-get install -y --no-install-recommends \
+    openjdk-21-jre-headless \
+    wget \
+    curl \
+    && apt-get clean && \
+    rm -rf /var/lib/apt/lists/*
+
+# Set Java environment variables
+ENV JAVA_HOME=/usr/lib/jvm/java-21-openjdk-arm64
+
+# Install PySpark and Jupyter via pip
+RUN pip install --no-cache-dir \
+    pyspark==3.5.0 \
+    jupyterlab==4.0.9 \
+    notebook \
+    ipykernel \
+    ipywidgets
+
+# Create a non-root user
+RUN useradd -m -s /bin/bash jovyan
+USER jovyan
+WORKDIR /home/jovyan
+
+# Expose Jupyter port
+EXPOSE 8888
+
+# Default command
+CMD ["jupyter", "lab", "--ip=0.0.0.0", "--port=8888", "--no-browser"]
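One caveat in the Dockerfile above: `JAVA_HOME` is hardcoded to the `arm64` JDK path, which matches Apple Silicon hosts but not `x86_64` hosts, where Debian installs the JDK under `java-21-openjdk-amd64`. A sketch of architecture-aware path selection (the `java_home_for` helper is illustrative, not part of this commit):

```python
import platform

# Map Python's machine names to Debian's JDK directory suffixes
_ARCH_SUFFIX = {
    "x86_64": "amd64",
    "aarch64": "arm64",
    "arm64": "arm64",
}

def java_home_for(machine: str = "") -> str:
    """Return the Debian openjdk-21 JAVA_HOME path for a machine architecture."""
    machine = machine or platform.machine()
    suffix = _ARCH_SUFFIX.get(machine, machine)
    return f"/usr/lib/jvm/java-21-openjdk-{suffix}"
```

In a Dockerfile this could feed an `ENV` line via a build argument, so the same image definition works on both architectures.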

labs/computing-experiment-metrics/README.md

Lines changed: 13 additions & 55 deletions
@@ -4,78 +4,36 @@ In this Lab, we'll walk through an end-to-end workflow for computing a series of
 
 ## Running this notebook
 
-There are several ways to run this notebook locally:
-- Using the `run.sh` script
-- Using [Docker](https://www.docker.com/) with the `run-docker.sh` script
-- Manually, using the `conda` CLI
+This lab runs using [Docker](https://www.docker.com/), which provides all necessary dependencies (Python, Java, Spark, Jupyter). The lab includes a `Dockerfile` that builds a custom image from the official `python:3.11-slim` base image, installing all dependencies via standard package managers (apt and pip).
 
-### Running the notebook with `run.sh`
-
-You can use the `run.sh` script to build your environment and run this notebook with a single command.
-
-#### Prerequisite: conda (version 4.4+)
-
-[Anaconda]: https://www.anaconda.com/distribution/
-[Miniconda]: https://docs.conda.io/en/latest/miniconda.html
+### Prerequisites
 
-You can install the `conda` CLI by installing [Anaconda] or [Miniconda].
+- [Docker](https://www.docker.com/) must be installed on your system
 
-#### Running Jupyter Lab
+### Running the notebook with `run.sh`
 
-This lab directory contains a handy script for building your conda environment and running Jupyter Lab. To run it, simply use
+You can use the `run.sh` script to run this notebook with a single command:
 
 ```sh
 bash bin/run.sh
 ```
 
-That's it, you're done!
+This script will:
+1. Build a custom Docker image from the local Dockerfile
+2. Install lab-specific Python dependencies
+3. Launch Jupyter Lab
 
-### Running this notebook with Docker
+That's it, you're done!
 
-If you have [Docker](https://www.docker.com/) installed, you can run PySpark and Jupyter Lab without installing any other dependencies.
+### Running the notebook directly with Docker
 
-Execute `run-docker.sh` in the `./bin` directory to open Jupyter Lab in a Docker container:
+Alternatively, you can execute `run-docker.sh` directly:
 
 ```sh
 bash bin/run-docker.sh
 ```
 
-**Note:** Docker makes it easy to get started with PySpark, but it adds overhead and may require [additional configuration](https://docs.docker.com/config/containers/resource_constraints/) to handle large workloads.
-
-### Running this notebook manually
-
-If you prefer to build and activate your conda environment manually, you can use the `conda` CLI and the environment specification files in the `./lab_env` directory to do so.
-
-#### Prerequisite: conda (version 4.4+)
-
-[Anaconda]: https://www.anaconda.com/distribution/
-[Miniconda]: https://docs.conda.io/en/latest/miniconda.html
-
-You can install the `conda` CLI by installing [Anaconda] or [Miniconda].
-
-#### Building and activating your Aanconda environment
-
-Start by building (or updating) and activating your anaconda environment. This step will install [OpenJDK](https://openjdk.java.net/), [PySpark](https://spark.apache.org/docs/latest/api/python/pyspark.html), [Jupyter Lab](https://jupyter.org/), and other necessary dependencies.
-
-```sh
-conda env update --file lab_env/base.yml --name optimizelylabs
-conda env update --file lab_env/labs.yml --name optimizelylabs
-conda activate optimizelylabs
-```
-
-Next, install a jupyter kernel for this environment:
-
-```sh
-python -m ipykernel install --user \
-    --name optimizelylabs \
-    --display-name="Python 3 (Optimizely Labs Environment)"
-```
-
-Finally, start Jupyter Lab in your working directory:
-
-```sh
-jupyter lab .
-```
+**Note:** Docker makes it easy to get started with PySpark, but it may require [additional configuration](https://docs.docker.com/config/containers/resource_constraints/) to handle large workloads.
 
 ## Specifying a custom data directory
 
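The custom data directory mentioned at the end of the README hunk above is driven by the `OPTIMIZELY_DATA_DIR` environment variable (see `bin/run-docker.sh` elsewhere in this commit). A minimal sketch of the resolution logic; the helper name and the `example_data` fallback are assumptions for illustration, not taken from the lab:

```python
import os
from pathlib import Path

def resolve_data_dir(default: str = "example_data") -> Path:
    """Return $OPTIMIZELY_DATA_DIR if set, else fall back to a default path."""
    return Path(os.environ.get("OPTIMIZELY_DATA_DIR", default))
```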

labs/computing-experiment-metrics/bin/build.sh

Lines changed: 11 additions & 4 deletions
@@ -27,13 +27,20 @@ if [[ ! -f "$NB" ]]; then
     exit 1
 fi
 
-# Create conda build environment
-export INSTALL_BUILD_DEPENDENCIES=true
-. "$LAB_BIN_DIR/env.sh"
+# Create build directory
+mkdir -p "$LAB_BUILD_DIR"
+
+# Create Python virtual environment and install build dependencies
+echo "Creating Python virtual environment for build"
+VENV_DIR="$LAB_BUILD_DIR/.venv"
+python3 -m venv "$VENV_DIR"
+source "$VENV_DIR/bin/activate"
+
+echo "Installing build dependencies"
+pip install -r "$LAB_BASE_DIR/requirements-build.txt"
 
 # Backup passed notebook
 echo "Backing up $NB to $LAB_BUILD_DIR/backup.ipynb"
-mkdir -p "$LAB_BUILD_DIR"
 cp "$NB" "$LAB_BUILD_DIR/backup.ipynb"
 
 # 1. Remove outputs and cell metadata from the passed notebook
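The venv steps added to `build.sh` above have a direct stdlib equivalent in Python, which can be handy in cross-platform tooling. A minimal sketch; the `create_build_env` helper is illustrative, not part of this commit:

```python
import venv
from pathlib import Path

def create_build_env(build_dir: str, with_pip: bool = True) -> Path:
    """Create build_dir/.venv, mirroring the `python3 -m venv` call in build.sh."""
    venv_dir = Path(build_dir) / ".venv"
    # `python3 -m venv` bootstraps pip by default; pass with_pip=False to skip
    # that (useful where ensurepip is unavailable)
    venv.EnvBuilder(with_pip=with_pip, clear=True).create(venv_dir)
    return venv_dir
```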

labs/computing-experiment-metrics/bin/env.sh

Lines changed: 0 additions & 45 deletions
This file was deleted.

labs/computing-experiment-metrics/bin/run-docker.sh

Lines changed: 22 additions & 11 deletions
@@ -3,17 +3,32 @@
 # run-docker.sh
 #
 # A handy script for running the lab notebook locally in a docker container
+# Uses a custom Docker image built from python:3.11-slim
 
 set -e
 
 # Use the script path to build an absolute path for the Lab's base directory
 SCRIPT_DIR=$(dirname "$0")
 . "$SCRIPT_DIR/base.sh"
 
+# Build the Docker image from local Dockerfile
+IMAGE_NAME="optimizely-pyspark-lab"
+echo "Building Docker image..."
+docker build -t "$IMAGE_NAME" "$LAB_BASE_DIR"
+
 # The Lab directory should be mounted in ~/lab in the container
 CONTAINER_HOME=/home/jovyan
 CONTAINER_LAB_BASE_DIR="$CONTAINER_HOME/lab"
-CONTAINER_LAB_BIN_DIR="$CONTAINER_LAB_BASE_DIR/bin"
+
+# Build the startup command to install dependencies and launch Jupyter
+STARTUP_CMD="pip install -r $CONTAINER_LAB_BASE_DIR/requirements.txt && \
+    jupyter lab --ip=0.0.0.0 --port=8888 --no-browser --NotebookApp.token='' --NotebookApp.password='' --notebook-dir=$CONTAINER_LAB_BASE_DIR"
+
+# Detect if running in interactive terminal
+DOCKER_FLAGS=""
+if [ -t 0 ]; then
+    DOCKER_FLAGS="-it"
+fi
 
 # If OPTIMIZELY_DATA_DIR is defined, mount the specified data directory in
 # the container and set the container OPTIMIZELY_DATA_DIR envar accordingly
@@ -22,21 +37,17 @@ if [[ -n "${OPTIMIZELY_DATA_DIR:-}" ]]; then
     CONTAINER_DATA_DIR="$CONTAINER_HOME/optimizely_data"
     echo "OPTIMIZELY_DATA_DIR envar set. Mapping to $CONTAINER_DATA_DIR"
 
-    docker run -it --rm \
+    docker run $DOCKER_FLAGS --rm \
         -p 8888:8888 \
        -v "$LAB_BASE_DIR:$CONTAINER_LAB_BASE_DIR" \
         -v "$OPTIMIZELY_DATA_DIR:$CONTAINER_DATA_DIR" \
-        -e "IN_DOCKER_CONTAINER=true" \
         -e "OPTIMIZELY_DATA_DIR=$CONTAINER_DATA_DIR" \
-        jupyter/pyspark-notebook \
-        bash "$CONTAINER_LAB_BIN_DIR/run.sh"
+        "$IMAGE_NAME" \
+        bash -c "$STARTUP_CMD"
 else
-    docker run -it --rm \
+    docker run $DOCKER_FLAGS --rm \
         -p 8888:8888 \
         -v "$LAB_BASE_DIR:$CONTAINER_LAB_BASE_DIR" \
-        -e "IN_DOCKER_CONTAINER=true" \
-        jupyter/pyspark-notebook \
-        bash "$CONTAINER_LAB_BIN_DIR/run.sh"
+        "$IMAGE_NAME" \
+        bash -c "$STARTUP_CMD"
 fi
-
-
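The `docker run` invocation in `run-docker.sh` above reduces to a predictable argument list: publish port 8888, mount the lab at `/home/jovyan/lab`, optionally mount the data directory, and add `-it` only when stdin is a terminal. This Python sketch (a hypothetical helper, not part of the lab) assembles the equivalent arguments:

```python
import sys

def docker_run_args(image, lab_dir, data_dir=None, interactive=None):
    """Assemble a docker run argument list mirroring bin/run-docker.sh."""
    container_home = "/home/jovyan"
    if interactive is None:  # mirrors the `[ -t 0 ]` tty check in the script
        interactive = sys.stdin.isatty()
    args = ["docker", "run", "--rm"] + (["-it"] if interactive else [])
    args += ["-p", "8888:8888", "-v", f"{lab_dir}:{container_home}/lab"]
    if data_dir:  # OPTIMIZELY_DATA_DIR set: mount it and pass the envar through
        container_data = f"{container_home}/optimizely_data"
        args += ["-v", f"{data_dir}:{container_data}",
                 "-e", f"OPTIMIZELY_DATA_DIR={container_data}"]
    return args + [image]
```

Skipping `-it` when no tty is attached (the `DOCKER_FLAGS` change above) is what lets the script run in CI, where `docker run -it` would fail with "the input device is not a TTY".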
