This repository demonstrates semantic search using the Amazon Nova Multimodal Embeddings and TwelveLabs Pegasus 1.2 models on Amazon Bedrock, together with Amazon OpenSearch Serverless.
- Python 3.12+
- AWS credentials
- Amazon S3 bucket
- Amazon OpenSearch Serverless collection (optional)
- FFmpeg (optional; used for keyframe generation)
Clone the repository:
git clone https://github.com/garystafford/nova-mm-embedding-model-demo.git
cd nova-mm-embedding-model-demo
Rename python-dotenv file:
Mac:
mv env.txt .env
Windows:
rename env.txt .env
Enter the following environment variables in the .env file:
AWS_ACCESS_KEY_ID=<Your AWS Access Key ID>
AWS_SECRET_ACCESS_KEY=<Your AWS Secret Access Key>
AWS_SESSION_TOKEN=<Your AWS Session Token>
S3_VIDEO_STORAGE_BUCKET=<Your S3 Bucket Name>
OPENSEARCH_ENDPOINT=<Your OpenSearch Endpoint>
CLOUDFRONT_URL=<Your Amazon CloudFront Distribution>
Create a Python virtual environment for the Jupyter Notebook:
Mac:
python -m pip install virtualenv -Uq
python -m venv .venv
source .venv/bin/activate
python -m pip install -r requirements.txt -Uq
Windows:
python -m venv .venv
.venv\Scripts\activate
python -m pip install pip -Uq
python -m pip install -r requirements.txt -Uq
Check for FFmpeg:
ffmpeg -version
Videos and keyframes should be uploaded to the Amazon S3 buckets in us-east-1.
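Before running the scripts, it may help to confirm that the setup above is in place. The following is a minimal sketch and not part of the repository: `missing_env_vars`, `ffmpeg_available`, and `report` are hypothetical helper names, and the split between required and optional variables is an assumption based on the prerequisites list. It reads a plain mapping (the process environment by default), so load the .env file first, for example with python-dotenv's `load_dotenv()`.

```python
import os
import shutil

# Variable names come from the .env template above; which ones are strictly
# required is an assumption made for this sketch.
REQUIRED_VARS = [
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "S3_VIDEO_STORAGE_BUCKET",
]
OPTIONAL_VARS = ["AWS_SESSION_TOKEN", "OPENSEARCH_ENDPOINT", "CLOUDFRONT_URL"]


def missing_env_vars(names, environ=None):
    """Return the names that are unset or blank in the given environment mapping."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name, "").strip()]


def ffmpeg_available():
    """Return True if an ffmpeg executable is found on PATH."""
    return shutil.which("ffmpeg") is not None


def report(environ=None):
    """Build human-readable messages describing the setup state."""
    lines = []
    for name in missing_env_vars(REQUIRED_VARS, environ):
        lines.append(f"missing required variable: {name}")
    for name in missing_env_vars(OPTIONAL_VARS, environ):
        lines.append(f"optional variable not set: {name}")
    if not ffmpeg_available():
        lines.append("ffmpeg not found on PATH (only needed for keyframe generation)")
    return lines
```

Calling `report()` after the .env file has been loaded returns an empty list when everything in this sketch's checklist is present.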
Run the following Python scripts.
# Extract keyframes from videos
python ./extract_keyframe.py
# Generate embeddings using Amazon Nova Multimodal Embeddings
python ./generate_embeddings.py
# Generate video analyses using TwelveLabs Pegasus 1.2
python ./generate_analyses.py
# Prepare combined OpenSearch documents
python ./prepare_documents.py
Access the Jupyter Notebook for all OpenSearch-related code: nova-mm-emd-opensearch-demo.ipynb
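The four scripts listed above could also be chained from a single Python entry point that stops at the first failure. This is a sketch under stated assumptions, not something the repository provides: `build_commands` and `run_pipeline` are hypothetical names, and the sketch assumes each script is run from the repository root and signals failure through a non-zero exit code.

```python
import subprocess
import sys

# Script names are taken from the steps above; the order is kept as listed
# because each step is assumed to build on the previous step's output.
PIPELINE = [
    "extract_keyframe.py",
    "generate_embeddings.py",
    "generate_analyses.py",
    "prepare_documents.py",
]


def build_commands(scripts=PIPELINE, python=sys.executable):
    """Return one argv list per script, using the given Python interpreter."""
    return [[python, f"./{script}"] for script in scripts]


def run_pipeline(commands, runner=subprocess.run):
    """Run the commands in order and stop at the first failure.

    Returns the list of commands that completed successfully.
    """
    completed = []
    for command in commands:
        result = runner(command)
        if result.returncode != 0:
            print(f"step failed: {' '.join(command)}", file=sys.stderr)
            break
        completed.append(command)
    return completed
```

With the virtual environment activated, `run_pipeline(build_commands())` runs the steps in order. The `runner` parameter exists only so the logic can be exercised without launching the real scripts.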
As an alternative to AWS, you can run OpenSearch locally using Docker. This is intended for development environments only and is not secure.
Mac:
docker swarm init
SWARM_ID=$(docker node ls --format "{{.ID}}")
docker stack deploy -c docker-compose.yml $SWARM_ID
docker service ls
Windows:
docker swarm init
for /f "delims=" %x in ('docker node ls --format "{{.ID}}"') do set SWARM_ID=%x
docker stack deploy -c docker-compose.yml %SWARM_ID%
docker service ls
You can interact with your OpenSearch index in the Dev Tools tab of the OpenSearch Dashboards UI.
GET tv-commercials-index-nova-mm/_settings
GET tv-commercials-index-nova-mm/_count
GET tv-commercials-index-nova-mm/_search
{
  "query": {
    "match_all": {}
  }
}
GET tv-commercials-index-nova-mm/_search
{
  "query": {
    "terms": {
      "keywords": [
        "car",
        "city"
      ]
    }
  },
  "_source": false,
  "fields": ["title", "durationSec"]
}
GET tv-commercials-index-nova-mm/_search
{
  "query": {
    "nested": {
      "path": "embeddings",
      "query": {
        "knn": {
          "embeddings.embedding": {
            "vector": [
              0.059814453125,
              -0.017333984375,
              0.01153564453125,
              ...
            ],
            "k": 6
          }
        }
      }
    }
  },
  "size": 6,
  "_source": {
    "excludes": [
      "embeddings.embedding"
    ]
  }
}
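The keyword and k-NN request bodies above can also be built in Python, which is handy when the query vector comes from an embedding call rather than being pasted by hand. This is a minimal sketch: `build_terms_query` and `build_knn_query` are hypothetical helper names, the field names mirror the Dev Tools queries above, and the three-element vector is only the truncated placeholder from the example, not a usable embedding.

```python
import json


def build_terms_query(keywords, fields=("title", "durationSec")):
    """Build a keyword (terms) search body like the one shown above."""
    return {
        "query": {"terms": {"keywords": list(keywords)}},
        "_source": False,
        "fields": list(fields),
    }


def build_knn_query(vector, k=6):
    """Build a nested k-NN search body like the one shown above."""
    return {
        "query": {
            "nested": {
                "path": "embeddings",
                "query": {
                    "knn": {
                        "embeddings.embedding": {"vector": list(vector), "k": k}
                    }
                },
            }
        },
        "size": k,
        # Leave the raw vectors out of the response to keep it small.
        "_source": {"excludes": ["embeddings.embedding"]},
    }


# Placeholder vector for illustration only; a real query vector must come from
# the embedding model and match the dimension of the indexed vectors.
body = build_knn_query([0.059814453125, -0.017333984375, 0.01153564453125])
print(json.dumps(body, indent=2))
```

Assuming an opensearch-py client is already configured, a body like this could be passed as `client.search(index="tv-commercials-index-nova-mm", body=body)`.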
Television commercials used in video
Preview of search results with keyframe previews
“Elbow” method to help select the optimal number of clusters
All video segments plotted using t-SNE and K-Means clustering
The contents of this repository represent my viewpoints and not those of my past or current employers, including Amazon Web Services (AWS). All third-party libraries, modules, plugins, and SDKs are the property of their respective owners.




