feat: diff changelog for mongodb #11269

Open: wants to merge 1 commit into base: main
37 changes: 4 additions & 33 deletions docs/database-devops/get-started/build-a-changelog.md
@@ -28,6 +28,7 @@ tags:

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
import CommitToGit from "../snippets/commit-to-git.md";

A changelog is a collection of database changes that can be applied to a database. It serves as a version-controlled record of changes, allowing teams to manage and track modifications to the database schema and data.

@@ -126,40 +127,10 @@ Toggle on the "Enable container based execution".
11. Once the pipeline is executed successfully, you will find the changelog file in the specified path.
![Generate Changelog](../use-database-devops/static/build-changelog/db-devops-generate-changelog.png)

#### Commit to Git Repository
You can commit the generated changelog file to your Git repository using a `Run Command` step in the pipeline. This puts the changelog under version control so you can track changes over time. If you skip this step, the changelog file is lost once the pipeline finishes, because the pods are deleted.

1. In the Pipeline, under the `Step Group` section, add a new step and select `Run Command` as the step type.
![Commit to Git Step](../use-database-devops/static/build-changelog/db-devops-changelog-git-commit-step.png)
- **Name**: The name of the step.
- **Registry Type**: The type of registry to use, either `Third Party Registry` or `Harness Artifact Registry`.
- **Container Registry**: The container registry where the image is stored. In this case, we will use Docker Hub.
- **Image**: The name of the image to use. In this case, we will use `alpine/git`.
- **Shell**: The shell to use, either `bash` or `sh`, depending on the image.
- **Command**: The command to execute. In this case, we will use the following command to commit the changelog file to the Git repository:
```bash
git init

# Configure Git user
git config --global user.email <User Email>
git config --global user.name <User Name>

git add generated.yml ## The changelog file generated in the previous step
git commit -m "generated changelog from running instance" -s

# Get current branch name
CURRENT_BRANCH=$(git rev-parse --abbrev-ref HEAD)

# Add remote repository; the PAT is saved in Harness Secrets Manager
git remote add origin https://<User Name>:<PAT>@<Git Repo URL>.git ## <Git Repo URL> is the repository host and path, without https://

# Push to remote using the current branch name (-f overwrites the remote branch)
git push -u origin "$CURRENT_BRANCH" -f
```
2. Click on `Apply Changes`. Save the Pipeline and click on the `Run` button to run the pipeline.
![Commit to Git](../use-database-devops/static/build-changelog/db-devops-changelog-git-commit.png)
<CommitToGit />

![Commit to Git](../use-database-devops/static/build-changelog/db-devops-changelog-git-commit.png)
This step will ensure that the generated changelog file is committed to your Git repository, allowing you to track changes and maintain version control over your database schema changes.
</TabItem>
</Tabs>

@@ -0,0 +1,169 @@
---
Review comment (Contributor): This feels like a weird place to put this file in our navigational hierarchy. Why are we putting it here instead of under 'build-a-changelog'? Why isn't 'build-a-changelog' linking to it?

title: Generating a MongoDB Changelog from an Existing Database
sidebar_label: MongoDB Changelog Generation
description: Automatically generate a Liquibase-compatible changelog from an existing MongoDB database using a Python script in Harness Database DevOps pipelines, and commit it to Git for version control.
slug: /database-devops/mongodb-changelog-generation
sidebar_position: 15
keywords:
- mongodb
- liquibase
- database devops
- harness db devops
- schema versioning
- changelog generation
- gitops
- ci/cd for databases
- db schema automation
- database change tracking
- harness pipelines
- pymongo
tags:
- mongodb
- database devops
- harness
- liquibase
- changelog
- gitops
- ci/cd
---
import CommitToGit from "../../snippets/commit-to-git.md";

Harness Database DevOps enables teams to integrate database schema changes into Git-driven workflows.
When onboarding an existing **MongoDB** database, you can use the `MongoDB.py` Python script to extract the current schema and generate a **Liquibase-compatible changelog**.
This changelog can then be versioned in Git and used in subsequent deployments, ensuring auditability and consistency across environments.

By automating this process in a Harness pipeline, you can:
- Avoid manual changelog creation for legacy or existing databases
- Standardize schema tracking using Liquibase-compatible formats (JSON or YAML)
- Keep your database changes **fully GitOps-compliant** with version control and peer review

Review comment (Contributor): This article barely talks about GitOps, so I would personally remove this bullet and pull this out of the GitOps section. There isn't anything MongoDB-specific about how we do GitOps.


---

## Prerequisites

Before implementing the pipeline, ensure the following:

- The pipeline execution environment can connect to your MongoDB instance.
- The Git connector used in the pipeline has **commit** permissions.
- The MongoDB credentials have **read-only** access, which is all that schema extraction needs.
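
To check the first prerequisite before building the pipeline, a quick TCP reachability test can help. The sketch below is only an illustration: the `can_reach` helper and the example host name are hypothetical, and it verifies only that the port accepts connections, not that the credentials work.

```python
import socket


def can_reach(host, port=27017, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures
        return False


if __name__ == "__main__":
    # Hypothetical host name; replace it with your MongoDB host
    print(can_reach("mongodb.example.internal"))
```

Running this from the same environment as the pipeline (for example, inside the same container image) gives the most meaningful result.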

## Pipeline Implementation

### Create a New Pipeline

1. Go to your Harness pipeline.
2. Click on "**Create a Pipeline**".
3. In the Stage, select "Custom" and then create a "Step Group".
4. Add the **GitClone** step.
5. Add a new **Run** step with the following settings:
![MongoDB Changelog Generation](../static/dbops-mongo-changelog.png)
- **Container Registry**: The registry used to pull the image. It can be private or public.
- **Image**: `python:3.11-alpine`
- **Shell**: `Python`
- **Command**: Add the following script in the command field:

```python
# ====== Install Dependencies ========
import subprocess
import sys
subprocess.check_call([sys.executable, "-m", "pip", "install", "pymongo", "pyyaml"])

# ====== Import Libraries ===========
import json

import yaml
from pymongo import MongoClient

# === CONFIG ===
MONGO_URI = "mongodb://<username>:<password>@<host>:27017"
DATABASE_NAME = "test"
OUTPUT_FILE = "generated.yml"
AUTHOR = "Animesh"  # Change as needed
CHANGESET_ID = "baseline-collections"

# Collections that Liquibase maintains itself; they are skipped below
LIQUIBASE_COLLECTIONS = ("DATABASECHANGELOG", "DATABASECHANGELOGLOCK")

# === SETUP ===
client = MongoClient(MONGO_URI)
db = client[DATABASE_NAME]
collections = db.list_collection_names()

# === BUILD YAML STRUCTURE ===
changesets = []

for name in collections:
    if name in LIQUIBASE_COLLECTIONS:
        continue
    changes = []

    # Add createCollection
    collection_options = db[name].options()
    changes.append({'createCollection': {'collectionName': name, 'options': json.dumps(collection_options)}})

    # Add createIndex for all non-_id indexes
    indexes = db[name].index_information()
    # default=str keeps the log line from failing on values JSON cannot encode
    print("processing indexes for collection: " + name + "\r\n" + json.dumps(indexes, default=str))

    for index_name, index_data in indexes.items():
        if index_name == "_id_":
            continue
        unique = index_data.get('unique', False)

        # index_data['key'] is a list of (field, direction) pairs
        index_for_changelog = {field: direction for field, direction in index_data['key']}

        change = {
            'createIndex': {
                'collectionName': name,
                # 'indexName': index_name, TODO: move to options
                'keys': json.dumps(index_for_changelog),
                'options': json.dumps({"name": index_name, "unique": unique})
            }
        }
        if unique:
            change['createIndex']['unique'] = True
        changes.append(change)

    # One changeset per collection
    changesets.append(
        {'changeSet': {
            'id': CHANGESET_ID + "-" + name,
            'author': AUTHOR,
            'changes': changes
        }})

# Final YAML structure
changeset = {
    'databaseChangeLog': changesets
}

# === WRITE TO FILE ===
with open(OUTPUT_FILE, "w") as f:
    yaml.dump(changeset, f, sort_keys=False)

print(f"✅ YAML baseline changelog with indexes written to: {OUTPUT_FILE}")
```

In the above script:
- Update the `MONGO_URI` to your MongoDB connection string.
- Set the `DATABASE_NAME` to the name of your database.
- Specify the `OUTPUT_FILE` name as needed.
- Change the `AUTHOR` and `CHANGESET_ID` variables to reflect your changes.

<CommitToGit />

![Commit to Git](../static/dbops-mongo-diffchangelog.png)
This step will ensure that the generated changelog file is committed to your Git repository, allowing you to track changes and maintain version control over your database schema changes.

## Best Practices
- Store changelogs in a dedicated folder (e.g., `/db/changelog/`)
- Validate changelog generation in a staging pipeline before committing to production branches
- Parameterize connection details using Harness pipeline variables
- Always use a read-only MongoDB user for schema extraction

By integrating this process into Harness pipelines, you ensure repeatable, auditable, and version-controlled database schema onboarding, a cornerstone of GitOps-driven database delivery.
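
As one way to follow the parameterization advice above, the hard-coded values in the `CONFIG` block could be read from environment variables. This is a minimal sketch, assuming the step exposes pipeline variables and secrets to the script as environment variables. The `load_config` helper and the variable name `CHANGELOG_AUTHOR` are hypothetical and not part of the original script.

```python
import os


def load_config(env=os.environ):
    """Build the script settings from a mapping of environment variables."""
    uri = env.get("MONGO_URI")
    if not uri:
        # Fail early instead of connecting with a placeholder URI
        raise ValueError("MONGO_URI is not set")
    return {
        "MONGO_URI": uri,
        "DATABASE_NAME": env.get("DATABASE_NAME", "test"),
        "OUTPUT_FILE": env.get("OUTPUT_FILE", "generated.yml"),
        "AUTHOR": env.get("CHANGELOG_AUTHOR", "unknown"),
    }


# Example with an explicit mapping instead of the real environment
config = load_config({"MONGO_URI": "mongodb://localhost:27017"})
print(config["OUTPUT_FILE"])  # prints generated.yml
```

Keeping the connection string out of the script body also means the credentials never land in the pipeline definition itself.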

Review comment (Contributor): This article barely talks about GitOps, so I would personally remove this bullet and pull this out of the GitOps section. There isn't anything MongoDB-specific about how we do GitOps.


## FAQs
### 1. Can I change the changelog filename?
Yes. Update the `OUTPUT_FILE` variable in the script to set a custom filename.
### 2. Does it support JSON output instead of YAML?
Currently, the script outputs YAML. To produce JSON instead, replace the `yaml.dump` call with `json.dump`.
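
A minimal sketch of that change is shown below. It assumes the same dictionary structure the script builds; the `write_changelog_json` helper and the sample changelog are illustrative only.

```python
import json

# Sample structure in the same shape the script produces
changelog = {
    "databaseChangeLog": [
        {"changeSet": {
            "id": "baseline-collections-users",
            "author": "example",
            "changes": [
                {"createCollection": {"collectionName": "users", "options": "{}"}}
            ],
        }}
    ]
}


def write_changelog_json(changelog, path):
    """Write the changelog dictionary to a JSON file."""
    with open(path, "w") as f:
        json.dump(changelog, f, indent=2)
```

When switching formats, remember to give `OUTPUT_FILE` a `.json` extension so the file name matches its content.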
### 3. How are indexes handled?
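Every index except the default `_id_` index is written to the changelog as a `createIndex` change, and its `unique` flag is carried over in the index options. Other index options, such as sparse or TTL settings, are not copied by the script.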
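
The conversion can be illustrated without a database connection. The sketch below mirrors the script's index logic in a standalone function. The `index_to_change` name and the sample data are hypothetical, with the sample shaped like the dictionary that PyMongo's `index_information()` returns.

```python
import json


def index_to_change(collection_name, index_name, index_data):
    """Turn one index_information() entry into a createIndex change."""
    # 'key' is a list of (field, direction) pairs
    keys = {field: direction for field, direction in index_data["key"]}
    options = {"name": index_name, "unique": index_data.get("unique", False)}
    return {
        "createIndex": {
            "collectionName": collection_name,
            "keys": json.dumps(keys),
            "options": json.dumps(options),
        }
    }


sample = {
    "_id_": {"v": 2, "key": [("_id", 1)]},
    "email_1": {"v": 2, "key": [("email", 1)], "unique": True},
}

# Skip the default _id_ index, as the script does
changes = [index_to_change("users", n, d) for n, d in sample.items() if n != "_id_"]
print(changes[0]["createIndex"]["keys"])  # prints {"email": 1}
```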
### 4. How do I avoid including Liquibase internal collections?
The script automatically excludes the `DATABASECHANGELOG` and `DATABASECHANGELOGLOCK` collections.
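
The exclusion amounts to a simple name filter, sketched here as a standalone function. The `user_collections` name is hypothetical; the script performs the same check inline in its loop.

```python
# Collections that Liquibase creates for its own bookkeeping
LIQUIBASE_INTERNAL = {"DATABASECHANGELOG", "DATABASECHANGELOGLOCK"}


def user_collections(names):
    """Return the collection names that should appear in the baseline changelog."""
    return [name for name in names if name not in LIQUIBASE_INTERNAL]


print(user_collections(["users", "DATABASECHANGELOG", "orders"]))  # prints ['users', 'orders']
```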
32 changes: 32 additions & 0 deletions docs/database-devops/snippets/commit-to-git.md
@@ -0,0 +1,32 @@
#### Commit to Git Repository
You can commit the generated changelog file to your Git repository using a `Run Command` step in the pipeline. This puts the changelog under version control so you can track changes over time. If you skip this step, the changelog file is lost once the pipeline finishes, because the pods are deleted.

1. In the Pipeline, under the `Step Group` section, add a new step and select `Run Command` as the step type.
![Commit to Git Step](../use-database-devops/static/build-changelog/db-devops-changelog-git-commit-step.png)
- **Name**: The name of the step.
- **Registry Type**: The type of registry to use, either `Third Party Registry` or `Harness Artifact Registry`.
- **Container Registry**: The container registry where the image is stored. In this case, we will use Docker Hub.
- **Image**: The name of the image to use. In this case, we will use `alpine/git`.
- **Shell**: The shell to use, either `bash` or `sh`, depending on the image.
- **Command**: The command to execute. In this case, we will use the following command to commit the changelog file to the Git repository:
```bash
git init

# Configure Git user
git config --global user.email <User Email>
git config --global user.name <User Name>

git add generated.yml ## The changelog file generated in the previous step
git commit -m "generated changelog from running instance" -s

# Get current branch name
CURRENT_BRANCH=$(git rev-parse --abbrev-ref HEAD)

# Add remote repository; the PAT is saved in Harness Secrets Manager
git remote add origin https://<User Name>:<PAT>@<Git Repo URL>.git ## <Git Repo URL> is the repository host and path, without https://

# Push to remote using the current branch name (-f overwrites the remote branch)
git push -u origin "$CURRENT_BRANCH" -f
```
2. Click on `Apply Changes`. Save the Pipeline and click on the `Run` button to run the pipeline.
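
The remote URL in the command above embeds credentials, so any reserved characters in them (for example the `@` in an email address, or a `/` in a token) need percent-encoding to keep the URL valid. The sketch below shows one way to build such a URL. The `authenticated_remote` helper and the sample values are hypothetical.

```python
from urllib.parse import quote


def authenticated_remote(username, token, repo_host_and_path):
    """Build an HTTPS remote URL with percent-encoded credentials."""
    user = quote(username, safe="")
    secret = quote(token, safe="")
    return f"https://{user}:{secret}@{repo_host_and_path}.git"


print(authenticated_remote("dev@example.com", "abc/123", "git.example.com/org/repo"))
# prints https://dev%40example.com:abc%2F123@git.example.com/org/repo.git
```

Avoid printing the real URL in pipeline logs, since it contains the token.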