feat: diff changelog for mongodb #11269
---
title: Generating a MongoDB Changelog from an Existing Database
sidebar_label: MongoDB Changelog Generation
description: Automatically generate a Liquibase-compatible changelog from an existing MongoDB database using a Python script in Harness Database DevOps pipelines, and commit it to Git for version control.
slug: /database-devops/mongodb-changelog-generation
sidebar_position: 15
keywords:
  - mongodb
  - liquibase
  - database devops
  - harness db devops
  - schema versioning
  - changelog generation
  - gitops
  - ci/cd for databases
  - db schema automation
  - database change tracking
  - harness pipelines
  - pymongo
tags:
  - mongodb
  - database devops
  - harness
  - liquibase
  - changelog
  - gitops
  - ci/cd
---

import CommitToGit from "../../snippets/commit-to-git.md";

Harness Database DevOps enables teams to integrate database schema changes into Git-driven workflows.
When onboarding an existing **MongoDB** database, you can use the `MongoDB.py` Python script to extract the current schema and generate a **Liquibase-compatible changelog**.
This changelog can then be versioned in Git and used in subsequent deployments, ensuring auditability and consistency across environments.

By automating this process in a Harness pipeline, you can:

- Avoid manual changelog creation for legacy or existing databases
- Standardize schema tracking using Liquibase-compatible formats (JSON or YAML)
- Keep your database changes **fully GitOps-compliant** with version control and peer review

> **Review comment:** This article barely talks about GitOps, so I would personally remove this bullet and pull this out of the GitOps section. There isn't anything MongoDB-specific about how we do GitOps.

---

## Prerequisites

Before implementing the pipeline, ensure the following:

- The pipeline execution environment can connect to your MongoDB instance
- The Git connector used in the pipeline has **commit** permissions
- The MongoDB credentials have **read-only** access for schema extraction (see the sketch below)

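If a read-only user doesn't exist yet, here is a minimal sketch for creating one with `pymongo`, assuming admin credentials; the user name, password, and database name below are example placeholders, not required values:

```python
# Minimal sketch: create a read-only user for schema extraction.
# Assumes admin credentials; the user name and password are examples only.
from pymongo import MongoClient

admin_client = MongoClient("mongodb://<admin-user>:<admin-password>@<host>:27017")

# createUser is a standard MongoDB database command; the "read" role
# grants read-only access to the target database.
admin_client["admin"].command(
    "createUser",
    "changelog_reader",
    pwd="<strong-password>",
    roles=[{"role": "read", "db": "test"}],
)
```
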
## Pipeline Implementation

### Create a New Pipeline

1. Go to your Harness pipeline.
2. Click **Create a Pipeline**.
3. In the stage, select **Custom**, then create a **Step Group**.
4. Add the **GitClone** step.
5. Then add a new **Run** step.



- **Container Registry**: Used to pull images from private or public registries.
- **Image**: `python:3.11-alpine`
- **Shell**: `Python`
- **Command**: Add the following script under the command section:

```python
# ====== Install Dependencies ========
import subprocess
import sys

subprocess.check_call([sys.executable, "-m", "pip", "install", "pymongo", "pyyaml"])

# ====== Import Libraries ============
import json

import yaml
from pymongo import MongoClient

# === CONFIG ===
MONGO_URI = "mongodb://<username>:<password>@<host>:27017"
DATABASE_NAME = "test"
OUTPUT_FILE = "generated.yml"
AUTHOR = "Animesh"  # Change as needed
CHANGESET_ID = "baseline-collections"

# === SETUP ===
client = MongoClient(MONGO_URI)
db = client[DATABASE_NAME]
collections = db.list_collection_names()

# === BUILD YAML STRUCTURE ===
changesets = []

for name in collections:
    # Skip Liquibase's internal tracking collections
    if name in ("DATABASECHANGELOG", "DATABASECHANGELOGLOCK"):
        continue
    changes = []

    # Add a createCollection change with the collection's options
    collection_options = db[name].options()
    changes.append({'createCollection': {'collectionName': name, 'options': json.dumps(collection_options)}})

    # Add a createIndex change for every non-_id index
    indexes = db[name].index_information()
    print("processing indexes for collection: " + name + "\r\n" + json.dumps(indexes))

    for index_name, index_data in indexes.items():
        if index_name == "_id_":
            continue
        # index_data['key'] is a list of (field, direction) pairs
        index_fields = index_data['key']
        index_for_changelog = {}
        unique = index_data.get('unique', False)

        for field, direction in index_fields:
            index_for_changelog[field] = direction

        # The index name is carried in options rather than a top-level indexName key
        change = {
            'createIndex': {
                'collectionName': name,
                'keys': json.dumps(index_for_changelog),
                'options': json.dumps({"name": index_name, "unique": unique})
            }
        }
        if unique:
            change['createIndex']['unique'] = True
        changes.append(change)

    changesets.append({
        'changeSet': {
            'id': CHANGESET_ID + "-" + name,
            'author': AUTHOR,
            'changes': changes
        }
    })

# Final YAML structure
changelog = {
    'databaseChangeLog': changesets
}

# === WRITE TO FILE ===
with open(OUTPUT_FILE, "w") as f:
    yaml.dump(changelog, f, sort_keys=False)

print(f"✅ YAML baseline changelog with indexes written to: {OUTPUT_FILE}")
```

In the above script:

- Update `MONGO_URI` to your MongoDB connection string.
- Set `DATABASE_NAME` to the name of your database.
- Specify the `OUTPUT_FILE` name as needed.
- Change the `AUTHOR` and `CHANGESET_ID` variables to reflect your changes.

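For illustration, here is a truncated sketch of what the generated file can look like, assuming a hypothetical `users` collection with a unique index on `email`; your actual output will reflect your own collections and indexes:

```yaml
# Hypothetical output for a single "users" collection (sketch only)
databaseChangeLog:
- changeSet:
    id: baseline-collections-users
    author: Animesh
    changes:
    - createCollection:
        collectionName: users
        options: '{}'
    - createIndex:
        collectionName: users
        keys: '{"email": 1}'
        options: '{"name": "email_1", "unique": true}'
        unique: true
```
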
<CommitToGit />



This step will ensure that the generated changelog file is committed to your Git repository, allowing you to track changes and maintain version control over your database schema changes.

## Best Practices

- Store changelogs in a dedicated folder (e.g., `/db/changelog/`)
- Validate changelog generation in a staging pipeline before committing to production branches
- Parameterize connection details using Harness pipeline variables (see the sketch below)
- Always use a read-only MongoDB user for schema extraction

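One way to parameterize the connection details is to have the script read environment variables that Harness pipeline variables or secrets populate. A minimal sketch, with example variable names (they are not required names):

```python
# Minimal sketch: read connection details from environment variables
# instead of hardcoding them. The variable names are examples; map them
# to Harness pipeline variables or secrets in the Run step's settings.
import os

MONGO_URI = os.environ["MONGO_URI"]  # e.g., populated from a Harness secret
DATABASE_NAME = os.environ.get("DATABASE_NAME", "test")
OUTPUT_FILE = os.environ.get("OUTPUT_FILE", "generated.yml")
AUTHOR = os.environ.get("CHANGELOG_AUTHOR", "harness-pipeline")
```
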
By integrating this process into Harness pipelines, you ensure repeatable, auditable, and version-controlled database schema onboarding, a cornerstone of GitOps-driven database delivery.

## FAQs

### 1. Can I change the changelog filename?
Yes. Update the `OUTPUT_FILE` variable in the script to set a custom filename.

### 2. Does it support JSON output instead of YAML?
Currently, the script outputs YAML. You can modify the `yaml.dump` call to use `json.dump` if JSON output is preferred, as sketched below.

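A minimal sketch of that change, assuming the `changelog` dictionary built earlier in the script and an example filename of `generated.json`:

```python
# Minimal sketch: write the changelog as JSON instead of YAML.
# Assumes the `changelog` dictionary built earlier in the script.
import json

OUTPUT_FILE = "generated.json"  # example filename

with open(OUTPUT_FILE, "w") as f:
    json.dump(changelog, f, indent=2)
```
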
### 3. How are indexes handled?
All non-`_id` indexes are included in the changelog as `createIndex` changes. The script preserves uniqueness flags.

### 4. How do I avoid including Liquibase internal collections?
The script automatically excludes the `DATABASECHANGELOG` and `DATABASECHANGELOGLOCK` collections.

---

**File: `snippets/commit-to-git.md`**

#### Commit to Git Repository

You can commit the generated changelog file to your Git repository using the `Run Command` step in the pipeline. This lets you version-control the changelog file and track changes over time. Otherwise, once the pipeline finishes executing, the pods are deleted and the changelog file is lost.

1. In the pipeline, under the `Step Group` section, add a new step with `Run Command` as the step type.

 | ||
- **Name**: The name of the step. | ||
- **Registry Type**: The type of registry to use. We can use `Third Party Registry` or `Harness Artifact Registry`. | ||
- **Container Registry**: The container registry to use. This is the location where the image is stored. In this case, we will use Docker Hub as the registry. | ||
- **Image**: The name of the image to use. In this case, we will use `alpine/git`. | ||
- **Shell**: The shell to use. We can use `bash` or `sh`, depending on the image used. | ||
- **Command**: The command to be executed. In this case, we will use following command to commit the changelog file to the git repository: | ||
```bash
git init

# Configure Git user
git config --global user.email <User Email>
git config --global user.name <User Name>
git config --global user.password <PAT Token> ## PAT saved in Harness Secrets Manager

git add generated.yml ## The changelog file generated in the previous step
git commit -m "generated changelog from running instance" -s

# Get current branch name
CURRENT_BRANCH=$(git rev-parse --abbrev-ref HEAD)

# Add remote repository using an HTTPS URL with embedded credentials
git remote add origin https://<User Name>:<PAT Token>@<Git Repo URL>.git

# Push to remote using the current branch name
git push -u origin $CURRENT_BRANCH -f
```

2. Click `Apply Changes`. Save the pipeline and click the `Run` button to run the pipeline.

> **Review comment:** This feels like a weird place to put this file in our navigational hierarchy. Why are we putting it here instead of under 'build-a-changelog'? Why isn't 'build-a-changelog' linking to it?