Instructor CXID Refactor in MongoDB #157
Open
ASPC Course Review System: Instructor CxID Migration
Overview
This document outlines the database migration process for transitioning the ASPC (Associated Students of Pomona College) course review system from legacy instructor IDs to API-based instructor CxIDs. The migration was necessary to align the system with the Pomona API's data structure and improve data accuracy. The system originally used legacy internal instructor IDs inherited from a previous PostgreSQL database. However, the external Pomona API uses CxIDs (arbitrary numerical identifiers) for instructors.
Database Structure
Three main MongoDB collections are involved: Instructors, Courses, and CourseReviews.
Initial Data State
Course codes like CSCI005 HM and slugs like CSCI005-HM
The Core Problem
The same professor can have multiple CxIDs if they taught at different schools over their career, so several CxIDs may map to one instructor (a many-to-one mapping). The legacy system's internal IDs didn't align with the API's data source, and we needed to preserve historical accuracy for existing course reviews while migrating to the new system.
Migration Process
Phase 1: Code Slug Refactoring
The Challenge: Course codes contained complex formatting with department codes, numbers, and sometimes letter suffixes (e.g., "BIOL052R", "CSCI005 HM"). School codes are strictly 2-letter identifiers, and the code_slug field needed to be consistent for proper matching with API data.
Objective: Standardize the code_slug format across all courses in the database
Process:
code_slug format (e.g., CSCI005-HM)
Result: Clean, consistent course identifiers ready for API matching
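A minimal sketch of the slug normalization described in this phase. The function name `to_code_slug` and the exact pattern are assumptions for illustration, not the migration's actual code. It assumes a code consists of department letters, digits, an optional letter suffix, and an optional 2-letter school code separated by a space or hyphen.

```python
import re

# Assumed shape: department letters, digits, optional letter suffix,
# then an optional 2-letter school code after whitespace or a hyphen.
_CODE_RE = re.compile(
    r"^\s*([A-Za-z]+)\s*(\d+)([A-Za-z]*?)(?:[\s-]+([A-Za-z]{2}))?\s*$"
)


def to_code_slug(code: str) -> str:
    """Normalize a raw course code into an upper-case, hyphenated slug."""
    match = _CODE_RE.match(code)
    if match is None:
        raise ValueError(f"unrecognized course code: {code!r}")
    dept, number, suffix, school = match.groups()
    slug = f"{dept}{number}{suffix}".upper()
    if school:
        # The school code is always appended with a single hyphen.
        slug += f"-{school.upper()}"
    return slug
```

Under these assumptions, "CSCI005 HM" becomes CSCI005-HM, and a code without a school part such as "BIOL052R" is returned unchanged apart from case.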
Phase 2: Historical Course Data Collection
Objective: Build complete CxID mappings from API historical data
The Breakthrough: Instead of trying to map legacy IDs directly, we realized that the API's historical data records which specific CxID was used for each course section. This resolved the critical problem of determining instructor identity for past offerings.
Process:
GET api/Courses/{termKey}
Implementation:
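A minimal sketch of how per-section records could be folded into a slug-to-CxID mapping. The response shape of `GET api/Courses/{termKey}` is not shown here, so the field names `code_slug` and `instructor_cxids` are hypothetical, as is the function name. The sketch works on already-fetched records and leaves the HTTP calls out.

```python
from collections import defaultdict
from typing import Iterable, Mapping


def collect_cxids_by_slug(sections: Iterable[Mapping]) -> dict[str, list[int]]:
    """Group instructor CxIDs by course slug across historical sections."""
    cxids_by_slug: dict[str, set[int]] = defaultdict(set)
    for section in sections:
        # "code_slug" and "instructor_cxids" are hypothetical field names.
        bucket = cxids_by_slug[section["code_slug"]]
        for cxid in section.get("instructor_cxids", []):
            bucket.add(cxid)
    # Sorted lists keep the output deterministic across runs.
    return {slug: sorted(ids) for slug, ids in cxids_by_slug.items()}
```

Feeding it the sections from every available term yields, for each course slug, the CxIDs that actually taught it, with each school's offering kept separate.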
Phase 3: Update Instructors Collection
Objective: Add CxID arrays to existing instructor documents
Changes:
Add a cxids: [] array field to each instructor document
Keep the name field for compatibility
Example Structure:
Result: Instructor documents now contain both old and new identifier systems
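A minimal sketch of the update this phase implies, expressed as plain MongoDB filter and update documents. The legacy ID field name `id` and the helper name are assumptions; only the `cxids` field comes from the description above. `$addToSet` with `$each` creates the array when it is missing and avoids duplicates on re-runs.

```python
def build_cxid_update(instructor_id: int, cxids: list[int]) -> tuple[dict, dict]:
    """Return a (filter, update) pair for one instructor document."""
    # "id" is an assumed name for the legacy identifier field.
    filter_doc = {"id": instructor_id}
    # $addToSet/$each appends only the CxIDs not already present.
    update_doc = {"$addToSet": {"cxids": {"$each": sorted(set(cxids))}}}
    return filter_doc, update_doc
```

With PyMongo, the pair could be passed to `update_one(filter_doc, update_doc)` on the instructors collection. The existing name and legacy ID fields are left untouched.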
Phase 4: Update Courses Collection
Objective: Replace instructor IDs with CxID-based references
Changes:
Replace all_instructor_ids: [] with all_instructor_cxids: []
Key Insight: Courses like "CSCI005 HM" and "CSCI005 PO" are the same course at different schools, but each should have a school-specific instructor list based on who actually taught each section.
Result:
Phase 5: Update CourseReviews Collection
Objective: Link reviews to specific instructor CxIDs
Process:
Add an instructor_cxid field to each review
Status: Implementation phase. Some data is missing where historical mappings couldn't be determined with certainty.
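One possible resolution rule, sketched under assumptions: a review is linked only when the reviewed instructor's CxIDs and the course's CxIDs share exactly one value, and is otherwise left unset. This rule and the function name are illustrative and may differ from the logic actually used.

```python
from typing import Optional


def resolve_review_cxid(
    instructor_cxids: list[int], course_cxids: list[int]
) -> Optional[int]:
    """Pick the CxID for a review, or None when it is ambiguous or unknown."""
    candidates = set(instructor_cxids) & set(course_cxids)
    if len(candidates) == 1:
        return next(iter(candidates))
    # Zero or several matches: leave the review unlinked rather than guess.
    return None
```

Returning None for ambiguous cases keeps uncertain mappings visible as missing data instead of silently assigning a possibly wrong instructor.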
Technical Implementation
Challenge: Complex Course Code Formatting
Issue: Course codes contained inconsistent formatting with:
Solution:
API Integration
Primary Endpoint:
GET api/Courses/{termKey} - Historical course data by term
Results
Migration Outcomes
System State
Completed:
In Progress:
Before vs. After
Before:
After: