Implement ETL pipeline for OpenAlex data standardization #7

Open
hadimobini00-ship-it wants to merge 3 commits into PRAISELab-PicusLab:main from hadimobini00-ship-it:main

hadimobini00-ship-it commented May 30, 2026

Project Summary

This Pull Request introduces a modular ETL (Extract, Transform, Load) pipeline designed to retrieve academic data from the OpenAlex API and transform it into the standardized Web of Science format.

Key Components

api_retriever.py: Handles the extraction of raw data from the OpenAlex API.
mapping.py: Contains the core logic for schema conversion and field mapping.
dispatcher.py: Manages the flow and routing of data throughout the pipeline.
validator.py: Ensures data integrity and format compliance before final output.
main.py: The entry point for executing the complete data pipeline.
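
The division of labour listed above could be wired together roughly as follows. This is a minimal sketch, not the code in this PR: `run_pipeline`, its parameters, and the three stand-in stage functions are hypothetical names, and the field tags used by the stand-ins are only illustrative.

```python
from typing import Callable, Iterable

Record = dict


def run_pipeline(
    query: str,
    retrieve: Callable[[str], Iterable[Record]],
    convert: Callable[[Record], Record],
    is_valid: Callable[[Record], bool],
) -> tuple[list[Record], list[Record]]:
    """Run retrieve -> convert -> validate and return (accepted, rejected)."""
    accepted: list[Record] = []
    rejected: list[Record] = []
    for raw in retrieve(query):
        converted = convert(raw)
        # Route each converted record according to the validator's verdict.
        (accepted if is_valid(converted) else rejected).append(converted)
    return accepted, rejected


# Stand-in stages so the sketch runs offline (all hypothetical).
def fake_retrieve(query: str) -> list[Record]:
    return [
        {"title": f"A paper on {query}", "publication_year": 2021},
        {"title": None, "publication_year": 2020},  # deliberately incomplete
    ]


def fake_convert(raw: Record) -> Record:
    return {"TI": raw["title"], "PY": raw["publication_year"]}


def fake_is_valid(record: Record) -> bool:
    return bool(record["TI"]) and isinstance(record["PY"], int)


accepted, rejected = run_pipeline(
    "machine learning", fake_retrieve, fake_convert, fake_is_valid
)
print(len(accepted), "accepted;", len(rejected), "rejected")
```

Passing the stages in as callables is one way to keep each stage testable in isolation; the actual modules may be coupled differently.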

How to Run the Code

  1. Ensure all dependencies are installed in your Python environment.
  2. Navigate to the project directory:
    cd "Hardware & Software_2nd_semester"
  3. Execute the pipeline using:
    python main.py

Expected Results

The pipeline runs a search query (for example, "machine learning") against the OpenAlex API, validates the structure of the converted records against Web of Science requirements, and outputs the formatted research data.
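
For orientation, a search like the one described here could be issued against the public OpenAlex works endpoint along these lines. This is a sketch under assumptions: the helper names `build_works_url` and `fetch_works` are hypothetical, and api_retriever.py may use a different HTTP client, parameters, or pagination strategy.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

OPENALEX_WORKS_URL = "https://api.openalex.org/works"


def build_works_url(query: str, per_page: int = 25) -> str:
    """Build a full-text search URL for the OpenAlex works endpoint."""
    params = {"search": query, "per-page": per_page}
    return f"{OPENALEX_WORKS_URL}?{urlencode(params)}"


def fetch_works(query: str, per_page: int = 25) -> list[dict]:
    """Fetch one page of works; performs a live HTTP request when called."""
    with urlopen(build_works_url(query, per_page), timeout=30) as response:
        payload = json.load(response)
    # OpenAlex returns the matching works under the "results" key.
    return payload.get("results", [])
```

Calling `fetch_works("machine learning")` would hit the live API, so the function is only defined here and not invoked.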

Additional Notes

This implementation ensures modularity, allowing for easier maintenance and testing of individual pipeline stages.
Please review the mapping logic in mapping.py to ensure it meets the latest requirements for the target schema.
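
As a reference point for that review, a field mapping of this kind might look like the sketch below. It is not the contents of mapping.py: the function name `openalex_to_wos` is hypothetical, and the selection of Web of Science field tags (AU, TI, SO, PY, DI) is an assumption about the target schema rather than something stated in this PR.

```python
def openalex_to_wos(work: dict) -> dict:
    """Map a subset of an OpenAlex work onto Web of Science field tags."""
    # Author names live under authorships[].author.display_name.
    authors = [
        (entry.get("author") or {}).get("display_name")
        for entry in work.get("authorships") or []
    ]
    # The publishing venue is nested under primary_location.source.
    source = (work.get("primary_location") or {}).get("source") or {}
    # OpenAlex reports DOIs as full URLs; keep only the bare identifier.
    doi = work.get("doi") or ""
    return {
        "AU": [name for name in authors if name],
        "TI": work.get("title") or work.get("display_name"),
        "SO": source.get("display_name"),
        "PY": work.get("publication_year"),
        "DI": doi.removeprefix("https://doi.org/") or None,
    }


# Made-up record for illustration only.
sample_work = {
    "title": "Example Title",
    "publication_year": 2021,
    "doi": "https://doi.org/10.1234/example",
    "authorships": [
        {"author": {"display_name": "A. Author"}},
        {"author": {"display_name": "B. Author"}},
    ],
    "primary_location": {"source": {"display_name": "Example Journal"}},
}
wos_record = openalex_to_wos(sample_work)
print(wos_record)
```

Missing or null fields fall back to `None` or an empty list instead of raising, which leaves the decision to reject a record to the validation stage.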
