end-to-end-data-pipeline-finetuned-bert-MARBERT-for-NLP

The central concept is to establish a daily automated pipeline that collects data, processes it through NLP models, stores the results, and updates a dynamic visualization platform.

The project builds a fully automated pipeline that collects, processes, analyzes, and visualizes COVID-19 discourse, keeping results continuously up to date. The primary objectives are:

1. Automated Data Collection and Preprocessing:

Develop an automated collector that extracts COVID-19-related content from multiple sources, including Facebook, YouTube, Twitter, and news platforms such as Hespress. The collected data is then automatically preprocessed into a consistent, uniform format for subsequent analysis.
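The repository's actual preprocessing code is not shown here; the sketch below illustrates the kind of cleaning such a step typically applies to Arabic social-media text (URL/mention stripping, diacritic removal, whitespace normalization). The function name and exact rules are illustrative assumptions, not the project's API.

```python
import re

def preprocess(text: str) -> str:
    """Normalize one raw social-media post for NLP analysis (illustrative sketch)."""
    text = re.sub(r"https?://\S+", " ", text)                  # drop URLs
    text = re.sub(r"@\w+", " ", text)                          # drop @mentions
    text = text.replace("#", " ")                              # keep hashtag words, drop the marker
    text = re.sub(r"[\u0617-\u061A\u064B-\u0652]", "", text)   # strip Arabic diacritics (tashkeel)
    text = re.sub(r"\s+", " ", text).strip()                   # collapse whitespace
    return text
```

Running the same cleaning on every source keeps the downstream NLP models from seeing platform-specific artifacts.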

2. Automated Natural Language Processing (NLP) Tasks:

Design and implement automated NLP tasks covering topic modeling and sentiment analysis, run as scripted jobs that extract relevant insights from the collected text.
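The fine-tuned MARBERT model itself is not reproduced here; the sketch below only shows the shape of an automated analysis step, with a trivial keyword scorer standing in for the real classifier. In the actual pipeline, `analyze` would call the fine-tuned BERT/MARBERT model (commonly via Hugging Face `transformers`); everything below (result schema, keyword sets, topic rule) is an illustrative placeholder.

```python
from dataclasses import dataclass

@dataclass
class NlpResult:
    text: str
    topic: str
    sentiment: str

# Placeholder keyword sets: a stand-in for the fine-tuned MARBERT classifier.
POSITIVE = {"recovered", "vaccine", "hope"}
NEGATIVE = {"death", "lockdown", "fear"}

def analyze(text: str) -> NlpResult:
    """Assign a topic and sentiment label to one preprocessed document."""
    words = set(text.lower().split())
    if words & NEGATIVE:
        sentiment = "negative"
    elif words & POSITIVE:
        sentiment = "positive"
    else:
        sentiment = "neutral"
    topic = "health" if {"vaccine", "covid"} & words else "other"
    return NlpResult(text=text, topic=topic, sentiment=sentiment)
```

Keeping the analysis behind a single function with a fixed result type makes it easy to swap the stub for the real model without touching the rest of the pipeline.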

3. Dynamic Data Visualization Platform:

Create an automated data visualization platform that presents the processed insights in an interactive and user-friendly manner. The platform will dynamically adapt to the automated updates from the data analysis pipeline, enabling stakeholders to access real-time insights without manual intervention.
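The README does not name the visualization stack; whatever front end is used (Streamlit and Dash are common choices), it typically reads pre-aggregated results rather than raw rows. A hedged sketch of the kind of aggregation such a dashboard might consume, assuming a simple per-row schema that is not the project's actual one:

```python
from collections import Counter

def sentiment_breakdown(results: list[dict]) -> dict[str, dict[str, int]]:
    """Aggregate per-day sentiment counts for a dashboard.

    Rows are assumed to look like {"date": "2021-03-01", "sentiment": "positive"} —
    an illustrative schema for this sketch.
    """
    by_day: dict[str, Counter] = {}
    for row in results:
        by_day.setdefault(row["date"], Counter())[row["sentiment"]] += 1
    return {day: dict(counts) for day, counts in by_day.items()}
```

Because the aggregation is recomputed from the database on each pipeline run, the dashboard reflects new data without manual refresh logic.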

4. Establishing a Daily Pipeline:

Implement a daily automated pipeline that initiates the data collection process, followed by preprocessing, NLP analysis, and subsequent storage in a database. This automated sequence will ensure that the collected data is consistently updated and refined, reflecting the latest COVID-19 discourse.
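The README fixes the ordering of the daily run (collect, preprocess, analyze, store) but not the scheduler; in practice this could be cron, Airflow, or a simple timed loop. A minimal sketch of the orchestration step, with each stage injected as a function so the stubs below can stand in for the real scrapers and models:

```python
from typing import Callable, Iterable

def run_daily_pipeline(
    collect: Callable[[], Iterable[str]],
    preprocess: Callable[[str], str],
    analyze: Callable[[str], dict],
    store: Callable[[dict], None],
) -> int:
    """Run one end-to-end pass: collect -> preprocess -> analyze -> store.

    Returns the number of documents processed. Stages are injected so the
    same orchestration works with real implementations or with test stubs.
    """
    count = 0
    for raw in collect():
        clean = preprocess(raw)
        result = analyze(clean)
        store(result)
        count += 1
    return count
```

A scheduler then only needs to invoke `run_daily_pipeline` once per day; the function itself stays free of scheduling concerns.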

5. Continuous Model Prediction and Storage:

Automate the process of running NLP models on the preprocessed data to derive topic categorizations and sentiment scores. These outcomes will be automatically stored in a database alongside the original data, forming a robust and comprehensive repository.
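Storing model outputs alongside the original text can be done with any database; the README does not name one, so the sketch below assumes SQLite for illustration. The table layout and function names are assumptions, not the project's schema.

```python
import sqlite3

def init_db(conn: sqlite3.Connection) -> None:
    """Create the predictions table if it does not exist."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS predictions (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               text TEXT NOT NULL,        -- original (preprocessed) document
               topic TEXT NOT NULL,       -- topic-model label
               sentiment TEXT NOT NULL    -- sentiment-model label
           )"""
    )

def store_prediction(conn: sqlite3.Connection, text: str, topic: str, sentiment: str) -> None:
    """Insert one model result next to its source text."""
    conn.execute(
        "INSERT INTO predictions (text, topic, sentiment) VALUES (?, ?, ?)",
        (text, topic, sentiment),
    )
    conn.commit()
```

Keeping predictions and source text in one table makes the repository self-contained: the visualization layer can query it directly without re-running any model.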

6. Real-time Data Updates for Visualization:

With the daily pipeline in place, the automated updates will populate the visualization platform, offering stakeholders access to the latest insights and trends within the COVID-19 discourse. Through this architecture, the project establishes an automated end-to-end pipeline that collects, processes, and analyzes COVID-19 discourse while ensuring real-time updates for informed decision-making.
