end-to-end-data-pipeline-finetuned-bert-MARBERT-for-NLP

The central concept is to establish a daily automated pipeline that collects data, processes it through NLP models, stores the results, and updates a dynamic visualization platform.

The project builds a fully automated pipeline that collects, processes, analyzes, and visualizes COVID-19 discourse, keeping results continuously up to date. The primary objectives are:

1. Automated Data Collection and Preprocessing:

Develop an automated collector that extracts COVID-19-related content from multiple sources, including Facebook, YouTube, Twitter, and news platforms such as Hespress. The collected data is then automatically preprocessed into a consistent, uniform format for subsequent analysis.
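The repository's actual preprocessing code is not shown here; the sketch below illustrates the kind of cleaning such a step typically applies to Arabic social-media text (URL/mention stripping, diacritic removal, whitespace normalization). The function name and exact rules are illustrative assumptions, not the project's API.

```python
import re

def preprocess(text: str) -> str:
    """Normalize one raw social-media post for NLP analysis (illustrative sketch)."""
    text = re.sub(r"https?://\S+", " ", text)                  # drop URLs
    text = re.sub(r"@\w+", " ", text)                          # drop @mentions
    text = text.replace("#", " ")                              # keep hashtag words, drop the marker
    text = re.sub(r"[\u0617-\u061A\u064B-\u0652]", "", text)   # strip Arabic diacritics (tashkeel)
    text = re.sub(r"\s+", " ", text).strip()                   # collapse whitespace
    return text
```

Running the same cleaning on every source keeps the downstream NLP models from seeing platform-specific artifacts.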

2. Automated Natural Language Processing (NLP) Tasks:

Design and implement automated NLP tasks covering topic modeling and sentiment analysis, run as scripted jobs that extract relevant insights from the collected text.
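The fine-tuned MARBERT model itself is not reproduced here; the sketch below only shows the shape of an automated analysis step, with a trivial keyword scorer standing in for the real classifier. In the actual pipeline, `analyze` would call the fine-tuned BERT/MARBERT model (commonly via Hugging Face `transformers`); everything below (result schema, keyword sets, topic rule) is an illustrative placeholder.

```python
from dataclasses import dataclass

@dataclass
class NlpResult:
    text: str
    topic: str
    sentiment: str

# Placeholder keyword sets: a stand-in for the fine-tuned MARBERT classifier.
POSITIVE = {"recovered", "vaccine", "hope"}
NEGATIVE = {"death", "lockdown", "fear"}

def analyze(text: str) -> NlpResult:
    """Assign a topic and sentiment label to one preprocessed document."""
    words = set(text.lower().split())
    if words & NEGATIVE:
        sentiment = "negative"
    elif words & POSITIVE:
        sentiment = "positive"
    else:
        sentiment = "neutral"
    topic = "health" if {"vaccine", "covid"} & words else "other"
    return NlpResult(text=text, topic=topic, sentiment=sentiment)
```

Keeping the analysis behind a single function with a fixed result type makes it easy to swap the stub for the real model without touching the rest of the pipeline.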

3. Dynamic Data Visualization Platform:

Create an automated data visualization platform that presents the processed insights in an interactive and user-friendly manner. The platform will dynamically adapt to the automated updates from the data analysis pipeline, enabling stakeholders to access real-time insights without manual intervention.
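The README does not name the visualization stack; whatever front end is used (Streamlit and Dash are common choices), it typically reads pre-aggregated results rather than raw rows. A hedged sketch of the kind of aggregation such a dashboard might consume, assuming a simple per-row schema that is not the project's actual one:

```python
from collections import Counter

def sentiment_breakdown(results: list[dict]) -> dict[str, dict[str, int]]:
    """Aggregate per-day sentiment counts for a dashboard.

    Rows are assumed to look like {"date": "2021-03-01", "sentiment": "positive"} —
    an illustrative schema for this sketch.
    """
    by_day: dict[str, Counter] = {}
    for row in results:
        by_day.setdefault(row["date"], Counter())[row["sentiment"]] += 1
    return {day: dict(counts) for day, counts in by_day.items()}
```

Because the aggregation is recomputed from the database on each pipeline run, the dashboard reflects new data without manual refresh logic.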

4. Establishing a Daily Pipeline:

Implement a daily automated pipeline that initiates the data collection process, followed by preprocessing, NLP analysis, and subsequent storage in a database. This automated sequence will ensure that the collected data is consistently updated and refined, reflecting the latest COVID-19 discourse.
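The README fixes the ordering of the daily run (collect, preprocess, analyze, store) but not the scheduler; in practice this could be cron, Airflow, or a simple timed loop. A minimal sketch of the orchestration step, with each stage injected as a function so the stubs below can stand in for the real scrapers and models:

```python
from typing import Callable, Iterable

def run_daily_pipeline(
    collect: Callable[[], Iterable[str]],
    preprocess: Callable[[str], str],
    analyze: Callable[[str], dict],
    store: Callable[[dict], None],
) -> int:
    """Run one end-to-end pass: collect -> preprocess -> analyze -> store.

    Returns the number of documents processed. Stages are injected so the
    same orchestration works with real implementations or with test stubs.
    """
    count = 0
    for raw in collect():
        clean = preprocess(raw)
        result = analyze(clean)
        store(result)
        count += 1
    return count
```

A scheduler then only needs to invoke `run_daily_pipeline` once per day; the function itself stays free of scheduling concerns.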

5. Continuous Model Prediction and Storage:

Automate the process of running NLP models on the preprocessed data to derive topic categorizations and sentiment scores. These outcomes will be automatically stored in a database alongside the original data, forming a robust and comprehensive repository.
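Storing model outputs alongside the original text can be done with any database; the README does not name one, so the sketch below assumes SQLite for illustration. The table layout and function names are assumptions, not the project's schema.

```python
import sqlite3

def init_db(conn: sqlite3.Connection) -> None:
    """Create the predictions table if it does not exist."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS predictions (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               text TEXT NOT NULL,        -- original (preprocessed) document
               topic TEXT NOT NULL,       -- topic-model label
               sentiment TEXT NOT NULL    -- sentiment-model label
           )"""
    )

def store_prediction(conn: sqlite3.Connection, text: str, topic: str, sentiment: str) -> None:
    """Insert one model result next to its source text."""
    conn.execute(
        "INSERT INTO predictions (text, topic, sentiment) VALUES (?, ?, ?)",
        (text, topic, sentiment),
    )
    conn.commit()
```

Keeping predictions and source text in one table makes the repository self-contained: the visualization layer can query it directly without re-running any model.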

6. Real-time Data Updates for Visualization:

With the daily pipeline in place, the automated updates will populate the visualization platform, offering stakeholders access to the latest insights and trends within the COVID-19 discourse. Through this architecture, the project establishes an automated end-to-end pipeline that collects, processes, and analyzes COVID-19 discourse while ensuring real-time updates for informed decision-making.
