
Oumayma-hy/ML_Project_with_Spark


Heart Disease Early Detection ML Pipeline

Overview

This project implements a machine learning pipeline for the early detection of heart disease using Apache Spark and its MLlib library, comparing several classification algorithms: RandomForest, Logistic Regression, and XGBoost. The pipeline loads a CSV dataset, applies feature transformations, trains the models, evaluates their performance, and presents the results through data visualizations.
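As a minimal sketch of the first step, the snippet below parses a numeric CSV with a header row and makes a reproducible train/test split using only the Python standard library. It assumes every column is numeric and that the label column is called `target` (an assumption; check the header of the downloaded file). The helper names `load_rows` and `train_test_split` are hypothetical. In PySpark, comparable steps could use `spark.read.csv(path, header=True, inferSchema=True)` followed by `DataFrame.randomSplit([0.8, 0.2], seed=42)`; note that `randomSplit` samples per row, so its split sizes are only approximately 80/20.

```python
import csv
import random
from typing import Iterable, List, Sequence, Tuple


def load_rows(lines: Iterable[str], label_col: str = "target") -> Tuple[List[str], List[List[float]], List[int]]:
    """Hypothetical helper: parse a numeric CSV (header row first) into
    feature names, feature rows, and integer labels taken from `label_col`."""
    reader = csv.DictReader(lines)
    if reader.fieldnames is None:
        raise ValueError("CSV input has no header row")
    if label_col not in reader.fieldnames:
        raise ValueError(f"label column {label_col!r} not found")
    feature_names = [c for c in reader.fieldnames if c != label_col]
    features: List[List[float]] = []
    labels: List[int] = []
    for row in reader:
        features.append([float(row[c]) for c in feature_names])
        labels.append(int(float(row[label_col])))
    return feature_names, features, labels


def train_test_split(features: Sequence[List[float]], labels: Sequence[int],
                     test_fraction: float = 0.2, seed: int = 42):
    """Shuffle row indices with a seeded RNG so the split is reproducible,
    then hold out `test_fraction` of the rows (rounded) for evaluation."""
    indices = list(range(len(labels)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(indices) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return ([features[i] for i in train_idx], [labels[i] for i in train_idx],
            [features[i] for i in test_idx], [labels[i] for i in test_idx])
```

`load_rows` accepts any iterable of text lines, so it can be given an open file, e.g. inside `with open("heart.csv", newline="") as f:` (file name hypothetical).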

Project Highlights

  • Machine Learning Algorithms:

    • RandomForest
    • Logistic Regression
    • XGBoost
  • Processing Chain:

    • Transformers: Used two transformers in the processing chain.
    • Estimators: Trained a model with each of the algorithms above.
    • Pipeline: Chained the transformers and estimator into a single pipeline.
    • Evaluation: Computed key metrics to assess model performance.
  • Activity:

    • Selected a suitable CSV dataset for the project from Kaggle: https://www.kaggle.com/datasets/johnsmith88/heart-disease-dataset
    • Illustrated the processing chain, covering transformers, estimators, pipelines, and evaluation.
    • Developed and tested a notebook implementing the pipeline.
    • Presented the results through data visualizations.
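The README does not name the two transformers, so the plain-Python sketch below only illustrates the Transformer, Estimator, Pipeline contract that Spark ML is built on: transformers expose `transform()`, estimators expose `fit()` and return a fitted model that is itself a transformer, and a pipeline runs its stages in order. `VectorAssembler` here imitates the role of `pyspark.ml.feature.VectorAssembler` but is not Spark's class; `RenameColumn` and `NearestCentroid` are hypothetical stand-ins, and the nearest-centroid classifier is not one of the project's algorithms (in PySpark that slot would hold, for example, `RandomForestClassifier` or `LogisticRegression` from `pyspark.ml.classification`).

```python
import math
from typing import Dict, List, Sequence

Row = Dict[str, object]  # one record: column name -> value


class VectorAssembler:
    """Stateless transformer: packs numeric columns into a 'features' list."""
    def __init__(self, input_cols: Sequence[str], output_col: str = "features"):
        self.input_cols, self.output_col = list(input_cols), output_col

    def transform(self, rows: List[Row]) -> List[Row]:
        return [{**r, self.output_col: [float(r[c]) for c in self.input_cols]} for r in rows]


class RenameColumn:
    """Hypothetical stateless transformer: copies column `src` into `dst`."""
    def __init__(self, src: str, dst: str):
        self.src, self.dst = src, dst

    def transform(self, rows: List[Row]) -> List[Row]:
        return [{**r, self.dst: r[self.src]} for r in rows]


class NearestCentroidModel:
    """Fitted model, itself a transformer: adds a 'prediction' column."""
    def __init__(self, centroids: Dict[int, List[float]]):
        self.centroids = centroids

    def transform(self, rows: List[Row]) -> List[Row]:
        def predict(x: List[float]) -> int:
            # Pick the class whose mean feature vector is closest (Euclidean).
            return min(self.centroids, key=lambda k: math.dist(x, self.centroids[k]))
        return [{**r, "prediction": predict(r["features"])} for r in rows]


class NearestCentroid:
    """Toy estimator: fit() learns one mean feature vector per class label."""
    def fit(self, rows: List[Row]) -> NearestCentroidModel:
        sums: Dict[int, List[float]] = {}
        counts: Dict[int, int] = {}
        for r in rows:
            y, x = int(r["label"]), r["features"]
            acc = sums.setdefault(y, [0.0] * len(x))
            for i, v in enumerate(x):
                acc[i] += v
            counts[y] = counts.get(y, 0) + 1
        return NearestCentroidModel({y: [s / counts[y] for s in acc] for y, acc in sums.items()})


class Pipeline:
    """Runs stages in order; each estimator is fitted on the output of the
    earlier stages and replaced by its fitted model."""
    def __init__(self, stages):
        self.stages = stages

    def fit(self, rows: List[Row]) -> "PipelineModel":
        fitted = []
        for stage in self.stages:
            if hasattr(stage, "fit"):
                stage = stage.fit(rows)
            rows = stage.transform(rows)
            fitted.append(stage)
        return PipelineModel(fitted)


class PipelineModel:
    """Applies every fitted stage in order to new rows."""
    def __init__(self, stages):
        self.stages = stages

    def transform(self, rows: List[Row]) -> List[Row]:
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows
```

Because `RenameColumn` reads `target`, rows passed to `PipelineModel.transform()` must include that column; for unlabeled data the stage would need to be dropped or made tolerant of a missing column.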

Results

  • The pipeline showed promising results for the early detection of heart disease.
  • The choice of algorithms delivered good accuracy and efficiency.
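No metric values are listed here, so as a sketch of how headline metrics can be computed, the hypothetical function below derives accuracy, precision, recall, and F1 from binary predictions, treating `1` as "disease present" (an assumption about the label encoding). For early detection, recall on the positive class is worth reporting alongside accuracy, since a false negative is a missed case. In Spark, `MulticlassClassificationEvaluator` (for example with `metricName="accuracy"`) and `BinaryClassificationEvaluator` (area under ROC by default) cover similar ground; its `"f1"` metric is weighted across both classes, so it can differ from the positive-class F1 computed here.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1, with 1 assumed to mean 'disease present'."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # correctly flagged cases
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # correctly cleared cases
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed cases
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}
```

Here recall is the fraction of actual heart-disease cases that the model flags, and precision is the fraction of flagged patients who actually have the disease.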

Usage

Prerequisites

  • Python (version 3.11.5)
  • Apache Spark
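A quick way to check these prerequisites before opening the notebook is the small script below. It assumes Spark is used from Python through the `pyspark` package (for example installed with `pip install pyspark`), which the README does not state; Spark also needs a Java runtime, which this script does not check. The function name `check_environment` is hypothetical, and its minimum version `(3, 11)` is taken from the Python 3.11.5 listed above.

```python
import importlib.util
import sys


def check_environment(min_python=(3, 11)) -> dict:
    """Report the interpreter version, whether it meets `min_python`,
    and whether the `pyspark` package can be imported (assumed install route)."""
    return {
        "python": ".".join(map(str, sys.version_info[:3])),
        "python_ok": sys.version_info[:2] >= tuple(min_python),
        "pyspark_installed": importlib.util.find_spec("pyspark") is not None,
    }


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```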

