Welcome to DataSprint 2025, a beginner-friendly machine learning competition designed exclusively for first-year students at MIT, Manipal, and hosted by IECSE Manipal.
This 3-hour online Kaggle competition introduces you to the world of data preprocessing, environmental analytics, and satellite data science.
Competition Objective & Details
Participants should build a binary classification model that predicts whether a given region/location is:
-> Water — A water body (lake, river, reservoir)
OR
-> Non-water — Land (vegetation, urban area, barren land)
The dataset contains NDVI (Normalized Difference Vegetation Index) time-series readings and environmental factors such as temperature, region, sensor type, and cloud coverage.
The objective is to analyze and preprocess the NDVI data, separate the features from the target variable, and train a model that accurately predicts the land type. Data cleaning, feature engineering, and model interpretability are key to achieving top results.
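As a rough illustration of separating features from the target, here is a minimal sketch using only the Python standard library. The column names (ndvi_t1, ndvi_t2, cloud_coverage, class) and the sample rows are assumptions made up for this example, and the real dataset may name its columns differently. The sample holds only numeric features; categorical columns such as region or sensor type would need their own encoding step.

```python
import csv
import io

# Hypothetical sample of the training data; real column names may differ.
SAMPLE_TRAIN = """ID,ndvi_t1,ndvi_t2,cloud_coverage,class
1,0.05,-0.10,12,Water
2,0.61,0.58,3,Non-water
3,,0.02,40,Water
"""

def split_features_target(csv_text, target_col="class", id_col="ID"):
    """Split CSV rows into a list of feature dicts and a list of 0/1 labels."""
    features, target = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        label = row.pop(target_col)   # remove the target from the features
        row.pop(id_col, None)         # the ID carries no predictive signal
        # Empty cells are kept as None so they can be imputed later.
        features.append({k: (float(v) if v != "" else None) for k, v in row.items()})
        target.append(1 if label == "Water" else 0)  # Water -> 1, Non-water -> 0
    return features, target

X, y = split_features_target(SAMPLE_TRAIN)
print(X[0], y)
```

Missing readings are left as None here rather than filled in, since the right imputation strategy (mean, interpolation along the time series, and so on) depends on what the actual data looks like.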
Submission File
For each row in the test set, you must predict the land classification for the given satellite observation.
Your submission should be a CSV file with two columns:
ID – A unique identifier for each record (this matches the ID column in the test dataset).
Class – The predicted category for that location, which must be one of the following:
Water – Represents water bodies such as lakes, rivers, or reservoirs
Non-water – Represents land areas such as vegetation, urban regions, or barren land
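A submission in this format could be produced along the following lines. This is only a sketch: the helper name write_submission and the example IDs and predictions are hypothetical, while the ID and Class column names and the two allowed labels come from the description above.

```python
import csv
import io

# The only two labels the submission format allows.
VALID_CLASSES = {"Water", "Non-water"}

def write_submission(ids, predictions, out):
    """Write ID/Class rows to a file-like object, validating each label."""
    if len(ids) != len(predictions):
        raise ValueError("ids and predictions must have the same length")
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["ID", "Class"])  # header expected by the competition
    for record_id, label in zip(ids, predictions):
        if label not in VALID_CLASSES:
            raise ValueError(f"unexpected class label: {label!r}")
        writer.writerow([record_id, label])

# Hypothetical IDs and predictions, written to an in-memory buffer.
buffer = io.StringIO()
write_submission([101, 102, 103], ["Water", "Non-water", "Water"], buffer)
submission_text = buffer.getvalue()
print(submission_text)
```

To write an actual file, pass a handle opened with open("submission.csv", "w", newline="") instead of the in-memory buffer.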