Feat : k_means Algorithm #252
base: master
Pull Request Overview
This PR introduces a comprehensive implementation of the K-Means clustering algorithm in R, including k-means++ initialization, multiple quality metrics, and extensive examples.
Key changes:
- Complete K-Means clustering implementation with R6 class structure
- Support for multiple initialization methods (random, k-means++, custom)
- Four clustering quality metrics (silhouette, Davies-Bouldin, Calinski-Harabasz, inertia)
- Comprehensive examples demonstrating usage patterns and best practices
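For readers unfamiliar with k-means++ initialization, the idea can be sketched in a few lines of base R. This is a minimal illustration, not the PR's code: the function name `kmeanspp_init` is hypothetical, and it assumes `X` is a numeric matrix with at least `k` distinct rows.

```r
# Hypothetical helper (not from the PR): k-means++ seeding.
# Assumes X is a numeric matrix (rows = points) with at least k distinct rows.
kmeanspp_init <- function(X, k) {
  n <- nrow(X)
  idx <- integer(k)
  idx[1] <- sample.int(n, 1)                      # first centroid: uniform at random
  # squared distance from each point to its nearest chosen centroid so far
  d2 <- rowSums(sweep(X, 2, X[idx[1], ])^2)
  for (j in seq_len(k)[-1]) {
    # sample the next centroid with probability proportional to d2;
    # already-chosen points have d2 = 0 and cannot be drawn again
    idx[j] <- sample.int(n, 1, prob = d2)
    d2 <- pmin(d2, rowSums(sweep(X, 2, X[idx[j], ])^2))
  }
  X[idx, , drop = FALSE]                          # k x p matrix of starting centroids
}
```

Points far from every existing centroid are more likely to be picked next, which tends to spread the starting centroids out compared with purely random selection.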
Co-authored-by: Copilot <[email protected]>
Pull Request Overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.
Pull Request Overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 6 comments.

K-Means Clustering Algorithm – Summary Review
Overview
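K-Means is a centroid-based clustering algorithm that partitions n data points into k clusters, assigning each point to the cluster whose centroid (mean) is nearest. It minimizes the within-cluster sum of squared distances (inertia), converges to a local rather than global optimum, and is known for its simplicity and speed.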
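The assign-then-update loop described above can be sketched in base R as follows. This is an illustrative sketch of standard Lloyd iterations, not the R6 class from this PR: the function name `lloyd_kmeans` and its return fields are hypothetical, and it assumes `X` is a numeric matrix with more than one row and `centroids` is a k x p matrix of starting positions.

```r
# Hypothetical helper (not the PR's R6 class): plain Lloyd iterations.
# Assumes X is a numeric matrix with more than one row and max_iter >= 1.
lloyd_kmeans <- function(X, centroids, max_iter = 300, tol = 1e-4) {
  k <- nrow(centroids)
  for (iter in seq_len(max_iter)) {
    # n x k matrix of squared Euclidean distances, point to centroid
    d2 <- sapply(seq_len(k), function(j) rowSums(sweep(X, 2, centroids[j, ])^2))
    labels <- max.col(-d2, ties.method = "first")  # nearest centroid per point
    new_centroids <- centroids
    for (j in seq_len(k)) {
      members <- X[labels == j, , drop = FALSE]
      # an empty cluster keeps its previous centroid
      if (nrow(members) > 0) new_centroids[j, ] <- colMeans(members)
    }
    shift <- sqrt(sum((new_centroids - centroids)^2))
    centroids <- new_centroids
    if (shift < tol) break                         # centroids stopped moving
  }
  # inertia: within-cluster sum of squares, using the last assignment step
  inertia <- sum((X - centroids[labels, , drop = FALSE])^2)
  list(centroids = centroids, labels = labels,
       inertia = inertia, iterations = iter)
}
```

A call such as `lloyd_kmeans(X, X[sample(nrow(X), 3), ])` would start from three randomly chosen data points; the result depends on that starting choice.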
Algorithm Summary
Key Traits:
Complexity
Initialization Methods
Quality Metrics
Key Methods
- fit(X)
- predict(X)
- fit_predict(X)
- transform(X)
- silhouette_score()
- get_centroids()
Advantages ✅
Disadvantages ❌
Best Practices
- X_scaled <- scale(X)
- tol = 1e-4, limit iterations with max_iter = 300
Use Cases
✅ Customer segmentation
✅ Image compression (color quantization)
✅ Document clustering
✅ Anomaly detection
✅ Exploratory data analysis
❌ Avoid for: non-spherical clusters, mixed data, or many outliers
Performance Tips
Comparison with Other Algorithms
When to Use
✅ Large datasets
✅ Well-separated spherical clusters
✅ Continuous numeric data
✅ Need quick, interpretable results
When to Avoid
❌ Unknown k
❌ Non-spherical or overlapping clusters
❌ Heavy noise or outliers
❌ Categorical or mixed data
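When k is unknown, a common workaround is to scan a range of k values and look for an "elbow" where inertia stops dropping sharply. The sketch below uses base R's `stats::kmeans` rather than the class in this PR; the function name `elbow_scan` is hypothetical, and the choice of `nstart = 10` is an assumption, not a recommendation from the review.

```r
# Hypothetical helper: inertia (total within-cluster sum of squares) per k,
# computed with base R's stats::kmeans instead of the PR's implementation.
elbow_scan <- function(X, ks, nstart = 10, iter_max = 300) {
  wss <- sapply(ks, function(k) {
    kmeans(X, centers = k, nstart = nstart, iter.max = iter_max)$tot.withinss
  })
  data.frame(k = ks, inertia = wss)
}

set.seed(42)
X <- scale(iris[, 1:4])          # built-in dataset, standardized columns
scan <- elbow_scan(X, 1:8)
print(scan)                      # inspect where the decrease levels off
```

The elbow is a heuristic: it suggests a reasonable k but does not prove one is correct.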
Verdict ⭐⭐⭐⭐☆ (4/5)
K-Means is a fast, scalable, and easy-to-use clustering algorithm ideal for large datasets and exploratory tasks.
However, it’s sensitive to initialization, scale, and outliers.
Strengths: Speed, simplicity, scalability
Weaknesses: Sensitive to initialization/outliers
Best For: Customer segmentation, image compression, initial data exploration
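The sensitivity to scale and initialization noted in the verdict can be illustrated with base R's `stats::kmeans` (not the class in this PR). The data below is synthetic and constructed purely for illustration: the true grouping lives in a small-range column, while a large-range column carries only noise. The helper `agreement` is hypothetical.

```r
set.seed(1)
# Synthetic data (assumption): the informative feature has a small range,
# the uninformative one a very large range.
x_small <- c(rnorm(50, 0, 0.1), rnorm(50, 1, 0.1))
x_large <- rnorm(100, 0, 1000)
X <- cbind(x_small, x_large)
truth <- rep(1:2, each = 50)

# nstart = 10 reruns from several random starts and keeps the best fit,
# which reduces (but does not remove) sensitivity to initialization.
fit_raw    <- kmeans(X,        centers = 2, nstart = 10, iter.max = 300)
fit_scaled <- kmeans(scale(X), centers = 2, nstart = 10, iter.max = 300)

# Hypothetical helper: fraction of points matching the true grouping,
# allowing for the two cluster labels being swapped.
agreement <- function(labels) {
  a <- mean(labels == truth)
  max(a, 1 - a)
}

cat("raw:   ", agreement(fit_raw$cluster), "\n")
cat("scaled:", agreement(fit_scaled$cluster), "\n")
```

On unscaled data the large-range column dominates the Euclidean distance, so the split follows the noise; after `scale()` both columns contribute comparably and the real grouping can be recovered.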