Description
Document non-determinism of some clustering methods when no random seed is set
| Method | Implementation | Deterministic | Notes |
|---|---|---|---|
| `averaging` | Sequential division | ✅ Yes | Divides periods sequentially by order |
| `hierarchical` | `sklearn.AgglomerativeClustering` | ✅ Yes | Ward linkage, no randomness |
| `contiguous` / `adjacent_periods` | `sklearn.AgglomerativeClustering` + connectivity | ✅ Yes | Same as hierarchical with adjacency constraint |
| `kmedoids` / `k_medoids` | Pyomo MILP (exact optimization) | ✅ Yes | Solves optimization problem exactly |
| `kmeans` / `k_means` | `sklearn.KMeans` | ❌ No | Uses k-means++ initialization with `random_state=None` |
| `kmaxoids` / `k_maxoids` | Custom implementation | ❌ No | Uses `numpy.random.permutation()` |
Deterministic Methods
- `averaging`: Simply divides periods into equal sequential groups. No randomness involved.
- `hierarchical`: Uses sklearn's `AgglomerativeClustering` with Ward linkage. This is a deterministic algorithm that produces the same dendrogram given the same input.
- `contiguous` / `adjacent_periods`: Same as hierarchical but with an adjacency constraint matrix. Still deterministic.
- `kmedoids` / `k_medoids`: Uses Mixed Integer Linear Programming (MILP) via Pyomo with the HiGHS solver. This is an exact optimization that always finds the same optimal solution.
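As a rough illustration of the deterministic cases, the sketch below calls sklearn's `AgglomerativeClustering` directly rather than going through tsam. The synthetic `periods` array, the helper names `ward_labels` and `contiguous_ward_labels`, and the chain-shaped adjacency matrix are assumptions made for this example, not tsam's actual code.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Synthetic stand-in for a matrix of candidate periods (rows = periods).
# This is illustrative data, not output from tsam.
rng = np.random.default_rng(0)
periods = rng.normal(size=(40, 24))


def ward_labels(data, n_clusters):
    """Cluster with Ward linkage; no random state is involved."""
    model = AgglomerativeClustering(n_clusters=n_clusters, linkage="ward")
    return model.fit_predict(data)


def contiguous_ward_labels(data, n_clusters):
    """Ward linkage where only neighbouring periods may be merged.

    The connectivity matrix links period i to periods i-1 and i+1
    (an assumed adjacency structure for this sketch).
    """
    n = data.shape[0]
    adjacency = np.eye(n, k=1) + np.eye(n, k=-1)
    model = AgglomerativeClustering(
        n_clusters=n_clusters, linkage="ward", connectivity=adjacency
    )
    return model.fit_predict(data)


# Repeated calls on the same input give identical labels.
first = ward_labels(periods, 5)
second = ward_labels(periods, 5)
same_unconstrained = np.array_equal(first, second)

first_c = contiguous_ward_labels(periods, 5)
second_c = contiguous_ward_labels(periods, 5)
same_constrained = np.array_equal(first_c, second_c)
```

No seed appears anywhere in this sketch, which is the point: the result depends only on the input data.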
Non-Deterministic Methods (Require Seeding)
- `kmeans` / `k_means`: Uses sklearn's `KMeans` with `k-means++` initialization. The initialization is random, and with `random_state=None` sklearn draws from NumPy's global random state. To ensure reproducibility:
  - Set `np.random.seed()` before calling, OR
  - Pass the `random_state` parameter to `KMeans` (requires code modification)
- `kmaxoids` / `k_maxoids`: Custom implementation in `tsam/utils/k_maxoids.py` that uses `numpy.random.permutation()` (line 98) for random restarts. To ensure reproducibility:
  - Set `np.random.seed()` before calling
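The two seeding options can be sketched as follows. This example uses sklearn's `KMeans` and `numpy.random.permutation()` directly instead of tsam's wrappers; the synthetic `periods` array and the helper names are hypothetical and exist only for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for a matrix of candidate periods (rows = periods).
# This is illustrative data, not output from tsam.
rng = np.random.default_rng(0)
periods = rng.normal(size=(60, 24))


def kmeans_labels_global_seed(data, n_clusters, seed):
    """Option 1: seed NumPy's global state, leave random_state=None."""
    np.random.seed(seed)
    model = KMeans(
        n_clusters=n_clusters, init="k-means++", n_init=10, random_state=None
    )
    return model.fit_predict(data)


def kmeans_labels_random_state(data, n_clusters, seed):
    """Option 2: pass the seed explicitly via random_state."""
    model = KMeans(
        n_clusters=n_clusters, init="k-means++", n_init=10, random_state=seed
    )
    return model.fit_predict(data)


def seeded_permutation(n, seed):
    """Seeding the global state also fixes numpy.random.permutation()."""
    np.random.seed(seed)
    return np.random.permutation(n)


# Each approach returns identical results when called twice with one seed.
global_seed_repeatable = np.array_equal(
    kmeans_labels_global_seed(periods, 4, 42),
    kmeans_labels_global_seed(periods, 4, 42),
)
random_state_repeatable = np.array_equal(
    kmeans_labels_random_state(periods, 4, 42),
    kmeans_labels_random_state(periods, 4, 42),
)
permutation_repeatable = np.array_equal(
    seeded_permutation(10, 7), seeded_permutation(10, 7)
)
```

Passing `random_state` keeps the seed local to the estimator, whereas `np.random.seed()` changes global state that other code in the same process may also consume. The global seed therefore has to be set immediately before each clustering call to be reliable.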