
Collection of Topics to improve after v3 release #129

@FBumann

Document non-determinism of some clustering methods when no random seed is set

| Method | Implementation | Deterministic | Notes |
| --- | --- | --- | --- |
| averaging | Sequential division | ✅ Yes | Divides periods sequentially by order |
| hierarchical | sklearn.AgglomerativeClustering | ✅ Yes | Ward linkage, no randomness |
| contiguous / adjacent_periods | sklearn.AgglomerativeClustering + connectivity | ✅ Yes | Same as hierarchical with adjacency constraint |
| kmedoids / k_medoids | Pyomo MILP (exact optimization) | ✅ Yes | Solves optimization problem exactly |
| kmeans / k_means | sklearn.KMeans | ❌ No | Uses k-means++ initialization with random_state=None |
| kmaxoids / k_maxoids | Custom implementation | ❌ No | Uses numpy.random.permutation() |

Deterministic Methods

  • averaging: Simply divides periods into equal sequential groups. No randomness involved.

  • hierarchical: Uses sklearn's AgglomerativeClustering with Ward linkage. This is a deterministic algorithm that produces the same dendrogram given the same input.

  • contiguous / adjacent_periods: Same as hierarchical but with an adjacency constraint matrix. Still deterministic.

  • kmedoids / k_medoids: Uses Mixed Integer Linear Programming (MILP) via Pyomo with the HiGHS solver. This is an exact optimization with no random component, so it returns the same optimal solution for the same input and solver configuration.

Non-Deterministic Methods (Require Seeding)

  • kmeans / k_means: Uses sklearn's KMeans with k-means++ initialization. The initialization is random; with random_state=None, sklearn draws from NumPy's global random state. To ensure reproducibility:

    • Call np.random.seed() before clustering, OR
    • Pass a fixed random_state to KMeans (requires a code modification)
  • kmaxoids / k_maxoids: Custom implementation in tsam/utils/k_maxoids.py that uses numpy.random.permutation() (line 98) for random restarts. To ensure reproducibility:

    • Set np.random.seed() before calling
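
A minimal sketch of the seeding workaround, assuming the caller wants reproducible results without permanently changing NumPy's global random state. The names `seeded` and `random_restart_orders` are hypothetical and not part of tsam; `random_restart_orders` only stands in for code that calls `numpy.random.permutation()` internally, as k_maxoids does.

```python
from contextlib import contextmanager

import numpy as np


@contextmanager
def seeded(seed):
    """Temporarily seed NumPy's global RNG, then restore the previous state."""
    state = np.random.get_state()
    np.random.seed(seed)
    try:
        yield
    finally:
        # Restore the caller's RNG state so the seed does not leak out.
        np.random.set_state(state)


def random_restart_orders(n_items, n_restarts):
    """Hypothetical stand-in for code that uses numpy.random.permutation()."""
    return [np.random.permutation(n_items) for _ in range(n_restarts)]


# Two runs under the same seed draw the same permutations.
with seeded(42):
    first = random_restart_orders(10, 3)
with seeded(42):
    second = random_restart_orders(10, 3)

reproducible = all(np.array_equal(a, b) for a, b in zip(first, second))
print(reproducible)
```

The same wrapper should also cover the kmeans case as long as KMeans is created with random_state=None, since it then reads from the same global state. Passing an explicit random_state through the API would be the cleaner long-term fix.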

Make some methods private in `tuning.py`

Rename notebooks

#138

Maybe enable persistence of extreme mode replace in apply()

#136

Migrate publishing to a more automated process

Remove matplotlib from dependencies (only used in 2 notebooks atm, not in code. Migrate to plotly)

#137
