
Conversation

ShindeShivam (Contributor) commented on Jul 9, 2025

While going through the semi-supervised learning example in the unsupervised learning section, I initially got a slightly different accuracy when I manually labeled the representative digits based on what I actually saw in the images.

I initially thought it was an issue in my code, but to verify, I ran the original notebook as-is on Google Colab — and to my surprise, the model's accuracy was just 7%.

After digging into the code, I found the problem:

[Screenshot: the hardcoded y_representative_digits array in the notebook]

The hardcoded labels for the 50 representative digits (y_representative_digits) no longer match the current cluster centroids generated by KMeans. This is likely due to internal changes in the dataset order or scikit-learn's clustering behavior (like randomness in centroid initialization or data shuffling).

Because of this mismatch, the model was being trained on incorrect image-label pairs, leading to terrible accuracy.
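For context, here is roughly how the notebook selects its 50 representative digits. This is a minimal, self-contained sketch, not the notebook's exact code: load_digits and the 1,400-image split are stand-ins for the chapter's actual data loading, but the selection logic (one image closest to each KMeans centroid) is the part that matters.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits

# Stand-in for the chapter's dataset setup (assumption, not the notebook's exact code).
X_digits, y_digits = load_digits(return_X_y=True)
X_train, y_train = X_digits[:1400], y_digits[:1400]

k = 50
kmeans = KMeans(n_clusters=k, random_state=42)
X_digits_dist = kmeans.fit_transform(X_train)                # distance of each image to each centroid
representative_digit_idx = np.argmin(X_digits_dist, axis=0)  # index of the image closest to each centroid
X_representative_digits = X_train[representative_digit_idx]

# The notebook then hardcodes y_representative_digits after eyeballing these
# 50 images. If KMeans converges to different centroids (different
# scikit-learn version, initialization randomness, or dataset order), the
# hardcoded labels no longer describe the images actually selected here.
```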

Fix:

Replaced the outdated y_representative_digits with correct labels, manually reassigned by inspecting the representative images nearest each centroid.
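This is not the fix in this PR (the point of the example is labeling the 50 images by hand), but a quick way to confirm the mismatch theory: pull the ground-truth labels for the same 50 indices and retrain. If accuracy recovers, the hardcoded array really was stale. The variable names and the 1,400-image split follow the sketch above and are assumptions, not the notebook's exact code.

```python
from sklearn.linear_model import LogisticRegression

# Ground-truth labels of the 50 selected images (uses the variables from the
# sketch above). This bypasses manual labeling purely as a sanity check.
y_representative_digits = y_train[representative_digit_idx]

log_reg = LogisticRegression(max_iter=10_000)
log_reg.fit(X_representative_digits, y_representative_digits)

# Accuracy on the held-out digits; it should be far above 7% if the only
# problem was a stale hardcoded label array.
print(log_reg.score(X_digits[1400:], y_digits[1400:]))
```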

Note:

I also have an earlier PR open, #196. Kindly review that one as well.

ageron (Owner) commented on Aug 10, 2025

Thanks for your feedback. I can't test this right now because openml.org seems to be down (I'm getting a 404 error when downloading MNIST). I'll try again asap.

ShindeShivam changed the title from "Fix wrong manual labels for KMeans representative digits (was causing ~7% accuracy) semi-supervised learning example" to "Fix manual labels for KMeans representative digits (was causing ~7% accuracy) semi-supervised learning example" on Aug 16, 2025.