Fix: Remove momentum from SGD to show standard optimizer behavior #205
Chapter: 11
Cell: Faster Optimizers [47]
Changed:
optimizer = tf.keras.optimizers.SGD(learning_rate=0.001, momentum=0.9)
To:
optimizer = tf.keras.optimizers.SGD(learning_rate=0.001)
Why the change was needed:
While working through Chapter 11, I noticed that the SGD optimizer was being initialized with a momentum value.
Since this section compares various optimizers, including plain SGD, keeping momentum here misrepresents how standard SGD behaves on its own. Momentum certainly improves performance, but it is already introduced separately in a later cell.
I removed the momentum parameter so the optimizer now reflects vanilla SGD.
**I’ve attached the optimizer loss plots from before and after the change (see the images below).**
In the original version, the “SGD” curve performed better than expected; after inspecting the code, I realized it was actually using momentum. After the fix, the SGD curve shows the slower convergence we typically expect from plain SGD.
Before:

After:
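
In case it helps reviewers reproduce the comparison, here is a rough sketch of how the two optimizers could be trained side by side and their loss curves plotted. The dataset (Fashion MNIST), model architecture, and epoch count are placeholders chosen for illustration, not the notebook's exact setup.

```python
import matplotlib.pyplot as plt
import tensorflow as tf

# Illustrative data and model only -- the notebook's actual setup may differ.
(X_train, y_train), _ = tf.keras.datasets.fashion_mnist.load_data()
X_train = X_train / 255.0  # scale pixel values to [0, 1]

def build_model():
    # A small dense network, just enough to make the optimizer comparison visible.
    return tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=[28, 28]),
        tf.keras.layers.Dense(100, activation="relu"),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])

optimizers = {
    "SGD": tf.keras.optimizers.SGD(learning_rate=0.001),
    "SGD + momentum": tf.keras.optimizers.SGD(learning_rate=0.001, momentum=0.9),
}

for name, optimizer in optimizers.items():
    model = build_model()
    model.compile(loss="sparse_categorical_crossentropy",
                  optimizer=optimizer, metrics=["accuracy"])
    history = model.fit(X_train, y_train, epochs=10, verbose=0)
    plt.plot(history.history["loss"], label=name)

plt.xlabel("Epoch")
plt.ylabel("Training loss")
plt.legend()
plt.show()
```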