Fix missing parentheses in TF Adam/Yogi update denominator#2691
Open
Chessing234 wants to merge 1 commit into d2l-ai:master from
In the TensorFlow implementations of adam() and yogi() in
chapter_optimization/adam.md, the parameter update is written as
p - lr * v_bias_corr / tf.math.sqrt(s_bias_corr) + eps
Due to Python operator precedence this evaluates to
(p - lr * v_bias_corr / tf.math.sqrt(s_bias_corr)) + eps
so 'eps' is added to the updated parameter instead of to the
denominator, where it is needed for numerical stability. The math given
in the text and the paired PyTorch implementations correctly place 'eps'
inside the denominator, which in the TF code reads
p - lr * v_bias_corr / (tf.math.sqrt(s_bias_corr) + eps)
Add the missing parentheses so the TF code matches the formula.
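
The precedence issue can be reproduced without TensorFlow. The following is a minimal sketch that uses plain Python floats and math.sqrt as stand-ins for the tensors and tf.math.sqrt; the function names and the numeric values are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical scalar stand-ins for the tensors used in adam(); not taken from the book.
p, lr, eps = 1.0, 0.1, 1e-6
v_bias_corr, s_bias_corr = 0.5, 0.25


def unparenthesized_update(p, lr, v, s, eps):
    # Division binds tighter than addition, so this is
    # (p - lr * v / sqrt(s)) + eps: eps shifts the parameter itself.
    return p - lr * v / math.sqrt(s) + eps


def parenthesized_update(p, lr, v, s, eps):
    # eps is part of the denominator, as in the formula.
    return p - lr * v / (math.sqrt(s) + eps)


buggy = unparenthesized_update(p, lr, v_bias_corr, s_bias_corr, eps)
fixed = parenthesized_update(p, lr, v_bias_corr, s_bias_corr, eps)

# The same expression with the implicit grouping written out by hand.
explicit = (p - lr * v_bias_corr / math.sqrt(s_bias_corr)) + eps

print(buggy == explicit)  # the two groupings are the same expression
print(buggy - fixed)      # small but nonzero difference between the two updates
```

With a well-conditioned denominator the two results differ only slightly, which may be why the bug is easy to overlook in training curves.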
Bug
In chapter_optimization/adam.md, the TensorFlow implementations of adam() and yogi() write the parameter update as:
p - lr * v_bias_corr / tf.math.sqrt(s_bias_corr) + eps
Root cause
Python operator precedence parses a / b + c as (a / b) + c, so this expression is:
(p - lr * v_bias_corr / tf.math.sqrt(s_bias_corr)) + eps
That is, eps is added directly to the parameter update, instead of being added to the denominator for numerical stability.
The prose in this section and the paired PyTorch implementations a few lines above (for both adam and yogi) correctly place eps inside the denominator, which in TF terms is:
p - lr * v_bias_corr / (tf.math.sqrt(s_bias_corr) + eps)
So this is a simple missing-parentheses bug in the TF code blocks only.
Fix
Wrap the denominator in parentheses so eps is inside the sqrt(...) + eps term, matching the formula and the PyTorch tabs.
Two call sites updated (adam, yogi), 4 lines changed.
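
To illustrate why eps belongs in the denominator, here is a small scalar sketch of the case where the bias-corrected second moment is exactly zero (for example, a parameter whose gradient has been zero so far). The helper names and the default eps value are assumptions made for this example. Plain Python floats raise ZeroDivisionError on division by zero, whereas floating-point tensor division would be expected to yield inf or nan instead.

```python
import math


def step_with_eps_in_denominator(p, lr, v_bias_corr, s_bias_corr, eps=1e-6):
    # Corrected form: the denominator is at least eps, so it is never zero.
    return p - lr * v_bias_corr / (math.sqrt(s_bias_corr) + eps)


def step_with_eps_outside(p, lr, v_bias_corr, s_bias_corr, eps=1e-6):
    # Unparenthesized form: eps is added after the division has already happened.
    return p - lr * v_bias_corr / math.sqrt(s_bias_corr) + eps


# Both moment estimates are zero, i.e. no gradient signal yet.
safe = step_with_eps_in_denominator(1.0, 0.1, 0.0, 0.0)

try:
    step_with_eps_outside(1.0, 0.1, 0.0, 0.0)
    outside_failed = False
except ZeroDivisionError:
    # With floats this is 0.0 / 0.0, which Python refuses to evaluate.
    outside_failed = True

print(safe, outside_failed)
```

In the corrected form the parameter is left unchanged when there is no gradient signal, while the unparenthesized form gets no protection from eps at all.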