This repository was archived by the owner on Jul 7, 2023. It is now read-only.

Commit 08f0742

Lukasz Kaiser authored and Ryan Sepassi committed

Add a quick MNIST model for README, comments and documentation, bump back version.
PiperOrigin-RevId: 185311943
1 parent 70a0464 commit 08f0742

File tree: 7 files changed (+81, −12 lines)

README.md

Lines changed: 17 additions & 3 deletions

@@ -19,7 +19,8 @@ or running existing ones on your data. It is actively used and maintained by
 researchers and engineers within
 the [Google Brain team](https://research.google.com/teams/brain/) and was used
 to develop state-of-the-art models for translation (see
-[Attention Is All You Need](https://arxiv.org/abs/1706.03762)), summarization,
+[Attention Is All You Need](https://arxiv.org/abs/1706.03762)),
+[summarization](https://arxiv.org/abs/1801.10198),
 image generation and other tasks. You can read
 more about T2T in the [Google Research Blog post introducing
 it](https://research.googleblog.com/2017/06/accelerating-deep-learning-research.html).
@@ -42,7 +43,20 @@ with T2T announcements.
 browser using a free VM from Google, no installation needed.
 
 Alternatively, here is a one-command version that installs T2T, downloads data,
-trains an English-German translation model, and evaluates it:
+trains an MNIST model and evaluates it:
+
+```
+pip install tensor2tensor && t2t-trainer \
+  --generate_data \
+  --data_dir=~/t2t_data \
+  --problems=image_mnist \
+  --model=shake_shake \
+  --hparams_set=shake_shake_quick \
+  --output_dir=~/t2t_train/mnist1
+```
+
+For a more demanding problem, here is how to train
+an English-German translation model and evaluate it:
 
 ```
 pip install tensor2tensor && t2t-trainer \
@@ -54,7 +68,7 @@ pip install tensor2tensor && t2t-trainer \
   --output_dir=~/t2t_train/base
 ```
 
-You can decode from the model interactively:
+You can decode from the model interactively to get translations:
 
 ```
 t2t-decoder \

docs/walkthrough.md

Lines changed: 17 additions & 3 deletions

(Identical diff to README.md: docs/walkthrough.md mirrors the README content.)

setup.py

Lines changed: 1 addition & 1 deletion

@@ -5,7 +5,7 @@
 
 setup(
     name='tensor2tensor',
-    version='2.0.0',
+    version='1.5.0',
     description='Tensor2Tensor',
     author='Google Inc.',
     author_email='[email protected]',

tensor2tensor/bin/t2t-datagen

Lines changed: 12 additions & 1 deletion

@@ -1,5 +1,16 @@
 #!/usr/bin/env python
-"""t2t-datagen."""
+"""Data generation for Tensor2Tensor.
+
+This script is used to generate data to train your models
+for a number of problems for which open-source data is available.
+
+For example, to generate data for MNIST run this:
+
+t2t-datagen \
+  --problem=image_mnist \
+  --data_dir=~/t2t_data \
+  --tmp_dir=~/t2t_data/tmp
+"""
 from __future__ import absolute_import
 from __future__ import division
 from __future__ import print_function

tensor2tensor/bin/t2t-trainer

Lines changed: 17 additions & 1 deletion

@@ -1,5 +1,21 @@
 #!/usr/bin/env python
-"""t2t-trainer."""
+"""Trainer for Tensor2Tensor.
+
+This script is used to train your models in Tensor2Tensor.
+
+For example, to train a shake-shake model on MNIST run this:
+
+t2t-trainer \
+  --generate_data \
+  --problems=image_mnist \
+  --data_dir=~/t2t_data \
+  --tmp_dir=~/t2t_data/tmp \
+  --model=shake_shake \
+  --hparams_set=shake_shake_quick \
+  --output_dir=~/t2t_train/mnist1 \
+  --train_steps=1000 \
+  --eval_steps=100
+"""
 from __future__ import absolute_import
 from __future__ import division
 from __future__ import print_function

tensor2tensor/models/shake_shake.py

Lines changed: 10 additions & 0 deletions

@@ -185,6 +185,16 @@ def shakeshake_small():
   return hparams
 
 
+@registry.register_hparams
+def shake_shake_quick():
+  hparams = shakeshake_small()
+  hparams.optimizer = "Adam"
+  hparams.learning_rate_cosine_cycle_steps = 1000
+  hparams.learning_rate = 0.5
+  hparams.batch_size = 100
+  return hparams
+
+
 @registry.register_hparams
 def shakeshake_big():
   hparams = shakeshake_small()

tensor2tensor/models/transformer.py

Lines changed: 7 additions & 3 deletions

@@ -13,10 +13,14 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
-"""transformer (attention).
+"""Transformer model from "Attention Is All You Need".
 
-encoder: [Self-Attention, Feed-forward] x n
-decoder: [Self-Attention, Source-Target-Attention, Feed-forward] x n
+The Transformer model consists of an encoder and a decoder. Both are stacks
+of self-attention layers followed by feed-forward layers. This model yields
+good results on a number of problems, especially in NLP and machine translation.
+
+See "Attention Is All You Need" (https://arxiv.org/abs/1706.03762) for the full
+description of the model and the results obtained with its early version.
 """
 
 from __future__ import absolute_import
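The rewritten docstring describes the encoder as a stack of [self-attention, feed-forward] layers. A toy NumPy sketch of one such encoder layer, using scaled dot-product attention as in the paper; weights are random and layer normalization is omitted, so this shows structure only, not the actual tensor2tensor implementation:

```python
# Toy sketch of one encoder layer from the docstring above:
# self-attention followed by a feed-forward block, each with a residual add.
import numpy as np


def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(x, wq, wk, wv):
    # Scaled dot-product attention over a single sequence (no heads, no mask).
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    return softmax(scores) @ v


def feed_forward(x, w1, w2):
    # Two linear maps with a ReLU in between, applied position-wise.
    return np.maximum(0.0, x @ w1) @ w2


rng = np.random.default_rng(0)
seq_len, d = 5, 8
x = rng.standard_normal((seq_len, d))
wq, wk, wv = (rng.standard_normal((d, d)) for _ in range(3))
w1, w2 = rng.standard_normal((d, 4 * d)), rng.standard_normal((4 * d, d))

# One encoder layer: attention, then feed-forward, with residual connections.
y = x + self_attention(x, wq, wk, wv)
y = y + feed_forward(y, w1, w2)
print(y.shape)  # (5, 8)
```

A decoder layer adds a source-target attention block between the two, attending over the encoder output instead of the decoder's own states.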
