This repository was archived by the owner on Jul 7, 2023. It is now read-only.
docs/new_problem.md
32 additions & 8 deletions
@@ -15,9 +15,17 @@ Let's add a new dataset together and train the transformer model. We'll be learn

For each problem we want to tackle we create a new problem class and register it. Let's call our problem `Word2def`.

-Since many text2text problems share similar methods, there's already a class called [`Text2TextProblem`](https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/data_generators/problem.py#L354) that extends the base problem class, `Problem` (both found in [`problem.py`](https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/data_generators/problem.py)).
-
-For our problem, we can go ahead and create the file `word2def.py` in the [`data_generators`](https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/data_generators/) folder and add our new problem, `Word2def`, which extends [`Text2TextProblem`](https://github.com/tensorflow/tensor2tensor/blob/24071ba07d5a14c170044c5e60a24bda8179fb7a/tensor2tensor/data_generators/problem.py#L354). Let's also register it while we're at it so we can specify the problem through flags.
+Since many text2text problems share similar methods, there's already a class
+called `Text2TextProblem` that extends the base problem class, `Problem`
+Let's also register it while we're at it so we can specify the problem through
+flags.

```python
@registry.register_problem
@@ -28,7 +36,9 @@ class Word2def(problem.Text2TextProblem):
...
```

-We need to implement the following methods from [`Text2TextProblem`](https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/data_generators/problem.py#L354) in our new class:
-SpaceIDs tell Tensor2Tensor what sort of space the input and target tensors are in. These are things like, EN_CHR (English character), EN_TOK (English token), AUDIO_WAV (audio waveform), IMAGE, DNA (genetic bases). The complete list can be found at [`data_generators/problem.py`](https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/data_generators/problem.py) in the class `SpaceID`.
+SpaceIDs tell Tensor2Tensor what sort of space the input and target tensors are
+in. These are things like, EN_CHR (English character), EN_TOK (English token),
+AUDIO_WAV (audio waveform), IMAGE, DNA (genetic bases). The complete list can be
Since we're generating definitions and feeding in words at the character level, we set `is_character_level` to true, and use the same SpaceID, EN_CHR, for both input and target. Additionally, since we aren't using tokens, we don't need to give a `targeted_vocab_size` or define `use_subword_tokenizer`.
@@ -58,7 +73,7 @@ The number of shards to break data files into.
@registry.register_problem()
class Word2def(problem.Text2TextProblem):
  """Problem spec for English word to dictionary definition."""
-
+
  @property
  def is_character_level(self):
    return True
@@ -86,7 +101,15 @@ class Word2def(problem.Text2TextProblem):

**generator**:

-We're almost done. `generator` generates the training and evaluation data and stores them in files like "word2def_train.lang1" in your DATA_DIR. Thankfully several commonly used methods like `character_generator`, and `token_generator` are already written in the file [`wmt.py`](https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/data_generators/wmt.py). We will import `character_generator` and [`text_encoder`](https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/data_generators/text_encoder.py) to write:
+We're almost done. `generator` generates the training and evaluation data and
+stores them in files like "word2def_train.lang1" in your DATA_DIR. Thankfully
+several commonly used methods like `character_generator`, and `token_generator`
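
As a rough illustration of what a character-level generator produces, the sketch below encodes (word, definition) pairs held in memory rather than reading them from files in DATA_DIR. It does not use the library's `character_generator` or `text_encoder`. The names `encode_bytes` and `character_pairs` are hypothetical, and the reserved ids (0 for padding, 1 for end-of-sequence, byte values shifted by 2) are assumptions chosen for the example.

```python
# Assumed reserved ids: 0 for padding, 1 for end-of-sequence (EOS).
EOS_ID = 1
NUM_RESERVED_IDS = 2


def encode_bytes(text):
  """Maps each UTF-8 byte to an integer id, offset past the reserved ids."""
  return [b + NUM_RESERVED_IDS for b in text.encode("utf-8")]


def character_pairs(pairs, eos_id=EOS_ID):
  """Yields one example dict per (word, definition) pair (hypothetical helper)."""
  for word, definition in pairs:
    yield {
        "inputs": encode_bytes(word) + [eos_id],
        "targets": encode_bytes(definition) + [eos_id],
    }


examples = list(character_pairs([("cat", "a small feline")]))
print(examples[0]["inputs"])  # [101, 99, 118, 1]
```

Each example is a dictionary of integer lists, so the same encoding serves both the word and its definition. That mirrors the choice of a single character-level space for input and target.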
""" Problem definition for word to dictionary definition.
"""
@@ -210,7 +234,7 @@ class Word2def(problem.Text2TextProblem):
```

# Hyperparameters
-All hyperparameters inherit from `_default_hparams()` in `problem.py`. If you would like to customize your hyperparameters, register a new hyperparameter set in `word2def.py` like the example provided in the walkthrough. For example:
+All hyperparameters inherit from `_default_hparams()` in `problem.py`. If you would like to customize your hyperparameters, register a new hyperparameter set in `word2def.py` like the example provided in the walkthrough. For example: