
Commit `249b1f8` (1 parent: `e44d560`)

🚀 Supported ContextNet

31 files changed: +1501 −135 lines

README.md (22 additions, 2 deletions)

```diff
@@ -16,11 +16,12 @@
 </h2>

 <p align="center">
-TensorFlowASR implements some automatic speech recognition architectures such as DeepSpeech2, Conformer, etc. These models can be converted to TFLite to reduce memory and computation for deployment :smile:
+TensorFlowASR implements some automatic speech recognition architectures such as DeepSpeech2, Jasper, ContextNet, Conformer, etc. These models can be converted to TFLite to reduce memory and computation for deployment :smile:
 </p>

 ## What's New?

+- (12/17/2020) Supported ContextNet [http://arxiv.org/abs/2005.03191](http://arxiv.org/abs/2005.03191)
 - (12/12/2020) Add support for using masking
 - (11/14/2020) Supported Gradient Accumulation for Training in Larger Batch Size
 - (11/3/2020) Reduce differences between `librosa.stft` and `tf.signal.stft`
@@ -34,6 +35,8 @@ TensorFlowASR implements some automatic speech recognition architectures such as
 - [What's New?](#whats-new)
 - [Table of Contents](#table-of-contents)
 - [:yum: Supported Models](#yum-supported-models)
+  - [Baselines](#baselines)
+  - [Publications](#publications)
 - [Installation](#installation)
   - [Installing via PyPi](#installing-via-pypi)
   - [Installing from source](#installing-from-source)
@@ -47,21 +50,29 @@ TensorFlowASR implements some automatic speech recognition architectures such as
   - [Vietnamese](#vietnamese)
   - [German](#german)
 - [References & Credits](#references--credits)
+- [Contact](#contact)

 <!-- /TOC -->

 ## :yum: Supported Models

+### Baselines
+
 - **CTCModel** (End2end models using CTC Loss for training)
+- **Transducer Models** (End2end models using RNNT Loss for training)
+
+### Publications
+
 - **Deep Speech 2** (Reference: [https://arxiv.org/abs/1512.02595](https://arxiv.org/abs/1512.02595))
   See [examples/deepspeech2](./examples/deepspeech2)
 - **Jasper** (Reference: [https://arxiv.org/abs/1904.03288](https://arxiv.org/abs/1904.03288))
   See [examples/jasper](./examples/jasper)
-- **Transducer Models** (End2end models using RNNT Loss for training)
 - **Conformer Transducer** (Reference: [https://arxiv.org/abs/2005.08100](https://arxiv.org/abs/2005.08100))
   See [examples/conformer](./examples/conformer)
 - **Streaming Transducer** (Reference: [https://arxiv.org/abs/1811.06621](https://arxiv.org/abs/1811.06621))
   See [examples/streaming_transducer](./examples/streaming_transducer)
+- **ContextNet** (Reference: [http://arxiv.org/abs/2005.03191](http://arxiv.org/abs/2005.03191))
+  See [examples/contextnet](./examples/contextnet)

 ## Installation

@@ -104,6 +115,8 @@ python setup.py install

 - For _enabling XLA_, run `TF_XLA_FLAGS=--tf_xla_auto_jit=2 python3 $path_to_py_script`)

+- For _hiding warnings_, run `export TF_CPP_MIN_LOG_LEVEL=2` before running any examples
+
 ## TFLite Convertion

 After converting to tflite, the tflite model is like a function that transforms directly from an **audio signal** to **unicode code points**, then we can convert unicode points to string.
@@ -199,3 +212,10 @@ For pretrained models, go to [drive](https://drive.google.com/drive/folders/1BD0
 2. [https://github.com/noahchalifour/warp-transducer](https://github.com/noahchalifour/warp-transducer)
 3. [Sequence Transduction with Recurrent Neural Network](https://arxiv.org/abs/1211.3711)
 4. [End-to-End Speech Processing Toolkit in PyTorch](https://github.com/espnet/espnet)
+5. [https://github.com/iankur/ContextNet](https://github.com/iankur/ContextNet)
+
+## Contact
+
+Huy Le Nguyen
```
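The README's TFLite note (quoted as context in the diff above) says the converted model maps an audio signal directly to unicode code points, which are then turned into a string. That last step is just a `chr`-join; a minimal sketch with made-up example values (not real model output, whose shape depends on the model):

```python
# Hypothetical post-processing step: join the model's emitted
# unicode code points into a readable transcript string.
def decode_code_points(code_points):
    return "".join(chr(cp) for cp in code_points)

# Example values only.
print(decode_code_points([104, 101, 108, 108, 111]))  # -> hello
```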

examples/conformer/README.md (3 additions, 3 deletions)

```diff
@@ -84,11 +84,11 @@ learning_config:
 ## Usage

-Training, see `python examples/conformer/train_conformer.py --help`
+Training, see `python examples/conformer/train_*.py --help`

-Testing, see `python examples/conformer/train_conformer.py --help`
+Testing, see `python examples/conformer/test_*.py --help`

-TFLite Conversion, see `python examples/conformer/tflite_conformer.py --help`
+TFLite Conversion, see `python examples/conformer/tflite_*.py --help`

 ## Conformer Subwords - Results on LibriSpeech
```

examples/conformer/config.yml (6 additions, 6 deletions)

```diff
@@ -48,7 +48,7 @@ model_config:
   encoder_fc_factor: 0.5
   encoder_dropout: 0.1
   prediction_embed_dim: 320
-  prediction_embed_dropout: 0.1
+  prediction_embed_dropout: 0
   prediction_num_rnns: 1
   prediction_rnn_units: 320
   prediction_rnn_type: lstm
@@ -70,12 +70,12 @@ learning_config:

   dataset_config:
     train_paths:
-      - /mnt/d/Datasets/Speech/LibriSpeech/train-clean-100/transcripts.tsv
+      - /mnt/Miscellanea/Datasets/Speech/LibriSpeech/train-clean-100/transcripts.tsv
     eval_paths:
-      - /mnt/d/Datasets/Speech/LibriSpeech/dev-clean/transcripts.tsv
-      - /mnt/d/Datasets/Speech/LibriSpeech/dev-other/transcripts.tsv
+      - /mnt/Miscellanea/Datasets/Speech/LibriSpeech/dev-clean/transcripts.tsv
+      - /mnt/Miscellanea/Datasets/Speech/LibriSpeech/dev-other/transcripts.tsv
     test_paths:
-      - /mnt/d/Datasets/Speech/LibriSpeech/test-clean/transcripts.tsv
+      - /mnt/Miscellanea/Datasets/Speech/LibriSpeech/test-clean/transcripts.tsv
     tfrecords_dir: null

   optimizer_config:
@@ -88,7 +88,7 @@ learning_config:
     batch_size: 2
     accumulation_steps: 4
     num_epochs: 20
-    outdir: /mnt/d/Models/local/conformer
+    outdir: /mnt/Miscellanea/Models/local/conformer
     log_interval_steps: 300
     eval_interval_steps: 500
     save_interval_steps: 1000
```
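The config pairs `batch_size: 2` with `accumulation_steps: 4`: gradients from 4 micro-batches are accumulated before each optimizer update, for an effective batch size of 8. A framework-free sketch of that accumulation pattern, with scalar "gradients" standing in for real tensors (illustrative, not the repo's trainer code):

```python
# Average each run of `accumulation_steps` micro-batch gradients
# into one optimizer update, as gradient accumulation does.
def accumulate_updates(grads, accumulation_steps):
    updates, accum = [], 0.0
    for i, g in enumerate(grads, start=1):
        accum += g
        if i % accumulation_steps == 0:
            updates.append(accum / accumulation_steps)
            accum = 0.0
    return updates

# 8 micro-batches with accumulation_steps=4 -> 2 optimizer updates,
# i.e. each update sees batch_size * accumulation_steps = 8 examples.
print(len(accumulate_updates([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8], 4)))  # -> 2
```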

examples/conformer/train_conformer.py (2 additions, 2 deletions)

```diff
@@ -113,9 +113,9 @@
 optimizer_config = config.learning_config.optimizer_config
 optimizer = tf.keras.optimizers.Adam(
     TransformerSchedule(
-        d_model=config.model_config["encoder_dmodel"],
+        d_model=conformer.dmodel,
         warmup_steps=optimizer_config["warmup_steps"],
-        max_lr=(0.05 / math.sqrt(config.model_config["encoder_dmodel"]))
+        max_lr=(0.05 / math.sqrt(conformer.dmodel))
     ),
     beta_1=optimizer_config["beta1"],
     beta_2=optimizer_config["beta2"],
```

examples/conformer/train_ga_conformer.py (2 additions, 2 deletions)

```diff
@@ -115,9 +115,9 @@

 optimizer = tf.keras.optimizers.Adam(
     TransformerSchedule(
-        d_model=config.model_config["encoder_dmodel"],
+        d_model=conformer.dmodel,
         warmup_steps=config.learning_config.optimizer_config["warmup_steps"],
-        max_lr=(0.05 / math.sqrt(config.model_config["encoder_dmodel"]))
+        max_lr=(0.05 / math.sqrt(conformer.dmodel))
     ),
     beta_1=config.learning_config.optimizer_config["beta1"],
     beta_2=config.learning_config.optimizer_config["beta2"],
```

examples/conformer/train_ga_subword_conformer.py (2 additions, 2 deletions)

```diff
@@ -131,9 +131,9 @@

 optimizer = tf.keras.optimizers.Adam(
     TransformerSchedule(
-        d_model=config.model_config["encoder_dmodel"],
+        d_model=conformer.dmodel,
         warmup_steps=config.learning_config.optimizer_config["warmup_steps"],
-        max_lr=(0.05 / math.sqrt(config.model_config["encoder_dmodel"]))
+        max_lr=(0.05 / math.sqrt(conformer.dmodel))
     ),
     beta_1=config.learning_config.optimizer_config["beta1"],
     beta_2=config.learning_config.optimizer_config["beta2"],
```

examples/conformer/train_subword_conformer.py (2 additions, 2 deletions)

```diff
@@ -128,9 +128,9 @@

 optimizer = tf.keras.optimizers.Adam(
     TransformerSchedule(
-        d_model=config.model_config["encoder_dmodel"],
+        d_model=conformer.dmodel,
         warmup_steps=config.learning_config.optimizer_config["warmup_steps"],
-        max_lr=(0.05 / math.sqrt(config.model_config["encoder_dmodel"]))
+        max_lr=(0.05 / math.sqrt(conformer.dmodel))
     ),
     beta_1=config.learning_config.optimizer_config["beta1"],
     beta_2=config.learning_config.optimizer_config["beta2"],
```
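The same two-line change repeats across all four training scripts: the schedule now reads `dmodel` off the constructed model instead of the config dict, so the learning rate always matches the encoder actually built. Assuming `TransformerSchedule` follows the standard Transformer (Noam) schedule with the `max_lr` cap these scripts pass in, its shape can be sketched in plain Python (function name and the `d_model=144` default here are illustrative, not the repo's API):

```python
import math

def transformer_lr(step, d_model, warmup_steps, max_lr=None):
    """Noam schedule: linear warmup for `warmup_steps` steps, then
    1/sqrt(step) decay, scaled by d_model**-0.5; optionally capped."""
    lr = (d_model ** -0.5) * min(step ** -0.5, step * warmup_steps ** -1.5)
    return min(lr, max_lr) if max_lr is not None else lr

# The rate peaks at step == warmup_steps and decays afterwards;
# max_lr=0.05/sqrt(d_model) mirrors the cap passed in the scripts.
peak = transformer_lr(10_000, d_model=144, warmup_steps=10_000,
                      max_lr=0.05 / math.sqrt(144))
later = transformer_lr(40_000, d_model=144, warmup_steps=10_000,
                       max_lr=0.05 / math.sqrt(144))
assert later < peak
```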
