<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="utf-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1">
    <meta name="description"
          content="Energy-Based Models for Continual Learning">
    <meta name="author"
          content="Yilun Du, Shuang Li, Joshua B. Tenenbaum, Igor Mordatch">

    <title>Energy-Based Models for Continual Learning</title>
    <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/4.0.0/css/bootstrap.min.css"
          integrity="sha384-Gn5384xqQ1aoWXA+058RXPxPg6fy4IWvTNh0E263XmFcJlSAwiGgFAW/dAiS6JXm" crossorigin="anonymous">

    <!-- Custom styles for this template -->
    <link href="offcanvas.css" rel="stylesheet">
    <!--    <link rel="icon" href="img/favicon.gif" type="image/gif">-->
</head>

<body>
<div class="jumbotron jumbotron-fluid">
    <div class="container"></div>
    <h2>Energy-Based Models for Continual Learning</h2>
    <!-- <h3>ICML 2021</h3> -->
    <hr>
    <p class="authors">
        <a href="https://shuangli59.github.io/"> Shuang Li<sup>1</sup></a>,
        <a href="https://yilundu.github.io">Yilun Du<sup>1</sup></a>,
        <a href="https://www.bcm.edu/people-search/gido-van-de-ven-32297">Gido M. van de Ven<sup>2</sup></a>,
        <a href="https://scholar.google.com/citations?user=Vzr1RukAAAAJ&hl=en">Igor Mordatch<sup>3</sup></a>
    </p>

    <p class="institution">
      <sup>1</sup> MIT CSAIL&nbsp;&nbsp;&nbsp;
      <sup>2</sup> Baylor College of Medicine&nbsp;&nbsp;&nbsp;
      <sup>3</sup> Google Brain
    </p>


    <div class="btn-group" role="group" aria-label="Top menu">
        <a class="btn btn-primary" href="https://arxiv.org/pdf/2011.12216.pdf">Paper</a>
        <a class="btn btn-primary" href="https://github.com/ShuangLI59/ebm-continual-learning">Code</a>
    </div>
</div>


<div class="container">
    <div class="section">
        <p>
            We motivate Energy-Based Models (EBMs) as a promising model class for continual learning problems. Instead of tackling continual learning via the use of external memory, growing models, or regularization, EBMs change the underlying training objective to cause less interference with previously learned information. Our proposed version of EBMs for continual learning is simple, efficient, and outperforms baseline methods by a large margin on several benchmarks. Moreover, our proposed contrastive-divergence-based training objective can be applied to other continual learning methods, resulting in substantial boosts in their performance. We further show that EBMs are adaptable to a more general continual learning setting where the data distribution changes without the notion of explicitly delineated tasks. These observations point towards EBMs as a useful building block for future continual learning methods.
        </p>
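        <p>
            As a rough illustration of why the choice of objective matters, the sketch below compares a softmax cross-entropy loss, normalized over all classes, with a contrastive loss normalized only over a restricted candidate set. It is a minimal toy example rather than the paper's implementation: the energies are made-up numbers, restricting the candidates to classes of the current task is an assumption of this sketch, and <code>softmax_loss</code>, <code>contrastive_loss</code>, and <code>loss_gradient</code> are hypothetical helper names.
        </p>
<pre><code class="language-python">
```python
import numpy as np


def logsumexp(values):
    """Numerically stable log(sum(exp(values)))."""
    peak = np.max(values)
    return peak + np.log(np.sum(np.exp(values - peak)))


def softmax_loss(energies, label):
    """Cross-entropy over ALL classes, treating -energies as logits."""
    return energies[label] + logsumexp(-energies)


def contrastive_loss(energies, label, candidate_classes):
    """Same form, but normalized only over the candidate classes.

    `candidate_classes` is a hypothetical argument: here it stands for
    the classes of the current task (an assumption of this sketch).
    """
    candidates = np.asarray(sorted(set(candidate_classes) | {label}))
    return energies[label] + logsumexp(-energies[candidates])


def loss_gradient(loss_fn, energies, eps=1e-6):
    """Central finite-difference gradient of a loss w.r.t. each class energy."""
    grad = np.zeros_like(energies)
    for k in range(energies.size):
        step = np.zeros_like(energies)
        step[k] = eps
        grad[k] = (loss_fn(energies + step) - loss_fn(energies - step)) / (2 * eps)
    return grad


# Toy energies for 4 classes: classes 0 and 1 belong to an earlier task,
# classes 2 and 3 to the current one. The sample's true label is 2.
energies = np.array([0.5, 1.0, 0.2, 0.8])
label, current_classes = 2, [2, 3]

grad_softmax = loss_gradient(lambda e: softmax_loss(e, label), energies)
grad_contrastive = loss_gradient(
    lambda e: contrastive_loss(e, label, current_classes), energies)

print("softmax gradient:    ", np.round(grad_softmax, 4))
print("contrastive gradient:", np.round(grad_contrastive, 4))
```
</code></pre>
        <p>
            Under these assumptions, the softmax loss has a nonzero gradient on the energies of the earlier-task classes (a gradient step would raise them), whereas the contrastive loss leaves those energies untouched.
        </p>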
    </div>

    <div class="section">
        <h2>Energy landscape</h2>
        <hr>
        <p>
            Energy landscapes of softmax-based classifiers and EBMs after training on tasks T9 and T10 of permuted MNIST.
            The darker the diagonal, the better the model prevents forgetting of previous tasks.
		</p>

        <div class="row justify-content-center">
            <div class="col-sm-3">
            </div>
            <div class="col-sm-10">
                <img src="files/fig/landmap.png" style="width:100%">
            </div>
            <div class="col-sm-3">
            </div>
        </div>

    </div>


    <div class="section">
        <h2>Predicted label distribution</h2>
        <hr>
        <p>
            Predicted label distribution after learning each task on the split MNIST dataset.
            The softmax-based classifier predicts only classes from the current task, while our EBM predicts across all classes seen so far.
        </p>

        <div class="row justify-content-center">
            <div class="col-sm-1">
            </div>
            <div class="col-sm-6">
                <img src="files/fig/split_label_distributation.png" style="width:100%">
            </div>
            <div class="col-sm-1">
            </div>
        </div>

    </div>


    <div class="section">
        <h2>Confusion matrices</h2>
        <hr>
        <p>
            Confusion matrices between ground truth labels and predicted labels at the end of learning on split MNIST (left) and permuted MNIST (right).
            The lighter the diagonal is, the more accurate the predictions are.
        </p>

        <div class="row justify-content-center">
            <div class="col-sm-1">
            </div>
            <div class="col-sm-8">
                <img src="files/fig/split_confusion.png" style="width:100%">
            </div>
            <div class="col-sm-1">
            </div>
        </div>

    </div>


    <div class="section">
        <h2>Test accuracy during training</h2>
        <hr>
        <p>
            Class-IL test accuracy of the standard softmax-based classifier (SBC), the same classifier trained with our objective (SBC*),
            and EBMs on each task of the split MNIST dataset (left) and the permuted MNIST dataset (right).
        </p>

        <div class="row justify-content-center">
            <div class="col-sm-3">
            </div>
            <div class="col-sm-14">
                <img src="files/fig/testing_curve.png" style="width:100%">
            </div>
            <div class="col-sm-3">
            </div>
        </div>

    </div>


     <div class="section">
        <h2>Boundary-aware setting</h2>
        <hr>
        <p>
            Evaluation of class-incremental learning in the boundary-aware setting. Test accuracy on four datasets is reported.
            Each experiment is run at least 10 times with different random seeds, and results are reported as the mean and standard error of the mean (SEM) over these runs.
            Note that our comparison is restricted to methods that do not replay stored or generated data.
        </p>
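        <p>
            For reference, assuming the usual definition, the mean and standard error of the mean over \(n\) runs with test accuracies \(a_1, \dots, a_n\) are
            \[
                \bar{a} = \frac{1}{n}\sum_{i=1}^{n} a_i, \qquad
                \mathrm{SEM} = \frac{s}{\sqrt{n}}, \qquad
                s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} \left(a_i - \bar{a}\right)^2},
            \]
            where \(n \ge 10\) in this setting. Whether the reported values use the sample (\(n-1\)) or population (\(n\)) normalization is not stated here, so the \(n-1\) form is an assumption.
        </p>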

        <div class="row justify-content-center">
            <div class="col-sm-3">
            </div>
            <div class="col-sm-6">
                <img src="files/fig/boundary_aware.png" style="width:100%">
            </div>
            <div class="col-sm-3">
            </div>
        </div>

    </div>


    <div class="section">
        <h2>Boundary-agnostic setting</h2>
        <hr>
        <p>
            Evaluation of class-incremental learning performance in the boundary-agnostic setting.
            Each experiment is run 5 times with different random seeds, and average test accuracy is reported as the mean and standard error of the mean (SEM) over these runs.
            Note that our comparison is restricted to methods that do not replay stored or generated data.
        </p>

        <div class="row justify-content-center">
            <div class="col-sm-3">
            </div>
            <div class="col-sm-6">
                <img src="files/fig/boundary_agnostic.png" style="width:100%">
            </div>
            <div class="col-sm-3">
            </div>
        </div>

    </div>


    <div class="section">
        <h2>Related Projects</h2>
        <hr>
        <p>
            Check out our related projects on energy-based models! <br>
        </p>


        <div class='row vspace-top'>
            <div class="col-sm-3">
                <img src='related_works/comp_cartoon.png' class='img-fluid'>
            </div>

            <div class="col">
                <div class='paper-title'>
                    <a href="https://energy-based-model.github.io/compositional-generation-inference/">Compositional Visual Generation with Energy Based Models</a>
                </div>
                <div>
                    We show how EBMs enable <b>zero-shot compositional</b> visual generation, letting us compose visual concepts
                    through conjunction, disjunction, or negation operators.
                    Our approach can generate faces from a description such as
                    ((Smiling AND Female) OR (NOT Smiling AND Male)), or combine several different objects.
                </div>
            </div>
        </div>

        <div class='row vspace-top'>
            <div class="col-sm-3">
                <video width="100%" playsinline="" autoplay="" loop="" preload="" muted="">
                    <source src="files/uncond_gen_half.mp4" type="video/mp4">
                </video>
            </div>

            <div class="col">
                <div class='paper-title'>
                    <a href="https://arxiv.org/abs/2012.01316">Improved Contrastive Divergence Training of Energy Based Models</a>
                </div>
                <div>
                    We show that the traditional contrastive divergence training objective used to train EBMs
                    omits an important gradient term. We propose a loss to represent this missing gradient,
                    along with additional tricks to improve EBM training. We show that the resulting models
                    can generate high-resolution images and can be composed with each
                    other.
                </div>
            </div>
        </div>


        <div class='row vspace-top'>
            <div class="col-sm-3">
                <img src='related_works/protein.png' class='img-fluid'>
            </div>

            <div class="col">
                <div class='paper-title'>
                    <a href="https://arxiv.org/abs/2004.13167">Energy Based Models for Atomic Level Protein Conformations</a>

                </div>
                <div>
                    We introduce EBMs for modeling the underlying energy landscape of atomic-level protein conformations. We train
                    EBMs to predict the energy of different protein rotamer configurations, and find that our trained EBMs
                    nearly match the performance of the classical Rosetta energy function on the task of protein side-chain prediction.
                </div>
            </div>
        </div>

        <div class='row vspace-top'>
            <div class="col-sm-3">
                <img src='related_works/ebm_plan.png' class='img-fluid'>
            </div>

            <div class="col">
                <div class='paper-title'>
                    <a href="https://arxiv.org/abs/1909.06878">Model Based Planning with Energy Based Models</a>
                </div>
                <div>
                    We present a framework for using EBMs to learn, in an online fashion, trajectory-level plans for
                    different start and goal configurations. This allows us to adapt flexibly to different
                    sets of goals by changing the underlying trajectory inference objective.
                </div>
            </div>
        </div>

        <div class='row vspace-top'>
            <div class="col-sm-3">
                <video width="100%" playsinline="" autoplay="" loop="" preload="" muted="">
                    <source src="related_works/half.mp4" type="video/mp4">
                </video>
            </div>

            <div class="col">
                <div class='paper-title'>
                    <a href="https://openai.com/blog/energy-based-models/">Implicit Generation and Generalization with Energy Based Models</a>

                </div>
                <div>
                    We introduce a method to scale EBM training to modern neural network architectures.
                    We show that such trained EBMs have a set of unique properties, enabling model robustness,
                    image and trajectory modeling, continual learning and compositional visual generation.
                </div>
            </div>
        </div>


        <div class="section">
            <h2>Paper</h2>
            <hr>
            <div>
                <div class="list-group">
                    <a href="https://arxiv.org/pdf/2011.12216.pdf"
                       class="list-group-item">
                        <img src="files/fig/paper_thumbnail.png"
                             style="width:100%; margin-right:-20px; margin-top:-10px;">
                    </a>
                </div>
            </div>
        </div>

        <div class="section">
            <h2>Bibtex</h2>
            <hr>
            <div class="bibtexsection">
                @article{li2020energy,
                  title={Energy-Based Models for Continual Learning},
                  author={Li, Shuang and Du, Yilun and van de Ven, Gido M and Mordatch, Igor},
                  journal={arXiv preprint arXiv:2011.12216},
                  year={2020}
                }
            </div>
        </div>

        <hr>

        <footer>
            <p>Send feedback and questions to <a href="https://shuangli59.github.io/">Shuang Li</a></p>
        </footer>
    </div>
</div>

<script src="https://polyfill.io/v3/polyfill.min.js?features=es6"></script>
<script id="MathJax-script" async src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js"></script>
</body>
</html>