
ClashLuke (Member)

This PR also

  • adds a new custom sum-based attention (see the sketch after this list)
  • changes a bunch of parameter names
  • changes `small.yaml` to integrate omnidirectional attention
  • breaks up our linear attention module into one ff and one attention module
  • removes DeepSpeed's broken `CPUAdam`
  • enforces full attention while removing autoregressive attention
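
The PR does not spell out the exact formulation, but a minimal sketch of what a sum-based attention could look like is below, assuming values are mixed with a length-normalised cumulative sum over the sequence dimension; the class name and projections are illustrative, not the actual module added here.

```python
import torch
import torch.nn as nn


class CumulativeSumAttention(nn.Module):
    """Illustrative sum-based attention: every position receives a running
    (mean-normalised) sum of all earlier positions instead of a softmax
    over pairwise attention scores."""

    def __init__(self, features: int):
        super().__init__()
        self.in_proj = nn.Linear(features, features)
        self.out_proj = nn.Linear(features, features)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, sequence, features)
        mixed = torch.cumsum(self.in_proj(x), dim=1)
        # Normalise by the number of summed positions so early and late
        # tokens end up on a similar scale.
        counts = torch.arange(1, x.size(1) + 1, device=x.device, dtype=x.dtype)
        return self.out_proj(mixed / counts.view(1, -1, 1))
```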

The idea of FFT-based attention comes from FNet, LMU, and *On Learning the Transformer Kernel*, but our implementation differs to maximize the model's expressivity.
OmniNet attends to all previous hidden states instead of only the current hidden state, bridging the gap between linear attention and full attention.
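
For reference, the FNet-style baseline this builds on (not the exact kernel used here) mixes tokens with a Fourier transform over the feature and sequence dimensions, keeping only the real part:

```python
import torch
import torch.nn as nn


class FourierMixing(nn.Module):
    """FNet-style token mixing: a 2D FFT over the feature and sequence
    dimensions replaces the quadratic attention matrix; only the real
    part of the result is kept."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, sequence, features)
        return torch.fft.fft(torch.fft.fft(x, dim=-1), dim=-2).real
```
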
A custom data loader is required, as PyTorch's data loader causes CPU OOMs, has a broken shuffling function, and needs more than 8 GiB of RAM to instantiate 12 empty dataset classes. This wasn't the case in PyTorch 1.9, but it is in 1.10 on WSL.
As WSL cannot deallocate GPU memory, we had to support Windows natively.
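
For context, a minimal sketch of the kind of replacement loader this points towards, assuming a pre-tokenised dataset memory-mapped from disk; the class name, dtype, and file layout are assumptions for illustration, not the actual implementation in this PR.

```python
import numpy as np
import torch


class MemmapLoader:
    """Illustrative stand-in for torch.utils.data.DataLoader: streams random
    fixed-length token windows from a memory-mapped array, so no worker
    processes or in-RAM copies of the dataset are needed."""

    def __init__(self, path: str, batch_size: int, sequence_length: int):
        self.tokens = np.memmap(path, dtype=np.uint16, mode="r")
        self.batch_size = batch_size
        self.sequence_length = sequence_length

    def __iter__(self):
        max_start = len(self.tokens) - self.sequence_length - 1
        while True:
            starts = np.random.randint(0, max_start, size=self.batch_size)
            batch = np.stack([self.tokens[s:s + self.sequence_length] for s in starts])
            yield torch.from_numpy(batch.astype(np.int64))
```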
