Authors:
Konstantin A. Maslov,
Claudio Persello
This repository contains supplementary materials for the seminar on vision transformers: slides and a Jupyter notebook with a simple implementation of ViT and its training on MNIST.

References:
- Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., & Houlsby, N. (2020). An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. https://doi.org/10.48550/arxiv.2010.11929
- Touvron, H., Cord, M., Douze, M., Massa, F., Sablayrolles, A., & Jégou, H. (2020). Training data-efficient image transformers & distillation through attention. https://doi.org/10.48550/arxiv.2012.12877
- Zheng, S., Lu, J., Zhao, H., Zhu, X., Luo, Z., Wang, Y., Fu, Y., Feng, J., Xiang, T., Torr, P. H. S., & Zhang, L. (2020). Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 6877–6886. https://doi.org/10.48550/arxiv.2012.15840
- Strudel, R., Garcia, R., Laptev, I., & Schmid, C. (2021). Segmenter: Transformer for Semantic Segmentation. Proceedings of the IEEE International Conference on Computer Vision, 7242–7252. https://doi.org/10.48550/arxiv.2105.05633
- Ranftl, R., Bochkovskiy, A., & Koltun, V. (2021). Vision Transformers for Dense Prediction. https://doi.org/10.48550/arxiv.2103.13413
- Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., & Guo, B. (2021). Swin Transformer: Hierarchical Vision Transformer using Shifted Windows. Proceedings of the IEEE International Conference on Computer Vision, 9992–10002. https://doi.org/10.48550/arxiv.2103.14030
- Xie, E., Wang, W., Yu, Z., Anandkumar, A., Alvarez, J. M., & Luo, P. (2021). SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers. Advances in Neural Information Processing Systems, 34, 12077–12090. https://doi.org/10.48550/arxiv.2105.15203
- Tolstikhin, I., Houlsby, N., Kolesnikov, A., Beyer, L., Zhai, X., Unterthiner, T., Yung, J., Steiner, A., Keysers, D., Uszkoreit, J., Lucic, M., & Dosovitskiy, A. (2021). MLP-Mixer: An all-MLP Architecture for Vision. Advances in Neural Information Processing Systems, 34, 24261–24272. https://doi.org/10.48550/arxiv.2105.01601
- Loshchilov, I., & Hutter, F. (2017). Decoupled Weight Decay Regularization. 7th International Conference on Learning Representations, ICLR 2019. https://doi.org/10.48550/arxiv.1711.05101
- Wightman, R., Touvron, H., & Jégou, H. (2021). ResNet strikes back: An improved training procedure in timm. https://doi.org/10.48550/arxiv.2110.00476
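The notebook's full implementation is not reproduced here, but the core idea from Dosovitskiy et al. (2020), treating an image as a sequence of flattened patches ("16x16 words"), can be sketched in a few lines of NumPy. The function name and shapes below are illustrative, not taken from the notebook; for MNIST's 28x28 images, 7x7 patches yield a sequence of 16 tokens.

```python
import numpy as np

def image_to_patches(image, patch_size):
    """Split a (H, W) image into non-overlapping flattened patches.

    Returns an array of shape (num_patches, patch_size * patch_size):
    the token sequence that a ViT linearly projects and feeds,
    together with positional embeddings, to its transformer encoder.
    """
    h, w = image.shape
    assert h % patch_size == 0 and w % patch_size == 0, "image must tile evenly"
    return (
        image.reshape(h // patch_size, patch_size, w // patch_size, patch_size)
        .transpose(0, 2, 1, 3)          # group the two patch-grid axes together
        .reshape(-1, patch_size * patch_size)
    )

# Example: a 28x28 MNIST-sized image with 7x7 patches -> 16 tokens of length 49
image = np.arange(28 * 28, dtype=np.float32).reshape(28, 28)
tokens = image_to_patches(image, patch_size=7)
print(tokens.shape)  # (16, 49)
```

In a full ViT, each of these 49-dimensional tokens is then mapped to the model dimension by a learned linear projection (equivalently, a strided convolution), a class token is prepended, and the resulting sequence goes through standard transformer encoder blocks.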
If you notice any inaccuracies or errors, feel free to submit a pull request.