Authors:
Konstantin A. Maslov,
Claudio Persello
This repository contains supplementary materials for the seminar on vision transformers: slides and a Jupyter notebook with a simple implementation of ViT and its training on MNIST.

References:
- Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., & Houlsby, N. (2020). An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. https://doi.org/10.48550/arxiv.2010.11929
- Touvron, H., Cord, M., Douze, M., Massa, F., Sablayrolles, A., & Jégou, H. (2020). Training data-efficient image transformers & distillation through attention. https://doi.org/10.48550/arxiv.2012.12877
- Zheng, S., Lu, J., Zhao, H., Zhu, X., Luo, Z., Wang, Y., Fu, Y., Feng, J., Xiang, T., Torr, P. H. S., & Zhang, L. (2020). Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 6877–6886. https://doi.org/10.48550/arxiv.2012.15840
- Strudel, R., Garcia, R., Laptev, I., & Schmid, C. (2021). Segmenter: Transformer for Semantic Segmentation. Proceedings of the IEEE International Conference on Computer Vision, 7242–7252. https://doi.org/10.48550/arxiv.2105.05633
- Ranftl, R., Bochkovskiy, A., & Koltun, V. (2021). Vision Transformers for Dense Prediction. https://doi.org/10.48550/arxiv.2103.13413
- Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., & Guo, B. (2021). Swin Transformer: Hierarchical Vision Transformer using Shifted Windows. Proceedings of the IEEE International Conference on Computer Vision, 9992–10002. https://doi.org/10.48550/arxiv.2103.14030
- Xie, E., Wang, W., Yu, Z., Anandkumar, A., Alvarez, J. M., & Luo, P. (2021). SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers. Advances in Neural Information Processing Systems, 34, 12077–12090. https://doi.org/10.48550/arxiv.2105.15203
- Tolstikhin, I., Houlsby, N., Kolesnikov, A., Beyer, L., Zhai, X., Unterthiner, T., Yung, J., Steiner, A., Keysers, D., Uszkoreit, J., Lucic, M., & Dosovitskiy, A. (2021). MLP-Mixer: An all-MLP Architecture for Vision. Advances in Neural Information Processing Systems, 34, 24261–24272. https://doi.org/10.48550/arxiv.2105.01601
- Loshchilov, I., & Hutter, F. (2017). Decoupled Weight Decay Regularization. 7th International Conference on Learning Representations, ICLR 2019. https://doi.org/10.48550/arxiv.1711.05101
- Wightman, R., Touvron, H., & Jégou, H. (2021). ResNet strikes back: An improved training procedure in timm. https://doi.org/10.48550/arxiv.2110.00476
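The notebook's full implementation is not reproduced here, but the core idea from Dosovitskiy et al. (2020), treating an image as a sequence of flattened patches ("16x16 words"), can be sketched in a few lines of NumPy. The function name and shapes below are illustrative, not taken from the notebook; for MNIST's 28x28 images, 7x7 patches yield a sequence of 16 tokens.

```python
import numpy as np

def image_to_patches(image, patch_size):
    """Split a (H, W) image into non-overlapping flattened patches.

    Returns an array of shape (num_patches, patch_size * patch_size):
    the token sequence that a ViT linearly projects and feeds,
    together with positional embeddings, to its transformer encoder.
    """
    h, w = image.shape
    assert h % patch_size == 0 and w % patch_size == 0, "image must tile evenly"
    return (
        image.reshape(h // patch_size, patch_size, w // patch_size, patch_size)
        .transpose(0, 2, 1, 3)          # group the two patch-grid axes together
        .reshape(-1, patch_size * patch_size)
    )

# Example: a 28x28 MNIST-sized image with 7x7 patches -> 16 tokens of length 49
image = np.arange(28 * 28, dtype=np.float32).reshape(28, 28)
tokens = image_to_patches(image, patch_size=7)
print(tokens.shape)  # (16, 49)
```

In a full ViT, each of these 49-dimensional tokens is then mapped to the model dimension by a learned linear projection (equivalently, a strided convolution), a class token is prepended, and the resulting sequence goes through standard transformer encoder blocks.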
If you notice any inaccuracies or errors, feel free to submit a pull request.