Commit 9be4eac
박상현 authored and committed
chore: several transformer models.md upload
1 parent bf3b5a9 commit 9be4eac

4 files changed: 88 additions, 0 deletions
---
layout: post
title: '[cAIRev] Several Transformer Models'
description: Descriptions of several transformer models, focusing on their architectures
date: 2025-09-02 10:50:00 +09:00
categories: [chipkkang9's AI Reversing]
tags: [AI, Model, Transformer]
---

# Transformer Architecture

The Transformer brought a remarkable improvement to AI research.

It consists of an Encoder and a Decoder, as introduced in its original paper, "Attention Is All You Need", but researchers have since built models that use only the Encoder, only the Decoder, or both.

In this article, we'll discuss the pros and cons of each Transformer-based model and the tasks each model fits. Let's look into these several Transformer architectures.

## Encoder-Only Transformer

![](/assets/img/posts/Encoder-only%20Transformer.png)
_Encoder Part of Transformer_

The Encoder-Only Transformer is also known as an "auto-encoding" model.

Encoder-Only Transformer models use only the Encoder part of the Transformer architecture. They conduct attention `bi-directionally` over the entire input context and are specialized in generating rich semantic representations of each word.
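
A minimal sketch of this bi-directional attention (the names, dimensions, and random weights below are illustrative assumptions, not taken from any real model): because no mask is applied to the attention scores, every position attends to every other position, before or after it.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def bidirectional_attention(X, Wq, Wk, Wv):
    """Single-head self-attention with NO mask: every token
    can attend to every other token, before or after it."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # full (seq, seq) score matrix
    weights = softmax(scores, axis=-1)       # each row sums to 1
    return weights @ V, weights

rng = np.random.default_rng(0)
seq_len, d = 4, 8
X = rng.normal(size=(seq_len, d))            # toy token embeddings
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out, w = bidirectional_attention(X, Wq, Wk, Wv)
print(w.shape)  # (4, 4): every row has nonzero weight on all positions
```

This is exactly what lets the Encoder build a representation of each word that is informed by its full left and right context.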

### Pretraining

To pretrain an Encoder-Only Transformer, learning proceeds by corrupting a given initial sentence using various methods (e.g. word masking) and training the model to restore the corrupted sentence to the original.
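
A toy sketch of this corrupt-and-restore objective (the `[MASK]` token follows BERT's convention; the function name and masking rate are illustrative assumptions):

```python
import random

def mask_words(sentence, mask_rate=0.3, seed=42):
    """Corrupt a sentence by replacing random words with [MASK];
    the model's pretraining target is to recover the originals."""
    random.seed(seed)
    words = sentence.split()
    targets = {}
    for i in range(len(words)):
        if random.random() < mask_rate:
            targets[i] = words[i]     # remember the original word
            words[i] = "[MASK]"
    return " ".join(words), targets

corrupted, targets = mask_words("the transformer encoder reads the whole sentence")
print(corrupted)   # e.g. "the [MASK] [MASK] ... the whole sentence"
print(targets)     # the positions and words the model must restore
```

Because the model sees the uncorrupted words on both sides of each `[MASK]`, bi-directional attention is what makes this restoration possible.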

### Tasks

Encoder models fit tasks that require an understanding of the whole sentence, such as sentence classification, named-entity recognition, and more generally word classification or extractive question answering.

### Example Models

- BERT
- DistilBERT
- RoBERTa

<br>

## Decoder-Only Transformer

![](/assets/img/posts/Decoder-only%20Transformer.png)
_Decoder Part of Transformer_

The Decoder-Only Transformer is also known as an "auto-regressive" model.

Decoder-Only Transformer models use only the Decoder part of the Transformer architecture. At each step, the attention layer can only access the words positioned before the word currently being processed.
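
This causal constraint is typically implemented as a mask added to the attention scores before the softmax. A minimal NumPy sketch (function names are illustrative assumptions):

```python
import numpy as np

def causal_mask(seq_len):
    """Entry (i, j) is 0 if j <= i, else -inf: position i may only
    attend to itself and earlier positions."""
    return np.triu(np.full((seq_len, seq_len), -np.inf), k=1)

def masked_softmax(scores):
    """Apply the causal mask, then softmax row by row; the -inf
    entries become zero attention weight."""
    scores = scores + causal_mask(scores.shape[-1])
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

scores = np.zeros((4, 4))   # uniform raw scores, just for illustration
w = masked_softmax(scores)
print(np.round(w, 2))       # row i spreads weight only over positions 0..i
```

Row 0 puts all its weight on position 0, row 1 splits it over positions 0 and 1, and so on; no position ever attends to a future word.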

### Pretraining

To pretrain a Decoder-Only Transformer, learning generally proceeds by predicting the next word of the sentence.
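
A toy sketch of this next-word objective: the model is trained on (prefix, next token) pairs derived from each sentence (the function name is an illustrative assumption):

```python
def next_token_pairs(tokens):
    """Auto-regressive pretraining: at each step the model sees the
    prefix and is trained to predict the next token."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

tokens = ["the", "model", "predicts", "the", "next", "word"]
for prefix, target in next_token_pairs(tokens):
    print(prefix, "->", target)
# ['the'] -> model
# ['the', 'model'] -> predicts
# ... and so on up to the last word
```

Note how this objective matches the causal mask above: since training predicts each token from its prefix only, attention to future positions must be blocked.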

### Tasks

Decoder models fit tasks like text generation, summarization, translation, question answering, code generation, reasoning, and few-shot learning.

### Example Models

- Hugging Face's SmolLM Series
- DeepSeek's V3
- Meta's Llama Series

<br>

## Encoder-Decoder Transformer

![](/assets/img/posts/Transformer%20Architecture.png)
_Encoder-Decoder Transformer_

The Encoder-Decoder Transformer is also known as a "sequence-to-sequence" model. It is likely so named because the architecture is similar to the seq2seq model.

Encoder-Decoder Transformer models use both parts of the Transformer architecture. At each step, the attention layers of the Encoder can access all the words of the initial sentence. However, the attention layers of the Decoder part can only access the words positioned before the word currently being processed.

### Pretraining

To pretrain an Encoder-Decoder Transformer, the objectives of Encoder or Decoder models may be reused, but this usually involves more complex processing steps.

For example, the `T5` model is pretrained by replacing random spans of text with special mask words; the training objective is then to predict the text that each special word masked. Notably, there is no single fixed pretraining method for Encoder-Decoder Transformer models; it varies from model to model.
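
A simplified sketch of this T5-style span corruption (the `<extra_id_n>` sentinel tokens are T5's actual convention; the function name and the hand-picked spans are illustrative assumptions, since T5 samples spans randomly):

```python
def span_corrupt(words, spans):
    """Simplified T5-style span corruption: each (start, end) word span
    is replaced by one sentinel token in the input; the target pairs
    each sentinel with the original text it replaced."""
    inp, tgt, sid = [], [], 0
    spans = sorted(spans)
    i = 0
    while i < len(words):
        span = next((s for s in spans if s[0] == i), None)
        if span:
            inp.append(f"<extra_id_{sid}>")
            tgt.append(f"<extra_id_{sid}> " + " ".join(words[span[0]:span[1]]))
            sid += 1
            i = span[1]
        else:
            inp.append(words[i])
            i += 1
    return " ".join(inp), " ".join(tgt)

words = "thank you for inviting me to your party last week".split()
inp, tgt = span_corrupt(words, [(1, 3), (7, 9)])
print(inp)  # thank <extra_id_0> inviting me to your <extra_id_1> week
print(tgt)  # <extra_id_0> you for <extra_id_1> party last
```

The Encoder reads the corrupted input bi-directionally, while the Decoder generates the target auto-regressively, which is why this objective suits the combined architecture.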

### Example Models

| Application | Description | Example Model |
|:------------------------|:-------------------------------------------------|:--------------|
| Machine Translation | Converting text between languages | Marian, T5 |
| Text Summarization | Creating concise summaries of longer texts | BART, T5 |
| Data-to-Text Generation | Converting structured data into natural language | T5 |
| Grammar Correction | Fixing grammatical errors in text | T5 |
| Question Answering | Generating answers based on context | BART, T5 |

<br>

85+
86+
## References.
87+
> Huggingface. Transformer Architectures
88+
: https://huggingface.co/learn/llm-course/en/chapter1/6