- Original Paper: https://arxiv.org/abs/1706.03762
  - Proposal of the Transformer
- Explanation of the essential Transformer in the deep learning field! - Qiita
- Background:
  - Mechanism of Autoencoder
Slowly Explaining the “Transformer” Dominating the AI Field (Day 2) Introduction/Background
- Compared with the weak (fading) memory of an RNN, the Transformer has a strong memory: it can attend directly to any earlier position in the sequence
Slowly Explaining the “Transformer” Dominating the AI Field (Day 3) Model Architecture 1
- Easy to understand
  - Self Attention
    - Attending to the important surrounding words needed to understand a given word (see the sketch after this list)
- Things I don’t understand
  - How it corresponds to the structure of the Encoder-Decoder Model
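A minimal sketch of scaled dot-product self-attention, assuming NumPy and a toy sequence of word vectors; the projection matrices `W_q`, `W_k`, `W_v` and the sizes are illustrative assumptions, not values from the paper's trained model.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """Scaled dot-product self-attention over one sequence.

    X: (seq_len, d_model) word vectors; W_q/W_k/W_v: (d_model, d_k) projections.
    Each position forms a query and scores every position's key, so a word can
    "look at" whichever surrounding words matter for understanding it.
    """
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # (seq_len, seq_len) similarity of each word to every other word
    weights = softmax(scores, axis=-1)   # attention distribution per word
    return weights @ V, weights          # weighted mix of value vectors + the attention map

# Toy usage: 4 "words", embedding size 8, projection size 4 (all random, purely for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
W_q, W_k, W_v = (rng.normal(size=(8, 4)) for _ in range(3))
out, attn = self_attention(X, W_q, W_k, W_v)
print(attn.round(2))  # row i shows how strongly word i attends to each word in the sentence
```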
The Transformer is built on the encoder-decoder model, and its main features are its self-attention layers and position-wise fully connected layers. In other words, if you understand the following three (+ two) components, you can understand the model architecture, so I will explain them in order (a small sketch of the last two follows the list):
- Encoder-Decoder Model
- Attention
- Position-wise fully connected layers
- Character embedding and softmax
- Positional encoding
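To make the last two items concrete, here is a small NumPy sketch of the sinusoidal positional encoding from the paper and a position-wise fully connected (feed-forward) layer, i.e. the same two-layer MLP applied independently at every position; the weights and the small dimensions are illustrative assumptions.

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Sinusoidal positional encoding: PE[pos, 2i] = sin(pos / 10000^(2i/d_model)),
    PE[pos, 2i+1] = cos(pos / 10000^(2i/d_model)). Each position gets a unique,
    order-aware pattern that is simply added to the word embeddings."""
    pos = np.arange(seq_len)[:, None]              # (seq_len, 1)
    i = np.arange(0, d_model, 2)[None, :]          # (1, d_model/2)
    angles = pos / np.power(10000, i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                   # even dimensions use sine
    pe[:, 1::2] = np.cos(angles)                   # odd dimensions use cosine
    return pe

def position_wise_ffn(X, W1, b1, W2, b2):
    """Position-wise fully connected layer: the same two-layer MLP (ReLU in between)
    applied to each position's vector independently -- no mixing across positions."""
    return np.maximum(0, X @ W1 + b1) @ W2 + b2

# Toy usage: add positional information to embeddings, then apply the FFN per position.
rng = np.random.default_rng(1)
seq_len, d_model, d_ff = 4, 8, 16                  # illustrative sizes (the paper's base model uses 512 / 2048)
X = rng.normal(size=(seq_len, d_model)) + positional_encoding(seq_len, d_model)
W1, b1 = rng.normal(size=(d_model, d_ff)), np.zeros(d_ff)
W2, b2 = rng.normal(size=(d_ff, d_model)), np.zeros(d_model)
print(position_wise_ffn(X, W1, b1, W2, b2).shape)  # (4, 8): one transformed vector per position
```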