https://www.youtube.com/watch?v=kCc8FmEb1nY
- It’s possible to convert a string into an array of token IDs.
- A chunk of block size + 1 tokens yields block size training examples: for each t from 1 to block size, the first t tokens are the input and token t + 1 is the target.
- Block size is therefore also the maximum context length.
- Training this way covers next-token prediction at every context length from 1 up to block size, so the model sees both short and long contexts.
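
The tokenization and chunking steps above can be sketched as follows. This is a minimal character-level illustration: the sample text, the `stoi`/`itos` lookup tables, the helper names, and `block_size = 4` are all assumptions chosen for the example.

```python
# Character-level tokenizer: one integer ID per distinct character (assumed scheme).
text = "hello world"
chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}   # character -> token ID
itos = {i: ch for ch, i in stoi.items()}       # token ID -> character


def encode(s):
    """Convert a string into a list of token IDs."""
    return [stoi[c] for c in s]


def decode(ids):
    """Convert a list of token IDs back into a string."""
    return "".join(itos[i] for i in ids)


def training_examples(chunk):
    """Split a chunk of block_size + 1 tokens into block_size (context, target) pairs."""
    block_size = len(chunk) - 1
    return [(chunk[:t], chunk[t]) for t in range(1, block_size + 1)]


block_size = 4                      # hypothetical value for illustration
data = encode(text)
chunk = data[:block_size + 1]       # block_size + 1 tokens
examples = training_examples(chunk)

for context, target in examples:
    print(f"input {context} -> target {target}")
```

Each printed line is one training example, with the context growing from 1 token up to `block_size` tokens.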
 
- Bigram Neural Network
- It is trained only on pairs of the i-th token and the (i+1)-th token, so each prediction depends on the current token alone.
- Predictions are made using a table of size vocab size x vocab size.
- The row for the current token holds the probability of each possible next token.
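
The table lookup described above might look like the sketch below. The three-token vocabulary and the table values are made up for illustration. As an assumption of this sketch, the table stores unnormalized scores and a softmax turns the selected row into probabilities.

```python
import math
import random

vocab = ["a", "b", "c"]            # hypothetical 3-token vocabulary
vocab_size = len(vocab)

# vocab_size x vocab_size table of scores; row i describes what follows token i.
# The values are invented for this example.
table = [
    [0.0, 2.0, 0.0],   # after "a", "b" is most likely
    [0.0, 0.0, 2.0],   # after "b", "c" is most likely
    [2.0, 0.0, 0.0],   # after "c", "a" is most likely
]


def softmax(row):
    """Turn a row of scores into probabilities that sum to 1."""
    m = max(row)                                  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def next_token_probs(token_id):
    """Look up the row for the current token; no earlier context is used."""
    return softmax(table[token_id])


def generate(start_id, n_tokens, rng):
    """Sample n_tokens further tokens, one at a time, from the bigram table."""
    ids = [start_id]
    for _ in range(n_tokens):
        probs = next_token_probs(ids[-1])
        ids.append(rng.choices(range(vocab_size), weights=probs, k=1)[0])
    return ids


rng = random.Random(0)
probs = next_token_probs(0)
sample = generate(0, 5, rng)
print([round(p, 3) for p in probs])
print("".join(vocab[i] for i in sample))
```

Because only the last token selects the row, the model cannot use any longer context, which is the limitation that motivates moving beyond a bigram model.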
 
- The entries of this table are learned by training it as a neural network.
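
One way to picture that training is the sketch below, which treats the table entries as parameters and fits them with gradient descent on the cross-entropy loss. It is an illustration under stated assumptions, not the method from the video: it uses plain Python instead of a deep learning framework, a hand-derived gradient (softmax output minus one-hot target), full-batch updates, and a toy corpus, learning rate, and step count picked for the example.

```python
import math

text = "abababab"                         # toy corpus (assumed)
chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}
data = [stoi[c] for c in text]
V = len(chars)                            # vocab size

# Training pairs: (i-th token, (i+1)-th token).
pairs = list(zip(data[:-1], data[1:]))

# Parameters: a V x V table of scores (logits), initialised to zero.
W = [[0.0] * V for _ in range(V)]


def softmax(row):
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def mean_loss():
    """Average cross-entropy of the true next token over all pairs."""
    return -sum(math.log(softmax(W[x])[y]) for x, y in pairs) / len(pairs)


initial_loss = mean_loss()
lr = 1.0                                  # hypothetical learning rate
for step in range(100):
    grad = [[0.0] * V for _ in range(V)]
    for x, y in pairs:
        p = softmax(W[x])
        for j in range(V):
            # d(loss)/d(logit j) = p[j] - 1 if j is the target, else p[j]
            grad[x][j] += (p[j] - (1.0 if j == y else 0.0)) / len(pairs)
    for i in range(V):
        for j in range(V):
            W[i][j] -= lr * grad[i][j]
final_loss = mean_loss()

print(f"loss before: {initial_loss:.3f}, after: {final_loss:.3f}")
```

In practice a framework with automatic differentiation would compute the gradients, but the quantity being learned is the same vocab size x vocab size table.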
 