https://www.youtube.com/watch?v=kCc8FmEb1nY
Given a chunk of block size + 1 tokens, each prefix of it (tokens 1~1, 1~2, …, 1~block size) can be used as an input, with the token that follows as the target. A single chunk therefore yields block size training examples.
- The block size is also the maximum context length the model sees.
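
The prefix idea can be sketched in plain Python. This is only an illustration, not the code from the video: the helper name `context_target_pairs` and the token ids in `chunk` are made up for the example, and the video itself works with tensors rather than lists.

```python
def context_target_pairs(chunk):
    """Return (context, target) pairs from a chunk of block_size + 1 tokens."""
    block_size = len(chunk) - 1
    pairs = []
    for t in range(block_size):
        context = chunk[: t + 1]  # tokens 1..t+1 form the input
        target = chunk[t + 1]     # the next token is what the model should predict
        pairs.append((context, target))
    return pairs


block_size = 8
chunk = [18, 47, 56, 57, 58, 1, 15, 47, 58]  # hypothetical token ids, block_size + 1 of them
pairs = context_target_pairs(chunk)

for context, target in pairs:
    print(f"input {context} -> target {target}")
```

The loop produces block size pairs, with contexts growing from 1 token up to block size tokens, which is why the block size doubles as the longest context.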
Oct 15, 2024, 1 min read