What is Attention?
-
Inferring which parts of the data to focus on
-
Input: a context and a set of data items whose importance is to be evaluated
- Train a non-linear function that takes (context + one data item) as input and outputs an importance score for that item
-
Output: the relative importance of each data item
-
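The input/output description above can be sketched roughly as follows. This is only a minimal illustration, assuming an additive scoring function (a tanh layer followed by a dot product) and a softmax to turn scores into relative importance; the names `attention_weights`, `W`, and `v` and all dimensions are hypothetical, and the parameters are random here rather than trained.

```python
import numpy as np

def attention_weights(context, items, W, v):
    """Score each item against the context and normalize with softmax."""
    # Pair the context with every item: shape (n_items, d_context + d_item)
    pairs = np.concatenate(
        [np.broadcast_to(context, (items.shape[0], context.shape[0])), items],
        axis=1,
    )
    # Non-linear scoring function: one scalar importance score per item
    scores = np.tanh(pairs @ W) @ v
    # Softmax turns raw scores into relative importance (sums to 1)
    exp = np.exp(scores - scores.max())
    return exp / exp.sum()

# Hypothetical sizes and random stand-ins for parameters that would be learned
rng = np.random.default_rng(0)
d_context, d_item, d_hidden, n_items = 4, 3, 8, 5
context = rng.normal(size=d_context)
items = rng.normal(size=(n_items, d_item))
W = rng.normal(size=(d_context + d_item, d_hidden))
v = rng.normal(size=d_hidden)

weights = attention_weights(context, items, W, v)  # one weight per item
summary = weights @ items  # weighted sum of the items
```

The weights are usually used as in the last line, to form a weighted sum of the data items so that the more important items contribute more.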
Until now, attention has been used in combination with models such as CNNs or LSTMs
- Does "Attention Is All You Need" claim that there is no need to combine it with them at all?