(Expert in Information Science) Lecture on Natural Language Processing
- One definition of a language model: evaluating the “plausibility” of a sentence
- Can also be used to select one from several recognition results, such as in speech recognition
- Probabilistic language model
- Mathematical representation of a “sentence”
- Sentence s = <s> hello world </s>
- <s> and </s> represent the beginning and end of a sentence
- Using the above, evaluate the plausibility of a sentence
- P(a|b) is the probability that word a occurs after the preceding context b
- P(s) = P(<s>) * P(hello | <s>) * P(world | <s> hello) * P(</s> | <s> hello world)
- Evaluate how likely each word is given the preceding context
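- A minimal sketch of this decomposition in Python; the conditional probabilities below are made-up values for illustration, not estimates from a real corpus:
```python
# Chain-rule scoring of P(s) for s = <s> hello world </s>.
# The conditional probabilities are invented for illustration.
cond_probs = {
    ("<s>", ""): 1.0,                  # P(<s>): every sentence starts with <s>
    ("hello", "<s>"): 0.01,            # P(hello | <s>)
    ("world", "<s> hello"): 0.2,       # P(world | <s> hello)
    ("</s>", "<s> hello world"): 0.5,  # P(</s> | <s> hello world)
}

def sentence_probability(words):
    """P(s) = product over i of P(w_i | w_1 ... w_{i-1})."""
    p = 1.0
    for i, word in enumerate(words):
        context = " ".join(words[:i])  # the whole preceding context
        p *= cond_probs[(word, context)]
    return p

print(sentence_probability(["<s>", "hello", "world", "</s>"]))
# 1.0 * 0.01 * 0.2 * 0.5 = 0.001
```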
- How to estimate P(a|b)
- Maximum likelihood estimation
- Easily calculated using the frequency of occurrence in a corpus
- Weak for low-frequency phenomena
- If any single estimate is 0, the product of the P(a|b) terms, and hence P(s), becomes 0
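- A toy sketch of this zero problem, assuming a hypothetical two-sentence corpus: with the full preceding context as the condition, any unseen continuation gets probability 0:
```python
from collections import Counter

# Hypothetical toy corpus; each sentence is wrapped in <s> ... </s>.
corpus = [
    "<s> hello world </s>".split(),
    "<s> hello there </s>".split(),
]

# Count every prefix so we can form count(context + word) / count(context).
prefix_counts = Counter()
for sent in corpus:
    for i in range(1, len(sent) + 1):
        prefix_counts[tuple(sent[:i])] += 1

def mle(word, context):
    """Maximum likelihood estimate of P(word | full preceding context)."""
    denom = prefix_counts[tuple(context)]
    return prefix_counts[tuple(context) + (word,)] / denom if denom else 0.0

print(mle("world", ["<s>", "hello"]))   # 0.5: "hello world" seen once out of two
print(mle("friend", ["<s>", "hello"]))  # 0.0: unseen, so the whole product P(s) = 0
```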
- Approximation using n-gram
- Use only the previous n-1 words, instead of all preceding words, for maximum likelihood estimation
- The smaller n is, the more robust the model is to low-frequency events
- The larger n is, the longer the context it can take into account
- Trade-off between the two
- In machine translation, models up to n=4 (4-grams) have commonly been used
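- A sketch of the n-gram approximation on the same kind of toy corpus: conditioning on only the previous n-1 words (n=2, a bigram, here) gives nonzero estimates even when the full history was never seen:
```python
from collections import Counter

corpus = [
    "<s> hello world </s>".split(),
    "<s> hello there </s>".split(),
]
n = 2  # bigram: condition on only the previous n-1 = 1 word

ngram_counts, context_counts = Counter(), Counter()
for sent in corpus:
    for i in range(1, len(sent)):
        context = tuple(sent[max(0, i - (n - 1)):i])
        ngram_counts[context + (sent[i],)] += 1
        context_counts[context] += 1

def p(word, history):
    """P(word | last n-1 words of history) by maximum likelihood."""
    context = tuple(history[-(n - 1):]) if n > 1 else ()
    denom = context_counts[context]
    return ngram_counts[context + (word,)] / denom if denom else 0.0

# "well hello" never occurs in the corpus, but the bigram model only
# looks at the last word, so the estimate is still nonzero:
print(p("world", ["<s>", "well", "hello"]))  # 0.5 = count(hello world) / count(hello)
```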
- Estimation using a neural network
- Feed the word sequence into an RNN (see the section below)
- Language models retain information about the connections between words when measuring plausibility, etc.
- In other words, can a language model be seen as something that encodes/decodes text into vectors?
- Language Model using a Neural Network
- Feed the word sequence into an RNN for embedding (encoding)
- The output is normalized with Softmax into values between 0 and 1 that sum to 1, i.e. a probability distribution over the vocabulary
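- A minimal sketch of such an RNN language model in PyTorch; the layer sizes and names are arbitrary assumptions:
```python
import torch
import torch.nn as nn

class RNNLanguageModel(nn.Module):
    """Embed each word, run an RNN, and Softmax over the vocabulary."""
    def __init__(self, vocab_size, embed_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.RNN(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, token_ids):
        x = self.embed(token_ids)             # (batch, seq_len, embed_dim)
        h, _ = self.rnn(x)                    # a hidden state per position
        logits = self.out(h)                  # (batch, seq_len, vocab_size)
        return torch.softmax(logits, dim=-1)  # rows in 0..1, each sums to 1

model = RNNLanguageModel(vocab_size=1000)
probs = model(torch.tensor([[1, 42, 7]]))  # dummy token ids
print(probs.shape)  # torch.Size([1, 3, 1000]): P(next word | prefix) per position
```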
- Attention Mechanism
- When the sentence becomes long, the influence of each individual word on the output vector becomes small
- This is because the size of the output vector is fixed
- Compute attention weights so that important words are reflected more strongly in the output
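- A sketch of dot-product attention over per-word hidden states (dimensions and values are dummy assumptions); the weights sum to 1 and control how strongly each word enters the fixed-size output:
```python
import numpy as np

def attention(query, hidden_states):
    """Dot-product attention: weight each word's state by relevance to the query."""
    scores = hidden_states @ query          # one relevance score per word
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                # softmax -> attention weights
    context = weights @ hidden_states       # weighted sum, fixed size
    return context, weights

rng = np.random.default_rng(0)
hidden_states = rng.normal(size=(5, 8))  # 5 words, 8-dim states (dummy values)
query = rng.normal(size=8)               # e.g. the decoder's current state

context, weights = attention(query, hidden_states)
print(weights)        # sums to 1; large entries mark the attended words
print(context.shape)  # (8,): same fixed size no matter how long the sentence
```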
- Transformer
- Eliminates the recurrence of the RNN and encodes/decodes using only the attention mechanism
- If text can be encoded into vectors and decoded into another language, machine translation becomes possible
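- A minimal sketch using PyTorch's built-in Transformer encoder (sizes are assumptions; positional encodings are omitted for brevity): there is no recurrence, so all positions are processed in parallel:
```python
import torch
import torch.nn as nn

# One encoder layer = self-attention + feed-forward; no recurrence.
layer = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=2)

x = torch.randn(1, 10, 64)  # (batch, seq_len, d_model): 10 embedded words
encoded = encoder(x)        # every position attends to every other position
print(encoded.shape)        # torch.Size([1, 10, 64])
```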
- GPT-3, BERT, etc. are applications of the Transformer
- Pre-trained language models
- Language models that can be adapted to various tasks
- Large models are costly to maintain and hard to handle
- This is a disadvantage compared to small task-specific models
- Lightweight models (such as DistilBERT) have also been developed
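- A sketch of loading a pre-trained model with the Hugging Face transformers library (assumes transformers and torch are installed):
```python
from transformers import AutoModel, AutoTokenizer

# DistilBERT: a lightweight pre-trained model distilled from BERT.
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModel.from_pretrained("distilbert-base-uncased")

inputs = tokenizer("hello world", return_tensors="pt")
outputs = model(**inputs)
# One vector per token; adapting to a task means fine-tuning a small
# head on top of these representations.
print(outputs.last_hidden_state.shape)  # e.g. torch.Size([1, 4, 768])
```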
- Why Human Language Ability is More Amazing Than Language Models