-
Embedding lecture notes and transcriptions of lecture videos using Flair and BERT.
-
Comparing the embedded vectors using an NN Classifier.
- Initially tried cosine distance.
- Cosine distance does not work well when the dimensionality is high.
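
For reference, the cosine-distance baseline mentioned above can be sketched in a few lines. This is a minimal illustration, not the code used in these experiments: the function names `cosine_distance` and `nearest_sentence` are hypothetical, and the 3-dimensional toy vectors stand in for the real Flair/BERT embeddings.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


def nearest_sentence(note_vec, sentence_vecs):
    """Return the index of the transcript sentence closest to the note."""
    distances = [cosine_distance(note_vec, s) for s in sentence_vecs]
    return min(range(len(distances)), key=distances.__getitem__)


# Toy vectors (assumption): placeholders for real note/sentence embeddings.
note_vec = [1.0, 0.0, 1.0]
sentence_vecs = [[0.0, 1.0, 0.0], [2.0, 0.1, 1.9], [1.0, 1.0, 0.0]]
best = nearest_sentence(note_vec, sentence_vecs)
```

Here `best` is the index of the transcript sentence with the smallest distance to the note. A learned classifier would replace `nearest_sentence` while keeping the same inputs.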
-
Data:
- Summarized notes of TED Talks found online.
-
Linking process:
- For now, a note that spans multiple sentences is linked only at its beginning.
- Linked every other note to http://www.kevinhabits.com/ted/.
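
One possible reading of the "link to the beginning" rule, sketched under assumptions: when a note covers several transcript sentences, the link target is the earliest of them. The `TranscriptSentence` type and the `link_to_beginning` helper are hypothetical names introduced only for this illustration, and the example sentences are made up.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TranscriptSentence:
    index: int  # position of the sentence in the transcript (assumed field)
    text: str


def link_to_beginning(covered: List[TranscriptSentence]) -> Optional[TranscriptSentence]:
    """Return the earliest transcript sentence a note covers, or None if it covers none."""
    if not covered:
        return None
    return min(covered, key=lambda s: s.index)


# Made-up example: a note covering three consecutive sentences, given out of order.
covered = [
    TranscriptSentence(7, "So that is the third point."),
    TranscriptSentence(5, "Let me start with a story."),
    TranscriptSentence(6, "It happened ten years ago."),
]
target = link_to_beginning(covered)
```

With this rule, sentences 6 and 7 are never link targets, which is the limitation the question about linking only to the beginning refers to.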
-
NN:
- Eventually, time data will be incorporated.
-
Potential future challenges:
- Is it sufficient to link only to the beginning of the note?
- Are there any issues with using very long or very short sentences as they are?
- Very long sentences are usually continuous speech.
- Is it enough to just apply BERT?
- Are there any biases in the selection process (e.g. video length)?
-
Thoughts:
- Writing the desired parts of the video in natural language can generate subtitles.
- Can we learn that “XXX” indicates a quote? (Might be difficult with BERT)
- Can we learn what should and should not be taken in notes?
#NaturalLanguageProcessing #Minerva