From Fractal Reader Development and Operation Diary
I want to achieve appropriate text segmentation.
- For example, if the topic changes partway through the text, I would like the summary to be divided at that point.
- It is currently quite difficult to read materials that are structured like chapters, such as academic papers.
- If it’s a paper in my field, I am familiar with the template structure of chapters, so I think using chapter information would make it easier to read.
I see (nishio).
If limited to Plurality Book, parsing the Markdown data seems like a good approach.
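A minimal sketch of the Markdown approach, assuming the book's source uses ATX-style `#` headings for chapters (the actual layout of the Plurality Book files has not been checked here). The function name `split_by_headings` and the `max_level` parameter are hypothetical, introduced only for illustration.

```python
import re

# ATX heading: 1-6 '#' characters, whitespace, then the title text.
HEADING = re.compile(r"^(#{1,6})\s+(.+?)\s*$")


def split_by_headings(markdown_text, max_level=1):
    """Split Markdown into (title, body) pairs at headings up to max_level.

    Text before the first heading is returned with title None.
    Deeper headings stay inside the body of their parent section.
    """
    sections = []
    title, body = None, []
    in_fence = False

    def flush():
        # Keep a section if it has a title or any non-blank body line.
        if title is not None or any(s.strip() for s in body):
            sections.append((title, "\n".join(body).strip()))

    for line in markdown_text.splitlines():
        # Lines inside fenced code blocks are never treated as headings.
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
        match = None if in_fence else HEADING.match(line)
        if match and len(match.group(1)) <= max_level:
            flush()
            title, body = match.group(2), []
        else:
            body.append(line)
    flush()
    return sections


sample = (
    "intro\n\n# One\nalpha\n## Sub\nbeta\n"
    "```\n# not a heading\n```\n# Two\ndelta\n"
)
for section_title, section_body in split_by_headings(sample):
    print(section_title, len(section_body))
```

Raising `max_level` to 2 would also split at `##` subsections, which may be closer to the granularity wanted for summaries.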
In the general case, the appearance of “short lines that are not sentences” seems to be a hint that a heading is present.
- However, there are many pitfalls such as bullet points, tables, and figure captions.
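The heuristic and its pitfalls could be sketched roughly as follows. Everything here is an assumption for illustration: the 60-character threshold, the punctuation list, and the patterns for bullets, table rows, and captions are guesses, not tuned values.

```python
import re

# Characters that usually end (or continue) a sentence rather than a heading.
SENTENCE_END = (".", "!", "?", ":", ";", ",", "。", "！", "？")
# Bullet markers such as "- item", "* item", "1. item", "2) item".
BULLET = re.compile(r"^\s*([-*+•]|\d+[.)])\s+")
# Caption prefixes such as "Figure 1", "Fig. 2", "Table 3".
CAPTION = re.compile(r"^\s*(fig(ure)?\.?|table|tab\.)\s*\d+", re.IGNORECASE)


def is_heading_candidate(line, max_chars=60):
    """Return True for a short line that does not look like a sentence."""
    text = line.strip()
    if not text or len(text) > max_chars:
        return False
    if text.endswith(SENTENCE_END):
        return False  # looks like a sentence or a lead-in to one
    if BULLET.match(line):
        return False  # pitfall: bullet points
    if "|" in text or "\t" in text:
        return False  # pitfall: table rows
    if CAPTION.match(text):
        return False  # pitfall: figure and table captions
    return True


def heading_candidates(text):
    """Return (line_index, stripped_line) for every candidate heading."""
    return [
        (i, line.strip())
        for i, line in enumerate(text.splitlines())
        if is_heading_candidate(line)
    ]


sample_text = (
    "Introduction\n"
    "This paper studies segmentation of long documents.\n"
    "- a bullet point\n"
    "| a | b |\n"
    "Figure 1: Overview\n"
    "2 Related Work\n"
)
print(heading_candidates(sample_text))
```

One pitfall remains visible even in this sketch: a numbered heading such as `1. Introduction` matches the bullet pattern and is rejected, so the rules would still need per-document adjustment.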
If cost is not a concern, I would like to hand the segmentation over to a Large Language Model (LLM) with a large context window.
- I think this will work (blu3mo).
- The entire text is already being fed in repeatedly anyway, so the cost would not change much.
Design under consideration (blu3mo)
- Feed the whole text to the LLM once and have it split the text sensibly at a macro level.
- If there are chapters, it should follow them; if not, it should split sensibly based on context.
- At summary levels finer-grained than that division, chunks should be separated at these division points.
- At summary levels coarser than that division, continue as before.
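The design above might look roughly like the sketch below. It is not Fractal Reader's actual code. The `llm` argument is a hypothetical callable that takes a prompt string and returns the reply text, the prompt wording is made up, and "finer than the macro division" is interpreted here as "chunk size smaller than the average macro section length", which is only one possible reading.

```python
import re


def macro_boundaries(lines, llm):
    """Ask the LLM once for the starting line numbers of major sections.

    `llm` is a hypothetical callable: prompt string in, reply string out.
    """
    numbered = "\n".join(f"{i}: {line}" for i, line in enumerate(lines))
    prompt = (
        "Split the following text into major sections. Follow chapter "
        "headings if there are any; otherwise split where the topic "
        "changes. Reply with the starting line numbers only, "
        "comma-separated.\n\n" + numbered
    )
    reply = llm(prompt)
    found = {int(n) for n in re.findall(r"\d+", reply)}
    # Always start at line 0 and drop out-of-range numbers.
    return sorted({0} | {n for n in found if 0 < n < len(lines)})


def fixed_chunks(items, size):
    """Plain fixed-size chunking (the 'as before' behaviour)."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def chunk_level(lines, boundaries, chunk_size):
    """Chunk one summary level, respecting macro boundaries when finer."""
    if not lines:
        return []
    ends = boundaries[1:] + [len(lines)]
    sections = [lines[a:b] for a, b in zip(boundaries, ends)]
    average = len(lines) / len(sections)
    if chunk_size < average:
        # Finer than the macro division: no chunk crosses a boundary.
        return [c for s in sections for c in fixed_chunks(s, chunk_size)]
    # Coarser than the macro division: continue as before.
    return fixed_chunks(lines, chunk_size)


demo_lines = [f"line {i}" for i in range(10)]
fake_llm = lambda prompt: "0, 4, 7"  # stand-in for a real model call
demo_bounds = macro_boundaries(demo_lines, fake_llm)
print([len(c) for c in chunk_level(demo_lines, demo_bounds, 2)])
print([len(c) for c in chunk_level(demo_lines, demo_bounds, 5)])
```

Because the boundaries are computed once and reused for every level, the extra cost is a single LLM call over the full text.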