Comparing Cheap Open Models and GPT (from /villagepump/Comparing_Cheap_Open_Models_and_GPT)
- I would like your feedback (blu3mo)
- You don't need to go through every sample; even just looking at the parts that interest you would be extremely helpful (blu3mo)
- Type A samples
  - /shokai/Hierarchical_Wiki_does_not_scale https://www.fractal-reader.com/view/2fead1d8-79e2-4109-8a2b-2f98b6f00a02
  - Summary of protests in Colombia (News) https://www.fractal-reader.com/view/7f8ce9f2-265a-43cc-aaf8-f517c3a9b0d6
  - History of East Asia (Wikipedia) https://www.fractal-reader.com/view/d84d3632-979f-452d-9542-3c6e05c5034c
- Type B samples
  - /shokai/Hierarchical_Wiki_does_not_scale https://fractal-reader.com/view/153aa829-3ec5-4184-9246-b276cd67df0f
  - Summary of protests in Colombia (News) https://www.fractal-reader.com/view/c54d3cd4-8318-4a41-8d1d-d2d243fddb09
  - History of East Asia (Wikipedia) https://fractal-reader.com/view/371980c8-99f1-4bfc-9548-20b48f9b86f5
- One side uses the conventional gpt-4 + gpt-3.5 setup; the other uses qwen1.5 + llama 3
- Which models are behind which side will stay secret until tomorrow
Vote
- Type A is better (/villagepump/cFQ2f7LRuLYP)(/villagepump/takker)(/villagepump/cak)(/villagepump/sta)(/villagepump/基素)(/villagepump/bsahd)(/villagepump/nishio)
- Type B is better
- Did not feel a difference
- Felt a difference but had trouble deciding which one is better
Impressions
- I would like to know which summary you prefer, what differences you noticed, etc. (/villagepump/blu3mo)
- I will hide my reasons using frosted-glass notation (/villagepump/cFQ2f7LRuLYP)
- I used frosted glass so as not to bias anyone else's impressions
- Thank you (/villagepump/blu3mo)
- Now that the answer has been revealed, I'll remove the frosted glass (/villagepump/cFQ2f7LRuLYP)
- I read all of them (/villagepump/cFQ2f7LRuLYP)
- In Type B's Level 3 summaries, unrelated Chinese and Korean text is mixed in. It feels like noise, so I lean towards A (/villagepump/cFQ2f7LRuLYP)
- Oh, where are they mixed in the samples? (/villagepump/blu3mo)
- I couldn't spot them when I skimmed through (/villagepump/blu3mo)
- 03#66348ed179e11300001b73e6
- In Level 3, at the passage in the original text saying that "dealing with this kind of thing ad hoc increases implicit rules"
- Also in Level 3, seven levels below the "Linking between pages" section, there is Chinese text
- 03#66348b2379e113000047af2a
- After “48 hours” on April 23 (/villagepump/cFQ2f7LRuLYP)
- Oh, you're right: simplified Chinese characters and Hangul are mixed into the middle of sentences (/villagepump/blu3mo)
- Thank you for pointing that out (/villagepump/blu3mo)
- (/villagepump/cFQ2f7LRuLYP)👍️
- In B's East Asian history summary, the second paragraph of Level 1 is considerably longer than A's. From the standpoint of "East Asian history," A seems to take a more macroscopic view, which I think is better. B keeps producing long text even though it is repeatedly told to write "one sentence," which makes the response length hard to control.
- (Yota) likes the summary of "Adding 1 and 2 leads to destruction" from May 3, 2024. The difference between A and B was confusing at first: B packs several angles into a single sentence, which makes it harder to follow, while A separates its clauses with punctuation and links them clearly, which makes it easier to understand.
- After reading only the section from [Shokai]'s article: it was unclear why the comments needed to be obscured, so the suggestion was to write them plainly unless there is a specific reason. A's text was easier to read, with shorter sentences. B had some strengths, but inaccuracies in its Level 3 summary made its quality inconsistent. Overall, there was a noticeable difference in quality between A and B.
- [Nishio] found B overly complicated compared with the concise A. Still, which is better is not clear-cut, since A's simplicity may mean information was lost. Summarizing always means discarding information, so whether what the AI discards matches one's own preferences is subjective. When two options are shown side by side, choosing is hard; but if only one were available, people would most likely just use it, which suggests going with the cheaper option.
- The feedback was very helpful. Thank you.
- The answer: A uses GPT and B uses the open models. When the open models are used in English, the issues mentioned above do not occur.
- The question is what to do about the Japanese version. The open models' issues could probably be worked around with some tricks, but that might be cumbersome. This raises a concern that AI services for minority languages will end up more expensive.
- https://prtimes.jp/main/html/rd/p/000000057.000038247.html
- I wonder whether using a Japanese fine-tuned version of Qwen1.5, like this one, would help.
- Deploying this seems like it could be costly.
- For now, it might be a good idea to let the open model do the work and add a system that cleans up any problems with gpt-4-turbo.
- Problems like Chinese text mixed into the output are easy to detect.
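The cleanup idea above can be sketched as a cheap script check on the open model's output, with the expensive model called only when the check fires. This is a minimal Python sketch under assumptions: the Hangul ranges are the standard Unicode blocks, the simplified-character set is a small hypothetical sample rather than a complete list, and `summarize_cheap` / `clean_up_expensive` are hypothetical stand-ins for calls to the open model and to gpt-4-turbo.

```python
import re
from typing import Callable

# Korean script: Hangul Jamo, Hangul Compatibility Jamo, and Hangul Syllables.
# None of these belong in a Japanese summary, so any match is a red flag.
HANGUL = re.compile(r"[\u1100-\u11FF\u3130-\u318F\uAC00-\uD7AF]")

# Hypothetical, deliberately small sample of simplified Chinese characters that
# Japanese writes differently (e.g. 这, 们, 对 vs 対, 时 vs 時, 问 vs 問).
# Japanese and Chinese share most kanji, so this is a heuristic, not a full list.
SIMPLIFIED_ONLY = frozenset("这们对时发问题说过还进关门")


def has_foreign_script(text: str) -> bool:
    """Return True if `text` contains Hangul or a listed simplified-only character."""
    if HANGUL.search(text):
        return True
    return any(ch in SIMPLIFIED_ONLY for ch in text)


def summarize_with_cleanup(
    source: str,
    summarize_cheap: Callable[[str], str],
    clean_up_expensive: Callable[[str], str],
) -> str:
    """Summarize with the cheap model; send only flagged drafts to the expensive one.

    Both callables are hypothetical placeholders, e.g. a wrapper around the open
    model and a wrapper around gpt-4-turbo that rewrites the draft in clean Japanese.
    """
    draft = summarize_cheap(source)
    if has_foreign_script(draft):
        return clean_up_expensive(draft)
    return draft
```

Because Japanese and Chinese share most CJK ideographs, reliably detecting Chinese needs more than a codepoint range; a fuller check might compare against simplified variants listed in the Unihan database or use a language-identification model. Hangul, by contrast, has its own Unicode blocks, so that half of the check is straightforward.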