
The free-beer tiers of commercial ChatGPT or Gemini can read the answers and point out major errors. Larger Gemma models and huge Chinese models like full DeepSeek or Kimi K2 may work too. Sometimes the answer is odd enough that even some 7B models can notice it. Technically there is no guarantee that models sharing a name across sizes, like Qwen 3 0.6B and 27B, use the same dataset, but it still tells you a bit about the quality and composition of the dataset their creator owns.

I don't actually need accurate answers to those questions; it's just an expectation adjuster for me, so to speak. There are probably better questions for other languages/use cases, but these seem to correlate better with model size and company scale than Flappy Bird demos do.

0: https://gist.github.com/numpad0/abdf0a12ad73ada3b886d2d2edcc...

1: https://gist.github.com/numpad0/b1c37d15bb1b19809468c933faef...
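
To make the workflow above concrete, here is a minimal sketch: run the question set through models of a few sizes, then have a larger model read the answers and flag major errors. It assumes a local OpenAI-compatible endpoint (Ollama-style /v1 API); the model tags, judge model, and questions.txt file are all hypothetical:

    # Sketch: ask the same questions of models of several sizes,
    # then have a larger model act as judge and flag major errors.
    import requests

    ENDPOINT = "http://localhost:11434/v1/chat/completions"  # assumption: local Ollama
    MODELS = ["qwen3:0.6b", "qwen3:8b", "gemma3:27b"]         # hypothetical tags
    JUDGE = "deepseek-r1:70b"                                 # hypothetical judge

    def ask(model: str, prompt: str) -> str:
        r = requests.post(ENDPOINT, json={
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        }, timeout=600)
        r.raise_for_status()
        return r.json()["choices"][0]["message"]["content"]

    with open("questions.txt", encoding="utf-8") as f:
        questions = [line.strip() for line in f if line.strip()]

    for q in questions:
        answers = {m: ask(m, q) for m in MODELS}
        transcript = "\n".join(f"[{m}] {a}" for m, a in answers.items())
        review = ask(JUDGE, f"Question: {q}\n{transcript}\n"
                            "Point out any major factual errors in these answers.")
        print(q, review, sep="\n", end="\n\n")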



Thanks for the detailed response.

I'm guessing the issue is just model size. If you're testing sub-30B models and finding errors, they're probably not large enough to remember everything in the training dataset, so there are inaccuracies, and they might hallucinate a bit on factoids that don't show up often in the training data.

Commercial models are presumably significantly larger than the smaller open models, so it sounds like the issue is mainly model size...
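
A minimal sketch of checking that correlation, assuming a local Ollama-style OpenAI-compatible endpoint and a small hand-made TSV of question/answer pairs; the model tags and file name are hypothetical:

    # Sketch: rough per-model accuracy on factoids with known answers,
    # to see whether error rate tracks parameter count.
    import requests

    ENDPOINT = "http://localhost:11434/v1/chat/completions"  # assumption: local Ollama
    MODELS = ["qwen3:0.6b", "qwen3:8b", "qwen3:32b"]          # hypothetical size ladder

    def ask(model: str, prompt: str) -> str:
        r = requests.post(ENDPOINT, json={
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        }, timeout=600)
        r.raise_for_status()
        return r.json()["choices"][0]["message"]["content"]

    with open("pairs.tsv", encoding="utf-8") as f:  # question<TAB>expected answer
        pairs = [line.rstrip("\n").split("\t", 1) for line in f if "\t" in line]

    for model in MODELS:
        # Crude containment check; rare factoids are exactly where small models slip.
        hits = sum(expected.lower() in ask(model, q).lower() for q, expected in pairs)
        print(f"{model}: {hits}/{len(pairs)} correct")

Substring matching is crude, but for short factoid answers it's usually enough to see the accuracy gap between sizes.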

PS: Okra on curry is pretty good actually :)



