
The 120B model badly hallucinates facts on the level of a 0.6B model.

My go-to test for checking hallucinations is 'Tell me about Mercantour park' (a national park in southeastern France).

Easily half of the facts are invented: non-existent mountain summits, brown bears (no, there are none), villages that are elsewhere, wrong advice ('dogs allowed' - no, they are not).



This is precisely the wrong way to think about LLMs.

LLMs are never going to have fact retrieval as a strength. Transformer models don't store their training data: they are categorically incapable of telling you where a fact comes from. They also cannot escape the laws of information theory: storing information requires bits. Storing all the world's obscure information requires quite a lot of bits.
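
For a rough sense of scale (round numbers purely for illustration, not a claim about any particular model):

    # Back-of-envelope: how many bits does a model have for EVERYTHING?
    params = 120e9            # 120B parameters
    bits_per_param = 4        # e.g. an aggressive 4-bit quantization
    total_gb = params * bits_per_param / 8 / 1e9
    print(f"~{total_gb:.0f} GB of weights for grammar, reasoning, AND facts")  # about 60 GB

Those tens of gigabytes have to cover grammar, reasoning patterns, many languages, and whatever facts survive training, so the long tail of obscure detail is exactly what gets squeezed.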

What we want out of LLMs is large context, strong reasoning and linguistic facility. Couple these with tool use and data retrieval, and you can start to build useful systems.
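
To make "tool use and data retrieval" concrete, here is a minimal sketch of the shape I mean; web_search() and call_llm() are hypothetical stubs standing in for whatever search backend and model API you would actually wire in:

    # Retrieval instead of recall: fetch the facts, let the model read and reason.
    def web_search(query: str, k: int = 5) -> list[str]:
        """Stub: return the top-k text snippets from your search backend."""
        raise NotImplementedError

    def call_llm(prompt: str) -> str:
        """Stub: send the prompt to whatever model you are running."""
        raise NotImplementedError

    def answer_with_retrieval(question: str) -> str:
        snippets = web_search(question)          # facts come from outside the weights
        context = "\n\n".join(snippets)
        prompt = (
            "Answer using ONLY the sources below. "
            "If they don't contain the answer, say so.\n\n"
            f"Sources:\n{context}\n\nQuestion: {question}"
        )
        return call_llm(prompt)                  # the model supplies reading and reasoning

    # answer_with_retrieval("Tell me about Mercantour park")

The factual content of the answer is then bounded by what retrieval returns; the weights only need to be good at reading, reasoning and writing.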

From this point of view, the more of a model's total weight footprint is dedicated to "fact storage", the less desirable it is.


I think that sounds very reasonable, but unfortunately these models don’t know what they do and don’t know. A small model that knew the exact limits of its knowledge would be very powerful.


Hallucinations have identifiable characteristics in interpretability studies. That's a foothold for reducing them.

They still won't store much information, but it could mean they're better able to know what they don't know.


How can you reason correctly if you don't have any way to know which facts are real vs hallucinated?


What are the large context, strong reasoning, and linguistic facility for if there aren't facts underpinning them? Is a priori wholly independent of a posteriori? Is it practical for the former to be wholly independent of the latter?


Others have already said it, but it needs to be said again: Good god, stop treating LLMs like oracles.

LLMs are not encyclopedias.

Give an LLM the context you want to explore, and it will do a fantastic job of telling you all about it. Give an LLM access to web search, and it will find things for you and tell you what you want to know. Ask it "what's happening in my town this week?", and it will answer that with the tools it is given. Not out of its oracle mind, but out of web search + natural language processing.
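
Roughly, that "answer with the tools it is given" loop looks like this; the SEARCH: convention and the function names are invented for illustration (real stacks use structured tool calls), but the shape is the same:

    # The model either answers directly or asks for a search, for up to a few rounds.
    def call_llm(prompt: str) -> str:
        """Stub for your model API."""
        raise NotImplementedError

    def web_search(query: str) -> str:
        """Stub for your search backend; returns concatenated result snippets."""
        raise NotImplementedError

    def answer(question: str, max_rounds: int = 3) -> str:
        prompt = (
            "If you need current or obscure facts, reply with exactly "
            "'SEARCH: <query>'. Otherwise, answer the question.\n\n"
            f"Question: {question}"
        )
        for _ in range(max_rounds):
            reply = call_llm(prompt)
            if not reply.startswith("SEARCH:"):
                return reply                     # answered from the context it has
            results = web_search(reply.removeprefix("SEARCH:").strip())
            prompt += f"\n\nSearch results:\n{results}\n\nNow answer the question."
        return call_llm(prompt + "\n\nAnswer with what you have.")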

Stop expecting LLMs to -know- things. Treating LLMs like all-knowing oracles is exactly what separates those who can't get anything productive out of them from those who are finding huge productivity gains.


I am getting huge productivity gains from using models, and I mostly use them as "oracles" (though I am extremely careful about how I handle hallucination, of course). I'd even say their true power, just like a human's, comes from having an ungodly amount of knowledge, not merely intelligence. If I just wanted something intelligent, I already had humans!... but merely intelligent humans, even when given months of time to screw around doing Google searches, fail to make the insights that someone who actually knows stuff, whether human or model, can throw around like it is nothing. I am actually able to use ChatGPT 4.5 as not just an employee, not even just as a coworker, but at times as a mentor or senior advisor: I can tell it what I am trying to do, and it helps me by applying advanced mathematical insights or suggesting things I could use. Using an LLM as a glorified Google-it-for-me monkey seems like such a waste of potential.


> I am actually able to use ChatGPT 4.5 as not just an employee, not even just as a coworker, but at times as a mentor or senior advisor: I can tell it what I am trying to do, and it helps me by applying advanced mathematical insights or suggesting things I could use.

You can still do that sort of thing, but just have it perform searches whenever it has to deal with a matter of fact. Just because it's trained for tool use and equipped with search tools doesn't mean you have to change the kinds of things you ask it.


If you strip all the facts from a mathematician you get me... I don't need another me: I already used Google, and I already failed to find what I need. What I actually need is someone who can realize that my problem is a restatement of an existing known problem, just using words, terms, and an occluded structure that look nothing like how it was originally formulated. You very often simply can't figure that out using Google, no matter how long you sit in a tight loop trying related Google searches; but it is the kind of thing that an LLM (or a human) excels at (as you can consider "restatement" a form of "translation" between languages), if and only if they have already seen that kind of problem. The same thing comes up with novel applications of obscure technology, complex economics, or even interpretation of human history... there is a reason why people who study Classics "waste" a ton of time reading old stories rather than merely knowing the library is around the corner. What makes these AIs so amazing is thinking of them as entirely replacing Google with something closer to a god, not merely trying to wrap it with a mechanical employee whose time is ostensibly less valuable than mine.


> What makes these AIs so amazing is thinking of them as entirely replacing Google with something closer to a god

I guess that way of thinking may foster amazement, but it doesn't seem very grounded in how these things work or their current capabilities. Seems a bit manic tbf.

And again, enabling web search in your chats doesn't prevent these models from doing any of the "integrative reasoning", so to speak, that they can purportedly do. It just helps ensure that relevant facts are in context for the model.


Yeah, but like, "relevant facts" are a big part of reasoning? I don't get anywhere near as good results on anything I want from the dumber models, and I almost never get good results from Google searches since, as I said, I already did that. To put it in engineering terms: people come to me for security stuff, and I've spent my life working in that field, so I just know things that I'd never be able to find with a Google search if I didn't already know the thing I am looking for (and often I can't recover a reference even if I do remember).

I frankly feel people don't spend enough time with ChatGPT 4.5... like, if you haven't yet found use cases it can handle that the other models can't even come close to, are you really using AI effectively?


The problem is that even when you give them context, they just hallucinate at another level. I have tried that example of asking about events in my area; they are absolutely awful at it.


I love how with this cutting-edge tech people still dress up and pretend to be experts. Pleasure to meet you, pocketarc - Senior AI Gamechanger, 2024-2025 (Current)


It's fine to expect it not to know things, but the complaint is that it gives zero indication that it's just making up nonsense, which is the biggest issue with LLMs. They do the same thing when writing code.


Exactly this. And that is why I like this question: the ratio of correct details to nonsense gives a good idea of the quality of the model.


LLMs should at least -know- the semantics of the text they analyze, as opposed to just the syntax.


To be coherent and useful in general-purpose scenarios, an LLM absolutely has to be large enough and know a lot, even if you aren't using it as an oracle.


I don’t think they trained it for fact retrieval.

It would probably do a lot better if you gave it tool access for search and web browsing.


What is the point of an offline reasoning model that also doesn't know anything and makes up facts? Why would anyone prefer this to a frontier model?


Data processing? Reasoning on supplied data?



