Hacker News

Hallucinations have been more or less a solved problem for me ever since I made a simple harness that has Codex/Claude check its own work with static typechecking.
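The commenter doesn't share the harness, but the shape is easy to sketch: loop the model against a checker until the checker reports no errors. A minimal sketch — `repair_loop`, `fake_model`, and the `compile()`-based `syntax_check` are all hypothetical stand-ins (a real harness would call an actual model and shell out to a real typechecker like `mypy` or `tsc`):

```python
from typing import Callable

def repair_loop(generate: Callable[[str], str],
                check: Callable[[str], str],
                max_rounds: int = 3) -> str:
    """Generate code, feed checker diagnostics back, stop when clean."""
    feedback = ""
    for _ in range(max_rounds):
        code = generate(feedback)   # model call; stubbed in the demo below
        feedback = check(code)      # empty string means the checker passed
        if not feedback:
            return code
    raise RuntimeError("checker still failing: " + feedback)

def syntax_check(code: str) -> str:
    """Stand-in verifier: compile() catches syntax errors only."""
    try:
        compile(code, "<candidate>", "exec")
        return ""
    except SyntaxError as exc:
        return str(exc)

# Demo: a fake "model" that produces broken code on the first try and a
# fixed version once it sees any diagnostic feedback.
def fake_model(feedback: str) -> str:
    return "def f(x): return x + 1" if feedback else "def f(x) return x"

print(repair_loop(fake_model, syntax_check))
```

The point of the loop is that the checker, not the model, decides when to stop — the model only ever sees diagnostics, never a green light it can hallucinate.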




But there aren’t very many domains where this type of verification is even possible.

Then you apply LLMs in the domains where things can be checked.

Indeed, I expect to see a huge push into formally verified software, precisely because sound mathematical proofs provide an excellent verifier to put into an LLM harness. Just look at how successful Aristotle has been at math; the same approach could be applied to coding.

Maybe Lean will become the new Python
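What makes Lean attractive as a verifier is that the kernel either accepts a proof or rejects it, with no middle ground for a model to talk its way around. A toy illustration (assuming Lean 4, where `Nat.add_comm` ships with the core library):

```lean
-- The kernel checks this mechanically; a hallucinated proof term
-- would simply fail to elaborate.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```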

https://harmonic.fun/news#blog-post-verina-bench-sota


  "LLMs reliably fail at abstraction."
  "This limitation will go away soon."
  "Hallucinations haven't."
  "I found a workaround for that."
  "That doesn't work for most things."
  "Then don't use LLMs for most things."

Um, yes? Except 'most things' doesn't amount to much by volume.


