Hacker News | ty00001's comments

Speechmatics: it is on the expensive side, but it provides access to a bunch of languages, and the accuracy is phenomenal on all of them, even with multiple speakers.


Really great deep dive into a subtle yet impactful problem in voice AI. Turn detection is one of those things users only notice when it goes wrong, and this post does a brilliant job of showing how traditional VAD-based approaches fall short.

Loved the explanation of using instruction-tuned SLMs to predict the <|im_end|> probability: elegant, efficient, and practical. The code examples are very handy too!

This is one of those posts I’ll keep coming back to when thinking about latency-sensitive voice interfaces in my own projects.


There are three fantastic niche players in the speech-to-text market right now that you should check out:

- Deepgram (cheap and dirty, but the accuracy is quite poor)
- Speechmatics (a bit more pricey, but fantastic accuracy)
- AssemblyAI (just announced Series C funding of $50m)


The problem with this is Deepgram's accuracy (though I agree their speed/latency is excellent). We used to use them too, but eventually we got so frustrated with the poor accuracy that we switched to Speechmatics. Would definitely recommend checking them out.

