
Yes, such a thing exists for text files. For near-duplicate detection there are techniques like MinHash. For semantic similarity there are latent topic models and related methods (e.g. LDA, LSI, autoencoders), which map documents onto a lower-dimensional ("semantic") space that is supposed to give semantically similar documents similar representations.
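To make the MinHash part concrete, here is a rough sketch using only the Python standard library. The shingle size, the number of hash functions, and the use of salted BLAKE2b as the hash family are my own assumptions for illustration, and all function names are made up; real implementations usually add locality-sensitive hashing on top so you don't have to compare every pair of signatures.

```python
import hashlib


def shingles(text, k=5):
    """Return the set of k-character shingles of a normalized text."""
    # Lowercase and collapse whitespace so trivial formatting changes are ignored.
    text = " ".join(text.lower().split())
    if len(text) <= k:
        return {text}
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def _hash(item, seed):
    """One member of the hash family: BLAKE2b salted with the seed (an assumption)."""
    salt = seed.to_bytes(8, "big")
    digest = hashlib.blake2b(item.encode("utf-8"), digest_size=8, salt=salt).digest()
    return int.from_bytes(digest, "big")


def minhash_signature(items, num_hashes=128):
    """For each hash function, keep the minimum hash value over the whole set."""
    return [min(_hash(s, seed) for s in items) for seed in range(num_hashes)]


def estimated_jaccard(sig_a, sig_b):
    """The fraction of matching signature slots estimates the Jaccard similarity."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


def exact_jaccard(set_a, set_b):
    """Exact Jaccard similarity, for comparison with the estimate."""
    return len(set_a & set_b) / len(set_a | set_b)


if __name__ == "__main__":
    doc_a = "The quick brown fox jumps over the lazy dog near the river bank."
    doc_b = "The quick brown fox jumped over the lazy dog near the river bank."
    doc_c = "Latent topic models map documents onto a low-dimensional space."

    sig_a, sig_b, sig_c = (minhash_signature(shingles(d)) for d in (doc_a, doc_b, doc_c))
    print("a vs b estimate:", estimated_jaccard(sig_a, sig_b))
    print("a vs b exact:   ", exact_jaccard(shingles(doc_a), shingles(doc_b)))
    print("a vs c estimate:", estimated_jaccard(sig_a, sig_c))
```

The near-duplicate pair should score close to its exact Jaccard similarity, and the unrelated pair close to zero. Note this only catches surface overlap; two documents about the same subject in different words will look unrelated, which is where the topic-model approaches come in.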

