Yes, such a thing exists for text files: There are techniques used for near-duplicate detection (MinHash), and then there are latent topic models (e.g. using lda, lsi, autoencoders) which map documents onto a lower-dimensional ("semantic") space which is supposed to give a similar representation to semantically similar documents.