Topic modelling is a cool unsupervised way to group the documents in a corpus by theme.

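One simple approach is to build a tf-idf matrix (one row per document, one column per term) and cluster the rows, so that documents with similar vocabulary land in the same group.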
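The tf-idf-then-cluster idea can be sketched in plain Python as follows. This is a minimal illustration, not a reference implementation: the toy corpus, the whitespace tokeniser, the smoothed idf formula, the cosine-similarity variant of k-means, and the deterministic seeding are all assumptions made for the example, and the helper names (tfidf_matrix, cosine, kmeans) are hypothetical. On real data you would more likely reach for a library such as scikit-learn (TfidfVectorizer and KMeans).

```python
import math
from collections import Counter


def tfidf_matrix(docs):
    """Return (vocab, rows); each row is a unit-length tf-idf vector."""
    tokenised = [doc.lower().split() for doc in docs]  # naive tokeniser (assumption)
    vocab = sorted({tok for toks in tokenised for tok in toks})
    n_docs = len(docs)
    # Document frequency: in how many documents does each term occur?
    df = Counter(tok for toks in tokenised for tok in set(toks))
    # Smoothed idf, so a term found in every document keeps a positive weight.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}
    rows = []
    for toks in tokenised:
        counts = Counter(toks)
        vec = [counts[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        rows.append([v / norm for v in vec])
    return vocab, rows


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def kmeans(rows, k, n_iter=20):
    """k-means with cosine similarity; returns (labels, centroids)."""
    # Deterministic seeding (assumption): start from row 0, then repeatedly
    # add the row that is least similar to the centroids chosen so far.
    centroids = [rows[0]]
    while len(centroids) < k:
        far = min(range(len(rows)),
                  key=lambda i: max(cosine(rows[i], c) for c in centroids))
        centroids.append(rows[far])
    labels = [0] * len(rows)
    for _ in range(n_iter):
        # Assign every document to its most similar centroid.
        labels = [max(range(k), key=lambda j: cosine(row, centroids[j]))
                  for row in rows]
        # Move each centroid to the mean of its members.
        for j in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy corpus (made up for illustration): three pet sentences, three finance ones.
docs = [
    "the cat sat on the mat",
    "my cat chased the dog",
    "the dog barked at the cat",
    "stocks fell as markets closed",
    "markets rallied and stocks rose",
    "investors sold stocks as markets fell",
]

vocab, rows = tfidf_matrix(docs)
labels, centroids = kmeans(rows, k=2)

for doc, label in zip(docs, labels):
    print(label, doc)

# The heaviest terms in each centroid give a rough description of the "topic".
for j, centroid in enumerate(centroids):
    top = sorted(zip(centroid, vocab), reverse=True)[:3]
    print("cluster", j, [term for _, term in top])
```

Clustering assigns each document to exactly one group, which is the main simplification compared with probabilistic topic models, where a document can mix several topics.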

Resources: Text Analysis with Topic Model in Social Sciences