U: documents-by-topics matrix; its columns are orthonormal (each has magnitude 1, and they are orthogonal to each other)
S: diagonal matrix of singular values, which show the strength of each topic; all off-diagonal entries are zero
V: topics-by-words matrix; relates each topic to the words





U has orthonormal columns; Vh has orthonormal rows.
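These orthonormality properties are easy to check numerically. A minimal sketch (using a small random matrix standing in for a document-term matrix): the columns of U and the rows of Vh each multiply with their own transpose to give the identity.

```python
import numpy as np
from scipy import linalg

# Small random matrix standing in for a document-term matrix
rng = np.random.default_rng(0)
A = rng.random((6, 8))

U, s, Vh = linalg.svd(A, full_matrices=False)

# Columns of U are orthonormal: U.T @ U is the identity
print(np.allclose(U.T @ U, np.eye(U.shape[1])))  # True
# Rows of Vh are orthonormal: Vh @ Vh.T is the identity
print(np.allclose(Vh @ Vh.T, np.eye(Vh.shape[0])))  # True
```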
SVD is an exact decomposition: the factor matrices are large enough to fully reconstruct the original matrix. SVD is extremely widely used in linear algebra, and specifically in data science, including:
- semantic analysis
- collaborative filtering/recommendations (winning entry for Netflix Prize)
- data compression
- principal component analysis
Latent Semantic Analysis (LSA) uses SVD.
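As a minimal sketch of the LSA idea (using a toy bag-of-words corpus I made up, not data from the lesson): factor a document-term matrix with SVD, then read each row of Vh as a latent topic and look at its most strongly weighted words.

```python
import numpy as np
from scipy import linalg

# Toy corpus (hypothetical example)
docs = [
    "cat sat mat cat",
    "dog cat pet",
    "stock market fell",
    "investor sold stock bond",
]

# Bag-of-words document-term matrix
vocab = sorted({w for d in docs for w in d.split()})
X = np.array([[d.split().count(w) for w in vocab] for d in docs], dtype=float)

# LSA: factor the document-term matrix with SVD
U, s, Vh = linalg.svd(X, full_matrices=False)

# Each row of Vh is a latent topic; show its most strongly weighted words
words = np.array(vocab)
for i in range(2):
    top = np.argsort(np.abs(Vh[i]))[::-1][:3]
    print(f"topic {i}:", ", ".join(words[top]))
```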
Calculating the SVD:

```python
from scipy import linalg

U, s, Vh = linalg.svd(document_matrix, full_matrices=False)
print(U.shape, s.shape, Vh.shape)
# (2034, 2034) (2034,) (2034, 26576)
```
Verify the decomposition. `np.diag` converts a vector to a diagonal matrix:

```python
reconstructed_matrix = U @ np.diag(s) @ Vh
np.allclose(reconstructed_matrix, document_matrix)  # True
```
We see that the singular values decrease drastically: the first is around 400, but by position 25 they are only around 10. The singular values show the strength of each topic.
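This rapid drop-off is what makes truncated SVD useful: keeping only the top k singular values (the strongest topics) gives a low-rank approximation that is still very close to the original. A sketch on synthetic data (a low-rank matrix plus small noise, so the singular values fall off sharply like in the lesson):

```python
import numpy as np
from scipy import linalg

rng = np.random.default_rng(1)
# Rank-5 matrix plus small noise, so singular values drop off quickly
A = rng.random((50, 5)) @ rng.random((5, 40)) + 0.01 * rng.random((50, 40))

U, s, Vh = linalg.svd(A, full_matrices=False)
print(s[0], s[5])  # the first singular value is far larger than the sixth

# Rank-5 reconstruction using only the strongest "topics"
k = 5
A_k = U[:, :k] @ np.diag(s[:k]) @ Vh[:k, :]
print(np.linalg.norm(A - A_k) / np.linalg.norm(A))  # small relative error
```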
