Finding cosine similarity between two documents with detailed explanation
I am going through the Manning book on Information Retrieval. Currently I am at the part about cosine similarity, and one thing is not clear to me.
Let's say that I have the tf-idf vectors for the query and a document, and I want to compute the cosine similarity between the two vectors.
When I compute the magnitude of the document vector, do I sum the squares of all the terms in the vector, or just the terms that appear in the query?
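For reference, the standard definition of cosine similarity between a query vector q and a document vector d is written out below. The component notation q_i, d_i and the vocabulary size V are introduced here for illustration; note that each norm in the denominator sums over every component of its own vector.

```latex
\cos(\vec{q}, \vec{d})
  = \frac{\vec{q} \cdot \vec{d}}{\lVert \vec{q} \rVert \, \lVert \vec{d} \rVert}
  = \frac{\sum_{i=1}^{V} q_i d_i}
         {\sqrt{\sum_{i=1}^{V} q_i^2} \, \sqrt{\sum_{i=1}^{V} d_i^2}}
```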
Here is an example: we have the user query "cat food beef".
Let's say its vector is (0,1,0,1,1) (assume there are only 5 dimensions in the vector, one for each unique word in the query and the document).
We also have the document "Beef is delicious".
Its vector is (1,1,1,0,0). We want to find the cosine similarity between the query and the document vectors.
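A minimal Python sketch of this example, assuming the standard definition of cosine similarity in which each magnitude is the Euclidean norm over every component of that vector. The function name `cosine_similarity` and the "query terms only" comparison are illustrative assumptions, not code from the book; the comparison just shows what the other option in the question would produce.

```python
import math


def cosine_similarity(q, d):
    """Cosine similarity of two equal-length vectors.

    Both norms are taken over ALL components of each vector
    (standard definition), not only the query's terms.
    """
    if len(q) != len(d):
        raise ValueError("vectors must have the same length")
    dot = sum(qi * di for qi, di in zip(q, d))
    norm_q = math.sqrt(sum(qi * qi for qi in q))
    norm_d = math.sqrt(sum(di * di for di in d))
    if norm_q == 0 or norm_d == 0:
        return 0.0  # convention for an all-zero vector
    return dot / (norm_q * norm_d)


query = (0, 1, 0, 1, 1)     # "cat food beef"
document = (1, 1, 1, 0, 0)  # "Beef is delicious"

# Standard result: dot = 1, |q| = sqrt(3), |d| = sqrt(3), so about 1/3.
full = cosine_similarity(query, document)

# For comparison only (assumption: the alternative asked about):
# the document norm computed from just the query-term dimensions.
doc_on_query_terms = [di for qi, di in zip(query, document) if qi != 0]
dot = sum(qi * di for qi, di in zip(query, document))
restricted = dot / (
    math.sqrt(sum(qi * qi for qi in query))
    * math.sqrt(sum(di * di for di in doc_on_query_terms))
)  # about 1/sqrt(3)

print(full, restricted)
```

Restricting the document norm to query terms ignores the document's other words ("is", "delicious"), so longer documents would not be penalized for content unrelated to the query; that is why the full norm is the usual choice.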