Research:
Information retrieval:
11.06.2013 : Vector Space Model

Vector Space Model:

Task: Find similar documents.

Idea: Define a suitable vector for every document. The similarity between two documents is then given by the distance between their vectors.
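
To make the idea concrete, here is a minimal sketch in Python. It rests on assumptions that are not part of the notes above: each document is represented by raw term counts over a shared vocabulary, tokens are obtained by lowercasing and splitting on whitespace, and the distance is the Euclidean distance. The function names and the three example documents are purely illustrative.

```python
import math
from collections import Counter


def count_vector(text, vocabulary):
    """Return the term counts of `text`, ordered like `vocabulary`."""
    counts = Counter(text.lower().split())
    return [counts[term] for term in vocabulary]


def euclidean_distance(u, v):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Illustrative documents (assumption, not taken from the notes).
docs = [
    "the cat sat on the mat",
    "the cat sat on the sofa",
    "stock prices fell sharply today",
]

# Shared vocabulary: every distinct term of the collection, in sorted order.
vocabulary = sorted({term for doc in docs for term in doc.lower().split()})
vectors = [count_vector(doc, vocabulary) for doc in docs]

d_01 = euclidean_distance(vectors[0], vectors[1])  # two documents about a cat
d_02 = euclidean_distance(vectors[0], vectors[2])  # unrelated topics
print(d_01 < d_02)
```

With these vectors the two cat documents end up closer to each other than to the stock document. Raw counts still treat every word as equally important, which is the weakness a term weighting is meant to address.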

Solution:
We are looking for similar documents, but when is one document similar to another?
Two documents are identical if their word sequences are identical; they differ as soon as the word sequences differ.
The exact word sequence is not a robust feature, however: reordering or rephrasing a single sentence already changes it. Alternatively, we can argue that similar documents use a similar vocabulary. Some words distinguish documents better than others: a word that occurs in almost every document says little about any one of them, while a word that occurs in only a few documents is informative. The key task is therefore to determine the most distinguishing words. This can be done with TF-IDF (Term Frequency-Inverse Document Frequency) weighting.
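
As a rough illustration of such a weighting, the following Python sketch uses one common TF-IDF variant. The exact formula is an assumption, since the notes do not fix one: the term frequency is the term count divided by the document length, and the inverse document frequency is log(N / df), where N is the number of documents and df is the number of documents containing the term. The function name `tf_idf` and the example documents are illustrative only.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dictionary per document.

    Assumed variant: tf = count / document length, idf = log(N / df).
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Illustrative documents (assumption, not taken from the notes).
docs = [
    "the cat sat on the mat",
    "the dog sat on the sofa",
    "the stock price fell today",
]
weights = tf_idf(docs)

for term, weight in sorted(weights[0].items(), key=lambda item: -item[1]):
    print(f"{term}: {weight:.3f}")
```

In this small collection "the" occurs in every document, so its weight is zero, while "mat" occurs in only one document and receives the highest weight in the first document. Other variants, for example with smoothing or logarithmic term frequency, give different numbers but follow the same principle.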

Work in progress!