The most common kind of search that computer users perform every day is the Boolean query: a term is submitted to a search engine, which returns the documents that contain that term. The search is supported by an index containing all terms in the database. This simple form of Boolean query, which can be implemented efficiently over large databases, suffers from several limitations: the number of retrieved documents is typically prohibitively large, and a substantial part of the retrieved documents is irrelevant to the user's information need.
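The index-backed term lookup described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation of any particular search engine: the names `build_index` and `boolean_and`, the whitespace tokenization, and the three sample documents are all assumptions made for the example.

```python
from collections import defaultdict


def build_index(docs):
    """Map each term to the set of ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Assumption: lowercase, whitespace-separated tokens are the "terms".
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def boolean_and(index, terms):
    """Return the ids of the documents that contain every query term."""
    postings = [index.get(term.lower(), set()) for term in terms]
    if not postings:
        return set()
    return set.intersection(*postings)


# Hypothetical toy database, invented for this example.
docs = {
    1: "vector space model for retrieval",
    2: "boolean query over an index",
    3: "boolean retrieval model",
}
index = build_index(docs)

print(sorted(boolean_and(index, ["boolean"])))           # [2, 3]
print(sorted(boolean_and(index, ["boolean", "model"])))  # [3]
```

Each lookup touches only the posting sets of the query terms, which is why this kind of query scales well. The result is an unranked set, however: every matching document is returned, with no indication of which ones are most relevant.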
A widely used alternative to the Boolean query is the similarity query, which is typically based on the vector-space model. Under this model, documents are viewed as (algebraic) vectors over terms. A query, q, may consist of many terms and may even comprise a complete document. It too is viewed as a body of text rather than merely as a combination of search terms, and it is likewise represented as a vector. The retrieval task reduces to searching the database for the document vectors that are most similar to the query vector. ...
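A minimal sketch of a similarity query under the vector-space model follows. The text does not fix a term weighting or a similarity measure, so this example assumes raw term counts as vector components and cosine similarity as the measure; the names `to_vector`, `cosine`, and `rank` and the sample documents are hypothetical.

```python
import math
from collections import Counter


def to_vector(text):
    """Represent a body of text as a sparse term-count vector."""
    # Assumption: raw term frequency; real systems often use other weights.
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query, docs, k=2):
    """Return the k (score, doc_id) pairs most similar to the query."""
    q = to_vector(query)
    scored = [(cosine(q, to_vector(text)), doc_id)
              for doc_id, text in docs.items()]
    # Highest score first; ties are broken by the smaller document id.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored[:k]


# Hypothetical toy database, invented for this example.
docs = {
    1: "vector space model for retrieval",
    2: "boolean query over an index",
    3: "boolean retrieval model",
}

for score, doc_id in rank("retrieval model", docs):
    print(doc_id, round(score, 3))
```

Unlike the Boolean query, the result here is an ordered list, so the caller can keep only the top k documents. This version scans every document in the database; it illustrates the scoring and is not an efficient search procedure.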