"Hash functions are designed to be fast and to yield few hash collisions in expected input domains. In hash tables and data processing, collisions inhibit the distinguishing of data, making records more costly to find.

A fundamental property of all hash functions is that if two hashes (according to the same function) are different, then the two inputs are different in some way. This property is a consequence of hash functions being deterministic. On the other hand, a function is not injective, i.e., the equality of two hash values ideally strongly suggests, but does not guarantee, the equality of the two inputs. If a hash value is calculated for a piece of data, and then one bit of that data is changed, a hash function with a strong mixing property usually produces a completely different hash value.

Typical hash functions have an infinite domain, such as byte strings of arbitrary length, and a finite range, such as bit sequences of some fixed length. In certain cases, hash functions can be designed with one-to-one mapping between identically sized domain and range. Hash functions that are one-to-one are also called permutations. Reversibility is achieved by using a series of reversible "mixing" operations on the function input."2

""Because of the variety of applications for hash functions (details below), they are often tailored to the application. For example, cryptographic hash functions assume the existence of an adversary who can deliberately try to find inputs with the same hash value. A well designed cryptographic hash function is a "one-way" operation: there is no practical way to calculate a particular data input that will result in a desired hash value, so it is also very difficult to forge. Functions intended for cryptographic hashing, such as MD5, are commonly used as stock hash functions.

Functions for error detection and correction focus on distinguishing cases in which data has been disturbed by random processes. When hash functions are used for checksums, the relatively small hash value can be used to verify that a data file of any size has not been altered.""3

""Hash tables, a major application for hash functions, enable fast lookup of a data record given its key. (Note: Keys are not usually secret as in cryptography, but both are used to "unlock" or access information.) For example, keys in an English dictionary would be English words, and their associated records would contain definitions. In this case, the hash function must map alphabetic strings to indexes for the hash table's internal array.

The ideal for a hash table's hash function is to map each key to a unique index (see perfect hashing), because this guarantees access to each data record in the first probe into the table. However, this is often impossible or impractical.

Hash functions that are truly random with uniform output (including most cryptographic hash functions) are good in that, on average, only one or two probes will be needed (depending on the load factor). Perhaps as important is that excessive collision rates with random hash functions are highly improbable-if not computationally infeasible for an adversary. However, a small, predictable number of collisions is virtually inevitable (see birthday paradox).

In many cases, a heuristic hash function can yield many fewer collisions than a random hash funct
...Show more