# What is a Hash Function?

A hash function is a method used to organize data and check it for errors. A large amount of data is run through a mathematical algorithm until a small number, called a hash value, remains. This number is used as part of the index that allows a computer to find that specific information later. A good hash function should produce a result small enough to be easy to use, while rarely giving the same result for two different pieces of data. Because the result is smaller than the data it summarizes, occasional duplicates, known as collisions, cannot be ruled out entirely. A hash function also provides basic error checking, as corrupt data and a good piece of data will almost always produce different results when hashed.

In a computer database, it is usually easier to record locations with numbers rather than text. Numbers have a fixed size and can be compared, sorted, and used directly as positions in an index, while text varies in length and is slower to compare. As a result, numbers are often assigned to locations that contain variable information in a computer’s database. These numbers can be arbitrary or representative of the information.

Arbitrary numbers are simply assigned based on the computer’s memory location or the order in which the data was saved. Saving information this way is common in smaller databases or in places where the data doesn’t change very often. In larger or frequently changing databases, re-indexing takes longer and longer as the data grows, until the approach is no longer efficient.

Representative numbers are where the hash function comes in. Information, regardless of what it contains, is translated into numbers. These numbers are fed into a mathematical formula that produces a small number, typically an integer. Ideally, each location in that part of the database will have its own unique result. If two or more locations have the same result, a situation known as a collision, the program needs a way to tell them apart, or it may return the wrong information based on the duplicate hash.
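The idea can be illustrated with a minimal Python sketch. The names `simple_hash` and `TinyTable` are hypothetical, and the multiply-by-31 formula is just one common textbook choice, not the method any particular database uses. The table keeps a short list in each slot so that two keys with the same hash do not overwrite each other.

```python
def simple_hash(key, num_buckets=16):
    """Translate a string into a small integer in range(num_buckets)."""
    value = 0
    for byte in key.encode("utf-8"):
        # Multiply-and-add makes the order of the bytes matter;
        # the modulus keeps the running total at 32 bits.
        value = (value * 31 + byte) % 2**32
    return value % num_buckets


class TinyTable:
    """Illustrative hash table that tolerates duplicate hash values."""

    def __init__(self, num_buckets=16):
        # Each bucket is a list of (key, value) pairs.
        self.buckets = [[] for _ in range(num_buckets)]

    def put(self, key, value):
        bucket = self.buckets[simple_hash(key, len(self.buckets))]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # replace the existing entry
                return
        bucket.append((key, value))

    def get(self, key):
        bucket = self.buckets[simple_hash(key, len(self.buckets))]
        # Compare the full key, since several keys may share a bucket.
        for existing_key, value in bucket:
            if existing_key == key:
                return value
        raise KeyError(key)
```

Storing the full key alongside the value is what prevents a duplicate hash from returning the wrong record: the hash narrows the search to one bucket, and the key comparison picks the right entry within it.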

It is possible to use a hash function for other things as well. Large amounts of highly repetitive data can be broken into segments, each of which is reduced to a smaller value. This is especially useful when looking for repeating sequences in large datasets. For example, deoxyribonucleic acid (DNA) is made up of only four different components, called bases. By hashing segments of these components, the places where two DNA sequences are the same and where they differ become very clear, simply by comparing two short columns of numbers.
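As a rough sketch of this kind of comparison, the Python example below splits two sequences into fixed-size chunks, hashes each chunk, and reports which positions differ. The function names, the chunk size of 4, and the use of CRC-32 from the standard `zlib` module are all assumptions made for illustration; real sequence-analysis tools use more sophisticated schemes.

```python
import zlib


def chunk_hashes(sequence, chunk_size=4):
    """Return one CRC-32 value per fixed-size chunk of the sequence."""
    return [
        zlib.crc32(sequence[i:i + chunk_size].encode("ascii"))
        for i in range(0, len(sequence), chunk_size)
    ]


def differing_chunks(seq_a, seq_b, chunk_size=4):
    """Return the indexes of chunks whose hash values do not match.

    Only the chunks both sequences have in common are compared;
    any extra tail on the longer sequence is ignored.
    """
    hashes_a = chunk_hashes(seq_a, chunk_size)
    hashes_b = chunk_hashes(seq_b, chunk_size)
    return [
        index
        for index, (a, b) in enumerate(zip(hashes_a, hashes_b))
        if a != b
    ]


# The two sequences differ by one base inside the second chunk.
print(differing_chunks("ACGTACGTACGT", "ACGTACCTACGT"))  # prints [1]
```

Matching hash values strongly suggest matching chunks, but for longer chunks a collision is possible, so a careful program would confirm a match by comparing the underlying data.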

The last area where hash functions are useful is in error checking. When the information is first hashed, the value is recorded as part of the location index. If this information is needed later, it will be retrieved along with this value. If the program rehashes the information and the result is different, then corruption has occurred at some point. The corruption is usually in the data rather than the stored hash, because a corrupted hash would generally have prevented the data from being located in the first place.
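A minimal sketch of this check in Python might look like the following. The `store` and `is_intact` helpers and the dictionary layout are hypothetical, and SHA-256 from the standard `hashlib` module is used only as a convenient, widely available hash; simpler checksums are also common for this purpose.

```python
import hashlib


def store(data):
    """Record the data together with the hash computed when it was saved."""
    return {"data": data, "digest": hashlib.sha256(data).hexdigest()}


def is_intact(record):
    """Rehash the data and compare the result with the recorded hash."""
    return hashlib.sha256(record["data"]).hexdigest() == record["digest"]


record = store(b"account balance: 1500")
print(is_intact(record))  # prints True

# Simulate corruption by changing a single character of the stored data.
record["data"] = b"account balance: 1600"
print(is_intact(record))  # prints False
```

A mismatch shows that something changed, but not what changed or how to repair it, which is why this is a detection technique rather than a correction technique.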