What are hash tables good for
Hash tables are typically used to store large numbers of objects.
Hashing table are good for retrieving data, since its based on the dictionary principle, we access the data directly using an index, which is the result of a hash function.
The hash function preforms some algorithms to process the unique value and returns, what should be, a unique index.
Retrieving data from Hash Tables should have, optimally, a static time for each retrieve operation, this is since you retrieve directly without search.
Hash Table size
The size of the hash table is important. it should be big enough to hold future data, and it should be the size of a primary number.
Depending on the colission strategy you choose (Open Adressing or Closed Adrressing), you should choose an array size that is the Closest Primary Number , this is because the array size will be used in the hash function, and a primary number will be most beneficial for this.
Hashing function
The hash function takes a key value, a value by which we search the item, could be a unique string, for example: last name + first name, and calculates and returns an index.
The value should be the value of a predetermaned attribute that represents the key, and should exists on every object that is inserted into the table.
Lets say for example the table is a table of costumers, and a costumer looks like the following:
costumerObject = {
lastname:'katz',
firstname:'tomas',
payments:12,
eachpayment:20,
currency:'$'
}
and we have an object for a costumer request for payment information that looks like this:
paymentRequestObject = {
fullname:"tomas katz",
requesttime:1349294394234524
}
we might pass the fullname property as the hash function input because we know that costumerObject.firstname+' '+ costumerObject.lastname, will hash to the same key.
So to store the object we will store it to the index value of hashFunction(costumerObject.firstname+' '+ costumerObject.lastname)
And to retrieve it we will access the array trough the index value of hashFunction(paymentRequestObject.fullname).
Hashing Algorithms
A simple hashing algorithm might do the following:
1) calculate the ASCII sum of the input string
2) then return the Modulu of that sum with the Primary array size.
And this is only is the key is a string.
Handling Collisions
Since we a have a limited number of items in an array, but unlimited possible value, collisions of keys will happen.
Colissions of keys happend when two different values hash to the same key value, and given a big enough number of items these collision will happen.
Handeling collisions fall into two strategies, Open Adressing & Closed Adressing .
Open Adressing
Is a strategy where, collision are considered in the size of the array, the aproximate size of the array is calculated and the size of the is trippled.
That way, if a collision happens, the next available open slot is probed for, and since the array size is triple that of our aproximation, that means that 2 open slots might be open next to the adress where the collision happened.
Since the place where the collision happend is already occupied, we start a linear probe and check the next index, if its also occupied we check the next, and so on until an open space occurs where we can insert the value.
This way retrieving gives us the best possible place to start a probe, in best case scenarion a static time of imidate retrival, since no collisions happned before, at worst case a full linear probe of the whole array, in case multiple collisions already happened and all open spaces were occupied and the last place to store the value was the last place open in the array.
[item,if colision,if colision,if colision,item,colided,if colision,item,colided,colided]
Closed addresing
Closed adressing is a collision strategy where, when a collision occurres we create an array inside the item at the index where the collision occurred, pushing items that collided with the already occupied index into the array at that index.
Thus we dont need to enlarge the array size to hold more items then it should.
And in case we search for an item that colided, we reach the index than start a search only on the collision items, no linear probing is needed, we know that our item is in this index, which is an array, and all we need to do is probe that array to find our item, and not the whole array.
[item,item [collision item,collision item,collision item],item]