Hashing : probing methods

Let’s observe different methods to handle collisions in hash tables

Linear probing :

When a collision occurs, the colliding element will be placed in the next available block.

Let’s imagine a hash function : h(x) = x mod 6 ;

Let’s take an array arr, with n as null

arr = [ n , n , n , n , n , n ]

We store the value 10 , at the index 10 mod 6 (which equals to 4).

arr = [ n , n, n, n, 10 ,n]

We then store the value 16.

Because 16 mod 6 returns 4 , there is a collision between 10, and 16.

We’ll then simply choose the next available block.

[ n ,n , n , n, 10, 16 ];

if the block at index+1 wasn’t available, we would try to insert it in the block index +2 , and so on. When reaching the end of an array, the next available space becomes the start of the array. If we had to insert 26, the array would look that way :

[ 26 , n , n , n , 10, 16 ];

To retrieve a desired element. First , compute its hash value . Then, look for the corresponding space. If the value stored doesn’t match, check the next space. Keep going until you either checked all the space blocks(meaning, if the index is equal to the array length) , you find an empty block, or you find the aimed element.

To delete an element , proceed the same way.

But, note that, it is important not to simply delete the aimed element, leaving the block as “empty” . Indeed, let’s imagine that we simply deleted 16. We would obtain this array :

[ 26 , n, n, n, 10 , n];

Now, we decided to retrieve 26 , the loop would break as we would encounter an empty element on the way. So, we need to insert a marker, alerting that there was a value before on this block , meaning there could be values present on further blocks.

Separate chaining :

Use blocks as linked lists. That way, there will not be any collisions , as values returning similar hashes can be stored next to each other.

Quadratic probing :

Quadratic probing provides an efficient solution to the “clustering effect” caused by the use of linear probing.

Indeed, linear probing can cause elements starting to form groups of consecutive sells, which can really slow down the execution of the algorithm , wheather we think about retrieval, deletion , or insertion.

arr = [n,n,n,n,n,n,n,n,n,n,n,n];

Here is a function that could be for a linear probing :

(h(x) + i ) mod 6

For a quadratic probing, we would proceed that way :

(h(x) + i ^2) mod n

Let’s take the hash function we used previously :

h(x) = x mod 5;

insert 7, it would give you 2.

arr = [ n , n, 7, n, 10 ,n]

Now, insert 12. It would return 2 as well.

So, perform the quadratic probing :

i = 1

2 + i^2 ;

The value will be stored as index 2 + 1 => 3.

Next, lets imagine we wanted to insert 22 :

i = 2

The value would be stored at the index 2 + 2 ^2=> 4.

If you wanted to insert 32 :

i =3

The value would be stored at the index 2 + 3 ^2 => 11.

Here would be the array obtained :

arr = [n,n, 12 ,n, 22,n,n,n,n,n,n, 32 , n];

We can clearly see that , using this method, elements are spread over the available space.

However , we can arrive to a point called : secondary clustering.

Indeed, we could arrive to a cycle, where we only check unempty spaces over and over.

This is one of the drawbacks of quadracting probing. However, there are ways to handle this drawback, we’ll address further.

Double hashing :

Instead of having a single hashing function, we can combine different ones

arr = [n,n,n,n,n,n,n,n,n,n,n,n];

Let’s create a hash function h1 , and a hash function h2 , for a value of h(x) :

h(x) = hash1(x) + i * hash(x)

hash1(x) = k mod 5

hash2(x) = (3+x) mod 5

Insert 7;

hash1(7) = 2 ;

hash2(7) = 5

i=0

h(x) = 2 + i*5 => 2

arr = [n,n,7,n,n,n,n,n ,n,n,n,n];

Let’s imagine now that we had to insert 12.

hash1(12) = 2 ;

hash2(12) = 5

i=0

h(x) = 2 + i*5 => 2

Because of a collision, we increase the index by 1.

i=1

h(x) = 2 + i*5 => 7

We observe that, until there is no collision, the second hashing functions is not used to compute the hash value.