Hashing

Introduction to hashing

What is hashing? How does it work?

Hashing is the process of converting a given key into a smaller value that can be used to retrieve the associated data in O(1) time on average.

This is done with a function, called a hash function, that maps data to a simplified representative value termed a “hash code” or “hash”. Note that hashing is not encryption: the hash is only meant to be a compact stand-in for the key. The hash is then used as an index to narrow down the search and get to the data quickly.
Hashing can be considered a significant improvement over a direct address table (DAT) because it reduces the space complexity.
In the employee system example from the Introduction part, we can simply pass the employee ID to our hash function, get the hash code, and use it as the index to look up the record.
Let us see one more example to understand how hashing works.

Purely as an example to help us grasp the concept, let us suppose that we want to map a list of string keys to string values (for example, map a list of countries to their capital cities). So let’s say we want to store the data in Table 1 in the map.

Key             |    Value
----------------|-------------
Cuba            |    Havana
England         |    London
France          |    Paris
Spain           |    Madrid
Switzerland     |    Berne

And let us suppose that our hash function is to simply take the length of the string.

For simplicity, we will have two arrays: one for our keys and one for the values. So to put an item in the hash table, we compute its hash code (in this case, simply count the number of characters), then put the key and value in the arrays at the corresponding index.
For example, Cuba has a hash code (length) of 4, so we store Cuba at position 4 of the keys array and Havana at position 4 of the values array, and so on for the other entries. We end up with the following:

Position             |   Keys array     |  Values array
(hash = key length)  |                  |
---------------------|------------------|---------------
   1                 |                  |
   2                 |                  |
   3                 |                  |
   4                 |    Cuba          |    Havana
   5                 |    Spain         |    Madrid
   6                 |    France        |    Paris
   7                 |    England       |    London
   8                 |                  | 
   9                 |                  |
   10                |                  |
   11                |  Switzerland     |    Berne
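
The table above can be reproduced with a minimal Python sketch. The names `length_hash`, `put`, and `get`, and the fixed size of 12 slots (indices 0 to 11), are assumptions made for illustration and are not part of the original example.

```python
# Toy hash table that uses the length of the key as its hash code.
SIZE = 12  # assumed size: indices 0..11 cover the longest key, "Switzerland"

keys = [None] * SIZE
values = [None] * SIZE


def length_hash(key):
    """Return the hash code of a key: simply its number of characters."""
    return len(key)


def put(key, value):
    """Store the key and value at the index given by the hash code."""
    index = length_hash(key)
    keys[index] = key
    values[index] = value


def get(key):
    """Return the value stored for key, or None if the key is absent."""
    index = length_hash(key)
    # Check that the slot really holds this key, not just one of equal length.
    if keys[index] == key:
        return values[index]
    return None


for country, capital in [
    ("Cuba", "Havana"),
    ("England", "London"),
    ("France", "Paris"),
    ("Spain", "Madrid"),
    ("Switzerland", "Berne"),
]:
    put(country, capital)

print(get("Cuba"))         # Havana
print(get("Switzerland"))  # Berne
```

A lookup here costs one length computation and one string comparison, regardless of how many entries the table holds.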

Now, in this specific example things work quite well. Our array needs to be big enough to accommodate the longest string, but in this case that’s only 11 slots. We do waste a bit of space because, for example, there are no 1-letter keys in our data, nor keys between 8 and 10 letters. But in this case, the waste isn’t so bad either. Taking the length of a string is nice and fast, and so is the process of finding the value associated with a given key (certainly faster than doing up to five string comparisons).

We can also easily see that this method would not work for storing arbitrary strings. If one of our string keys were a thousand characters long but the rest were short, we would waste the majority of the space in the arrays. More seriously, this model can’t deal with collisions: that is, what to do when there is more than one key with the same hash code (in this case, more than one string of a given length). For example, if our keys were random English words, taking the string length would be fairly useless. Granted, the word “pseudoantidisestablishmentarianistically” would probably get its own place in the array. But on the other hand, we’d be left with thousands of, say, 6-letter words all competing for the same slot in the array.
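
To make the collision problem concrete, here is a small Python sketch that groups words by their length-based hash code. The word list is a hypothetical sample chosen for illustration, not data from the example above.

```python
from collections import defaultdict

# Hypothetical sample of keys; any list of English words would do.
words = ["banana", "cherry", "orange", "fig", "kiwi", "plum"]

# Group the words by hash code (string length).
buckets = defaultdict(list)
for word in words:
    buckets[len(word)].append(word)

# Any hash code shared by more than one key is a collision.
collisions = {code: group for code, group in buckets.items() if len(group) > 1}

print(collisions)  # {6: ['banana', 'cherry', 'orange'], 4: ['kiwi', 'plum']}
```

With the two-array layout described earlier, each slot holds a single key, so only one word from each colliding group could be stored.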

In this topic, we explore hashing, a technique very widely used in interview questions. Hashing is designed to solve the problem of needing to efficiently find or store an item in a collection.
For example, if we have a list of 10,000 English words and we want to check whether a given word is in the list, it would be inefficient to compare the word with each of the 10,000 items in turn until we find a match. Hashing makes this more efficient by narrowing down the search at the outset.
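
As a rough illustration of the difference, the sketch below contrasts a linear scan with a lookup in Python's built-in `set`, which is hash-based. The short word list stands in for the 10,000-word list, and the function names `contains_linear` and `contains_hashed` are assumptions made for this example.

```python
# A short stand-in for the 10,000-word list.
words = ["apple", "banana", "cherry", "grape", "melon"]


def contains_linear(word_list, target):
    """Compare the target with each item in turn: O(n) comparisons."""
    for word in word_list:
        if word == target:
            return True
    return False


# Building the set hashes every word once up front.
word_set = set(words)


def contains_hashed(hashed_words, target):
    """Hash the target and check only the matching bucket: O(1) on average."""
    return target in hashed_words


print(contains_linear(words, "melon"))    # True
print(contains_hashed(word_set, "melon")) # True
print(contains_hashed(word_set, "pear"))  # False
```

Both functions give the same answers; the difference is that the hashed lookup does not need to visit the other words in the collection.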

Hashing Problems

Hash search

Problem	Score	Companies	Time
Colorful Number	150	Epic systems	44:55
Largest Continuous Sequence Zero Sum	200		68:45
Longest Subarray Length	200	DE Shaw	57:26
First Repeating element	200	DE Shaw	21:50
2 Sum	300	Amazon	49:18
4 Sum	325	Amazon	72:27
Valid Sudoku	325	Amazon	50:42
Diffk II	375		29:06

Key formation

Problem	Score	Companies	Time
Pairs With Given Xor	200	Flipkart	27:21
Anagrams	350	Amazon Goldman Sachs Deloitte	46:36
Equal	350		71:23
Copy List	450	Amazon	56:18

Maths and hashing

Problem	Score	Companies	Time
Check Palindrome!	200	deshaw	18:56
Fraction	450	Amazon	81:53
Points on the Straight Line	450	Amazon InMobi	78:55

Incremental hash

Problem	Score	Companies	Time
An Increment Problem	200	Bloomberg L.P Evie.ai Groupon Shopee	50:48
Subarray with given XOR	200	dunzo	56:12
Two out of Three	200	Booking.com	33:54
Substring Concatenation	1000		71:27

Hashing two pointer

Problem	Score	Companies	Time
Subarray with B odd numbers	200	dunzo	52:47
Window String	350	Directi Flipkart Ola Zenefits	82:16
Longest Substring Without Repeat	350	Amazon	51:12
