Hashing

Hashing Summary

What is the need for hashing

Consider a scenario where you want to design a system that stores the records of your organization’s employees, keyed by employee ID. The system must primarily support the following 3 tasks:
- Insert an employee ID along with the corresponding employee data.
- Search by employee ID and fetch that employee’s details.
- Delete an employee’s data based on the employee ID entered.
Which data structures could perform these operations efficiently? Let’s analyze how the system could be implemented with the data structures we already know:
- Arrays / Linked Lists of employee ID and data:
  - If we search linearly, each lookup takes O(n) time, which becomes costly when there is a huge number of records.
  - If we keep the array sorted by employee ID, Binary Search finds an employee ID in O(log n) time. But insert and delete then become O(n), because elements must be shifted to keep the array sorted after every insertion and deletion.
  - Hence these data structures are not efficient enough for all three operations.
- Balanced binary search tree (BBST) taking employee ID as keys:
  - A BBST is a binary search tree in which the heights of the left and right subtrees of every node differ by no more than 1.
  - Here, because the tree stays balanced, search, insert and delete are all guaranteed to take O(log n) time.
  - But, is there any more efficient way? Let’s look for other options.
- Direct Access Table (DAT):
  - This data structure takes advantage of the O(1) access time of arrays by index. Let’s see how below.
  - In a DAT, we create a very large array and use the employee ID as the index into the array.
  - An array entry is null if the employee ID is not present; otherwise it stores a pointer to the data for that employee ID.
  - Here, all the operations (Search, Insert and Delete) take O(1) time, which is the best we can achieve.
  - But this has serious limitations:
    - The obvious major problem is space: the array needs one slot for every possible employee ID, not just for the IDs in use, so the space complexity is huge.
    - Another problem is that if an employee ID is an integer of n digits, the table needs 10^n slots, and for large n the ID may not even fit in the language’s native integer type or be usable as an array index.
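The Direct Access Table described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the class name `DirectAccessTable`, the `maxId` parameter and the sample data are hypothetical, and employee data is simplified to a `String`.

```java
// Hypothetical sketch of a direct access table keyed by small integer employee IDs.
class DirectAccessTable {
    // One slot per possible ID; null means "no employee with this ID".
    private final String[] slots;

    DirectAccessTable(int maxId) {
        // Memory grows with the range of possible IDs, not with the number of records.
        slots = new String[maxId + 1];
    }

    // Each operation is a single array access, hence O(1).
    void insert(int id, String data) { slots[id] = data; }

    String search(int id) { return slots[id]; }

    void delete(int id) { slots[id] = null; }

    public static void main(String[] args) {
        DirectAccessTable table = new DirectAccessTable(999); // supports IDs 0..999
        table.insert(42, "Alice");
        System.out.println(table.search(42)); // Alice
        table.delete(42);
        System.out.println(table.search(42)); // null
    }
}
```

Supporting 9-digit IDs in this sketch would already require an array of a billion slots, which is the space limitation noted above.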

Applications

Hashing has found applications in many fields thanks to its one-way (irreversible) nature and constant-time access. Following are some applications of hashing:

Password Security and Verification
- Thanks to the “one-way” or “irreversible” property of hash functions, cryptographic hash functions have become widespread. Note that hashing is not encryption: there is no key that turns a hash value back into the original input.
- Security: When we “Sign Up” on a website for the first time and enter the credentials for our account, we need assurance that the account does not fall into the wrong hands. Hence, the database stores a hash of the password rather than the password itself, so that a leaked database does not directly reveal our passwords.
- Verification: When we log in to the website from the next time onwards, the hash of the entered password is computed and compared with the stored hash to verify that the password belongs to us. Because only hash values are compared, the original password never needs to be stored. Protection against sniffing between client and server comes from an encrypted connection, not from hashing alone.
Tokenization: The concept explained in “Password Security and Verification” above can be extended to tokenization, where the authorization tokens of users already logged into a system are stored as hashes.
Data Structures: Hashing is commonly used to implement the data structures of programming languages, for example HashMap in Java and unordered_map in C++.
Compiler Development: Compilers store all the keywords of their language (such as if, else, for, while, return etc.) in a hash table so that they can quickly tell keywords and identifiers apart during compilation.
Message Digests: This is again an application of cryptographic hashes.
- Consider a scenario where you use cloud services to store your files and data, and you want assurance that the third-party cloud service provider does not tamper with your files. You can get it by computing the hash of each file with a cryptographic hashing algorithm and storing the computed hashes on your local machine for reference.
- The next time you download the files from cloud storage, you compute their hashes again and match them against the locally stored hashes computed earlier.
- If a file has been tampered with, its hash value changes (for a cryptographic hash, two different files producing the same hash is practically impossible). If it has not been tampered with, the hash of the downloaded file and the locally saved value are exactly the same.
- A commonly used cryptographic hash algorithm is SHA-256. Such algorithms are also called digest algorithms, and the file in the above example plays the role of the message.
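The integrity check described above might look like the following in Java, using the standard `java.security.MessageDigest` class. The class name `DigestCheck`, its method names and the sample contents are hypothetical, chosen only for this sketch; a real file would be read into a byte array first.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for illustration; not part of any library.
class DigestCheck {
    // Returns the SHA-256 digest of the data as a lowercase hex string.
    static String sha256Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(data)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-256.
            throw new IllegalStateException(e);
        }
    }

    // Compares the digest of the downloaded bytes with the locally saved digest.
    static boolean isUntampered(byte[] downloaded, String savedDigest) {
        return sha256Hex(downloaded).equals(savedDigest);
    }

    public static void main(String[] args) {
        byte[] original = "quarterly report".getBytes(StandardCharsets.UTF_8);
        String saved = sha256Hex(original); // kept on the local machine
        byte[] tampered = "quarterly report!".getBytes(StandardCharsets.UTF_8);
        System.out.println(isUntampered(original, saved)); // true
        System.out.println(isUntampered(tampered, saved)); // false
    }
}
```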
Others:
- Hashing is also used in various common algorithms and tools, such as the Rabin-Karp algorithm, counting the frequency of characters in a word, generating commit IDs in Git, caching etc.
- Apart from the applications mentioned above, hashing is also commonly used in blockchain, feature hashing in machine learning, storing objects or files in AWS S3 buckets, database indexing for faster retrieval and many others!
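As a small illustration of one of the uses listed above, counting the frequency of characters in a word, here is a sketch built on Java's `HashMap`. The class name `CharFrequency` and the sample word are made up for this example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical example class: counts how often each character occurs in a word.
class CharFrequency {
    static Map<Character, Integer> count(String word) {
        Map<Character, Integer> freq = new HashMap<>();
        for (char c : word.toCharArray()) {
            // merge() stores 1 for a new key, otherwise adds 1 to the existing count.
            freq.merge(c, 1, Integer::sum);
        }
        return freq;
    }

    public static void main(String[] args) {
        Map<Character, Integer> freq = count("hashing");
        System.out.println(freq.get('h')); // 2
        System.out.println(freq.get('g')); // 1
        System.out.println(freq.size());   // 6 distinct characters
    }
}
```

Each lookup and update is a constant-time hash table operation on average, so the whole count runs in time proportional to the length of the word.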

FAQs

Are Hashing algorithms secure?
- Not every hash function is designed for security. A separate branch of hashing algorithms provides security; these are called cryptographic hashing algorithms, and the best-known family is the Secure Hash Algorithms (SHA). They are widely used in computer and cloud systems to protect resources.
- SHA-256 is one of the most widely used secure hashing algorithms.
Who invented Hashing?
- Hashing is generally credited to Hans Peter Luhn, who described the idea in an internal IBM memorandum in 1953.
- He demonstrated his work at a six-day international conference devoted to scientific information, held in November 1958.
How does Hashing work?
- Hashing works by mapping data of arbitrary size to a fixed-size value with the help of a function called a “hash function”, and storing the data in a data structure called a “hash table” at the index returned by the hash function. The values returned by this function are called hash codes, digests, hash values or hashes.
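The mapping from key to table index described above can be sketched as follows. This is a minimal illustration, not how any particular library implements it: the class name `SimpleHashTable`, the modulo hash function and the use of chaining (one linked list per bucket) to handle collisions are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Hypothetical minimal hash table: int keys, String values, collisions resolved by chaining.
class SimpleHashTable {
    private static class Entry {
        final int key;
        String value;
        Entry(int key, String value) { this.key = key; this.value = value; }
    }

    private final List<LinkedList<Entry>> buckets = new ArrayList<>();

    SimpleHashTable(int capacity) {
        for (int i = 0; i < capacity; i++) {
            buckets.add(new LinkedList<>());
        }
    }

    // Hash function: maps a key of any magnitude to a bucket index in [0, capacity).
    int hash(int key) {
        return Math.floorMod(key, buckets.size());
    }

    void put(int key, String value) {
        LinkedList<Entry> bucket = buckets.get(hash(key));
        for (Entry e : bucket) {
            if (e.key == key) { e.value = value; return; } // overwrite existing key
        }
        bucket.add(new Entry(key, value));
    }

    String get(int key) {
        for (Entry e : buckets.get(hash(key))) {
            if (e.key == key) return e.value;
        }
        return null; // key not present
    }

    boolean remove(int key) {
        return buckets.get(hash(key)).removeIf(e -> e.key == key);
    }

    public static void main(String[] args) {
        SimpleHashTable table = new SimpleHashTable(10);
        table.put(1000123, "Alice");
        table.put(2000453, "Bob"); // same bucket as 1000123: both end in 3
        System.out.println(table.hash(1000123)); // 3
        System.out.println(table.get(2000453));  // Bob
        table.remove(1000123);
        System.out.println(table.get(1000123));  // null
    }
}
```

Unlike a direct access table, the array here has only 10 slots even though the keys are seven-digit numbers; the price is that two keys can land in the same bucket and must be told apart by comparing keys.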
How is hashing implemented in Java?
- Java provides built-in classes that implement hashing. They are as follows:
  - Hashtable<Object, Object>: This is a synchronized implementation of hashing, so a single instance can safely be shared between threads.
    - Implementation:
      - Code:
        import java.util.*;
        class InterviewBit {
            public static void main(String args[]) {
                // Defining Hashtable to store
                // String values mapped to integer keys
                Hashtable<Integer, String> hashTable = new Hashtable<Integer, String>();
                // Fill the values in Hashtable
                hashTable.put(1, "InterviewBit");
                hashTable.put(2, "Example");
                hashTable.put(5, "for");
                hashTable.put(3, "HashTable");
                // Display the hashtable
                System.out.println(hashTable);
            }
        }
      - Output (the iteration order of a Hashtable is unspecified; a typical run prints): {5=for, 3=HashTable, 2=Example, 1=InterviewBit}
  - HashMap<Object, Object>: This is a non-synchronized and faster implementation of hashing. It is not thread-safe, so access from multiple threads needs external synchronization. The insertion order of keys is not maintained.
    - Implementation:
      - Code:
        import java.util.*;
        class InterviewBit {
            public static void main(String args[]) {
                // Defining HashMap to store
                // String values mapped to integer keys
                HashMap<Integer, String> hashMap = new HashMap<Integer, String>();
                // Fill the values in HashMap
                hashMap.put(1, "InterviewBit");
                hashMap.put(2, "Example");
                hashMap.put(5, "for");
                hashMap.put(3, "HashMap");
                // Display the hashMap
                System.out.println(hashMap);
            }
        }
      - Output (the iteration order of a HashMap is unspecified; a typical run prints): {1=InterviewBit, 2=Example, 3=HashMap, 5=for}
  - LinkedHashMap<Object, Object>: Works like HashMap, but the insertion order of keys is maintained.
  - ConcurrentHashMap<Object, Object>: This class is thread-safe like Hashtable, so it works well in multi-threaded environments, and it is usually faster than Hashtable under contention because it avoids locking the whole table for every operation.
  - HashSet<Object>: This works like HashMap, but only the keys are maintained. It does not store duplicate objects, and the insertion order of keys is not maintained.
  - LinkedHashSet<Object>: This is similar to LinkedHashMap, but it stores only the keys. The insertion order of keys is maintained.
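The ordering and duplicate-handling behaviour of the classes listed above can be seen in a short sketch. The class name `OrderingDemo`, the helper method and the sample keys are hypothetical; only the `java.util` collection classes come from the standard library.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical demo class for the collection behaviours described above.
class OrderingDemo {
    // Stores each key with its 1-based insertion position as the value.
    static Map<String, Integer> inInsertionOrder(String... keys) {
        Map<String, Integer> map = new LinkedHashMap<>();
        for (String key : keys) {
            map.put(key, map.size() + 1);
        }
        return map;
    }

    public static void main(String[] args) {
        // LinkedHashMap iterates in insertion order, not sorted or bucket order.
        System.out.println(inInsertionOrder("banana", "apple", "cherry"));
        // {banana=1, apple=2, cherry=3}

        // HashSet rejects duplicates: add() returns false the second time.
        Set<String> seen = new HashSet<>();
        System.out.println(seen.add("apple")); // true
        System.out.println(seen.add("apple")); // false

        // LinkedHashSet keeps first-insertion order and drops duplicates.
        Set<String> ordered = new LinkedHashSet<>();
        ordered.add("b");
        ordered.add("a");
        ordered.add("b");
        System.out.println(ordered); // [b, a]
    }
}
```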
Why can the results of hashing not be reverted back to the original input?
- A hash function maps inputs of arbitrary size to a fixed-size value, so information is lost and many different inputs share the same hash. For a good cryptographic hash function it is also important to be “one-way”, meaning that finding an input for a given hash is computationally infeasible, so that attackers do not get easy access to critical data.
When is it not advisable to use Hashing / Hash Tables?
- In general, hashing has excellent average-case time complexity for Insert / Search / Delete operations.
- For small applications, plain arrays can be preferable because a hash table consumes more memory.
- Hash tables do not efficiently support operations such as iterating over the entries whose keys lie within a given range, or finding the entry with the largest/smallest key. In such cases, it is better to go with sorted arrays or ordered structures such as balanced binary search trees.
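To illustrate the range and smallest/largest-key queries mentioned above, here is a small sketch using `java.util.TreeMap`, an ordered map from the Java standard library. Choosing `TreeMap` is an assumption made for illustration, and the class name `RangeQueryDemo`, the employee IDs and the names are made up.

```java
import java.util.TreeMap;

// Hypothetical demo: queries that an ordered map answers directly.
class RangeQueryDemo {
    static TreeMap<Integer, String> sampleEmployees() {
        TreeMap<Integer, String> employees = new TreeMap<>();
        employees.put(412, "Chen");
        employees.put(105, "Asha");
        employees.put(987, "Dana");
        employees.put(230, "Bilal");
        return employees;
    }

    public static void main(String[] args) {
        TreeMap<Integer, String> employees = sampleEmployees();
        // Smallest and largest keys are available without scanning all entries.
        System.out.println(employees.firstKey()); // 105
        System.out.println(employees.lastKey());  // 987
        // All entries with 200 <= ID <= 500, in key order.
        System.out.println(employees.subMap(200, true, 500, true));
        // {230=Bilal, 412=Chen}
    }
}
```

With a hash table, each of these queries would need a scan over every entry, because the hash function deliberately scatters neighbouring keys across the table.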
Where is Hashing used?
- Hashing is widely used across many applications such as password security, password verification, tokenization, data structures and compilers of programming languages, blockchain, machine learning feature hashing and many more!

Hashing Problems

Hash search

Problem	Score	Companies	Time
Colorful Number	150	Epic systems	44:55
Largest Continuous Sequence Zero Sum	200		68:45
Longest Subarray Length	200	DE Shaw	57:26
First Repeating element	200	DE Shaw	21:50
2 Sum	300	Amazon	49:18
4 Sum	325	Amazon	72:27
Valid Sudoku	325	Amazon	50:42
Diffk II	375		29:06

Key formation

Problem	Score	Companies	Time
Pairs With Given Xor	200	Flipkart	27:20
Anagrams	350	Amazon Goldman Sachs Deloitte	46:36
Equal	350		71:23
Copy List	450	Amazon	56:18

Maths and hashing

Problem	Score	Companies	Time
Check Palindrome!	200	deshaw	18:56
Fraction	450	Amazon	81:53
Points on the Straight Line	450	Amazon InMobi	78:55

Incremental hash

Problem	Score	Companies	Time
An Increment Problem	200	Bloomberg L.P Evie.ai Groupon Shopee	50:48
Subarray with given XOR	200	dunzo	56:12
Two out of Three	200	Booking.com	33:54
Substring Concatenation	1000		71:27

Hashing two pointer

Problem	Score	Companies	Time
Subarray with B odd numbers	200	dunzo	52:47
Window String	350	Directi Flipkart Ola Zenefits	82:16
Longest Substring Without Repeat	350	Amazon	51:12
