ConcurrentHashMap Internal Working
ConcurrentHashMap uses the same basic principles as HashMap, but is designed primarily for multi-threaded applications and therefore does not require explicit synchronization. Prior to JDK 5, the only thread-safe map implementations were Hashtable and the synchronized Map wrapper.
Before learning how ConcurrentHashMap works in Java, we need to look at why ConcurrentHashMap was added to the JDK in the first place.
Why do we need ConcurrentHashMap when we already had Hashtable?
Hashtable provides thread-safe access to its entries by locking the entire map for every operation (read, create, update, delete). In a web application, the overhead created by Hashtable (locking the entire map) can be ignored under normal load. Under heavy load, however, the overhead of locking the entire map becomes significant and can lead to delayed response times and an overtaxed server.
This is where ConcurrentHashMap comes to the rescue. According to the Oracle docs for ConcurrentHashMap,
the ConcurrentHashMap class is fully interoperable with Hashtable in programs that rely on its thread safety but not on its synchronization details. So the main purpose of this class is to provide the same functionality as Hashtable, but with performance comparable to HashMap.
Now let's come to the main point: how ConcurrentHashMap works internally.
The diagram below shows what a Hashtable/HashMap looks like:
ConcurrentHashMap adds one array on top of this, and each index of this additional array represents a complete HashMap. This additional array is called the Segment array in ConcurrentHashMap.
The architecture of ConcurrentHashMap looks like this:
Putting a key-value pair:
1. Putting a key-value pair in ConcurrentHashMap first requires identifying the exact index in the Segment array. (Once the Segment array index is identified, the flow is exactly the same as putting data into a HashMap/Hashtable.)
2. After identifying the index in the Segment array, the next task is to identify the index of the bucket (internal array) of that Segment's internal hashmap, as shown in the figure above.
3. After identifying the bucket (internal array index), iterate over its key-value pairs and compare each stored key with the key to store; wherever a match is found, replace the stored value with the new value.
If there is no match, store the key-value pair at the end of the list.
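The sketch below illustrates this put flow under simplified assumptions; the class and method names (SimpleSegmentedMap, Segment, Node) are hypothetical and only approximate the idea, they are not the actual JDK source.

class SimpleSegmentedMap<K, V> {

    static class Node<K, V> {
        final K key;
        V value;
        Node<K, V> next;
        Node(K key, V value) { this.key = key; this.value = value; }
    }

    static class Segment<K, V> {
        @SuppressWarnings("unchecked")
        Node<K, V>[] table = (Node<K, V>[]) new Node[2]; // the per-Segment bucket array (HashEntry[] equivalent)

        synchronized V put(K key, V value, int hash) {    // the lock is per Segment, not map-wide
            int bucket = hash & (table.length - 1);        // step 2: bucket index inside this Segment
            Node<K, V> prev = null;
            for (Node<K, V> n = table[bucket]; n != null; n = n.next) {
                if (n.key.equals(key)) {                   // step 3: key already present, replace the value
                    V old = n.value;
                    n.value = value;
                    return old;
                }
                prev = n;
            }
            Node<K, V> fresh = new Node<>(key, value);     // no match: store the pair at the end of the list
            if (prev == null) table[bucket] = fresh; else prev.next = fresh;
            return null;
        }
    }

    @SuppressWarnings("unchecked")
    final Segment<K, V>[] segments = (Segment<K, V>[]) new Segment[16];

    SimpleSegmentedMap() {
        for (int i = 0; i < segments.length; i++) segments[i] = new Segment<>();
    }

    public V put(K key, V value) {
        int hash = key.hashCode();
        int segmentIndex = (hash >>> 28) & (segments.length - 1); // step 1: pick the Segment from the top bits of the hash
        return segments[segmentIndex].put(key, value, hash);
    }
}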
Getting a key-value pair:
1. Getting a key-value pair from ConcurrentHashMap first requires identifying the exact index in the Segment array. (Once the Segment array index is identified, the flow is exactly the same as getting data from a HashMap/Hashtable.)
2. After identifying the index in the Segment array, the next task is to identify the index of the bucket (internal array) of that Segment's internal hashmap, as shown in the figure above.
3. After identifying the bucket (internal array index), iterate over its key-value pairs and compare each stored key with the given key; wherever a match is found, return the value stored against that key.
If there is no match, return null.
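Continuing the same hypothetical SimpleSegmentedMap sketch from above, the get flow locates the Segment and bucket in the same way but takes no lock:

    // Added to the SimpleSegmentedMap sketch above; not the actual JDK source.
    public V get(K key) {
        int hash = key.hashCode();
        int segmentIndex = (hash >>> 28) & (segments.length - 1); // step 1: pick the Segment
        Segment<K, V> segment = segments[segmentIndex];
        int bucket = hash & (segment.table.length - 1);            // step 2: pick the bucket inside that Segment
        for (Node<K, V> n = segment.table[bucket]; n != null; n = n.next) {
            if (n.key.equals(key)) {
                return n.value;                                    // step 3: match found, return the stored value
            }
        }
        return null;                                               // no match: return null
    }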
How is ConcurrentHashMap efficient in terms of performance and thread safety?
Hashtable is not efficient because it uses a map-wide lock, meaning the lock is applied on the map object itself.
ConcurrentHashMap works a bit differently here: instead of locking the complete map object, it locks per Segment.
In other words, instead of a single map-wide lock, it has multiple Segment-level locks.
So two threads can execute put operations simultaneously by acquiring locks on different Segments.
Thread T1 calls concurrentHashMap.put(key, value); it acquires the lock on, say, Segment 1 and invokes the put method.
Thread T2 calls concurrentHashMap.put(key, value); it acquires the lock on, say, Segment 4 and invokes the put method.
The two threads do not interfere with each other and can proceed simultaneously, as they are working under separate Segment locks.
This is how ConcurrentHashMap improves performance while still providing thread safety.
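For example, with the real java.util.concurrent.ConcurrentHashMap API, two threads can put entries at the same time; keys that hash to different Segments are written under different Segment locks and proceed in parallel (the class name ParallelPutDemo and the keys used are only for illustration):

import java.util.concurrent.ConcurrentHashMap;

public class ParallelPutDemo {
    public static void main(String[] args) throws InterruptedException {
        ConcurrentHashMap<String, Integer> map = new ConcurrentHashMap<>();

        // T1 and T2 write different keys; if the keys hash to different Segments,
        // the two puts acquire different Segment locks and run in parallel.
        Thread t1 = new Thread(() -> map.put("alpha", 1));
        Thread t2 = new Thread(() -> map.put("beta", 2));

        t1.start();
        t2.start();
        t1.join();
        t2.join();

        System.out.println(map); // {alpha=1, beta=2} (iteration order may vary)
    }
}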
Can multiple threads read and write from the same or different Segments of ConcurrentHashMap simultaneously?
Read Operation: get(key)
Same Segment / Different Segment: Yes.
Two threads T1 and T2 can read data from the same Segment or from different Segments of the CHM simultaneously without blocking each other.
Write Operation: put(key, value)
Different Segment: Yes.
Multiple threads can write data to different Segments of the CHM simultaneously without blocking each other.
Same Segment: No.
Multiple threads CANNOT write data to the same Segment of the CHM simultaneously; one thread must wait for the other's write operation to complete before its own write can proceed.
Read-Write Operation: get and put
Say T1 is writing data to Segment 1 and T2 is reading data from the same Segment 1: can the read be allowed while the write operation is going on?
YES.
Both operations, T1's write and T2's read, can proceed in parallel.
What data will T2 read if T1 is updating the same data?
Retrieval operations (including get) generally do not block, so they may overlap with update operations (including put and remove).
The get operation will return the latest value present, i.e. the value set by the most recently completed update operation.
Note: get operations are lock-free and can be performed simultaneously, irrespective of other threads writing data to the same or a different Segment of the CHM.
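As a small illustration (again with the real ConcurrentHashMap API; the class name and key are only for the example), a reader thread can keep calling get while a writer updates the same key; the reader never blocks and sees the value from the most recently completed put at the moment of each read:

import java.util.concurrent.ConcurrentHashMap;

public class ReadWhileWriteDemo {
    public static void main(String[] args) throws InterruptedException {
        ConcurrentHashMap<String, Integer> map = new ConcurrentHashMap<>();
        map.put("counter", 0);

        // Writer: updates the same key repeatedly.
        Thread writer = new Thread(() -> {
            for (int i = 1; i <= 1000; i++) map.put("counter", i);
        });

        // Reader: get() does not block; every read returns the value written
        // by the most recently completed put at that point in time.
        Thread reader = new Thread(() -> {
            for (int i = 0; i < 1000; i++) {
                Integer v = map.get("counter");
                if (v == null || v < 0) throw new IllegalStateException("unexpected value: " + v);
            }
        });

        writer.start();
        reader.start();
        writer.join();
        reader.join();
        System.out.println("final value = " + map.get("counter")); // 1000
    }
}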
What is the default size of the Segment array? How is it tuned? What is concurrencyLevel in the case of CHM?
By default, the Segment array size is 16, so a maximum of 16 threads can simultaneously put data into the map, assuming each thread is working on a separate Segment array index.
How is the Segment array size tuned?
The Segment array size decides the number of threads that can write to the map in parallel.
The Segment array size is configured using the concurrencyLevel parameter, as shown below:
ConcurrentHashMap m = new ConcurrentHashMap(initialCapacity, loadFactor, concurrencyLevel);
If the concurrency level is 10, then at any given point the Segment array size will be 10 or greater, so that 10 threads are able to write to the map in parallel.
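As a concrete, illustrative instantiation, passing a concurrencyLevel of 10 together with the default initial capacity and load factor looks like this; with the segment-based implementation the Segment array would then be sized to 16, the next power of two at or above 10 (see the calculation below):

import java.util.concurrent.ConcurrentHashMap;

public class ConcurrencyLevelExample {
    public static void main(String[] args) {
        // initialCapacity = 16, loadFactor = 0.75f, concurrencyLevel = 10
        ConcurrentHashMap<String, Integer> map = new ConcurrentHashMap<>(16, 0.75f, 10);
        map.put("a", 1);
        System.out.println(map.get("a")); // 1
    }
}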
What will happen if the size of the Segment array is too small or too large?
Choosing the correct concurrencyLevel is very important, because concurrencyLevel decides the size of the Segment array, and the Segment array size decides how many threads will be able to execute put operations on the map in parallel.
So the Segment array size should be neither too big nor too small, because using a significantly higher value than needed wastes space and time, while a significantly lower value can lead to thread contention.
If we choose a concurrencyLevel of 10, what will the size of the Segment array be? Is the Segment array size exactly the same as concurrencyLevel? If not, how is the Segment array size calculated?
The Segment array size is calculated based on the specified concurrencyLevel, but it is not necessarily exactly the same as concurrencyLevel.
If concurrencyLevel is 10, then the Segment array size will be 16.
Segment array size = 2^x, where the result must be >= concurrencyLevel (in our case 10)
Segment array size = 2^x >= 10
Segment array size = 2^1 = 2 >= 10 (false)
Segment array size = 2^2 = 4 >= 10 (false)
Segment array size = 2^3 = 8 >= 10 (false)
Segment array size = 2^4 = 16 >= 10 (true)
So the Segment array size is 16.
Example 2:
concurrencyLevel = 8, then Segment array size = ?
Find 2^x >= 8:
2^1 = 2 >= 8 (false)
2^2 = 4 >= 8 (false)
2^3 = 8 >= 8 (true)
So the Segment array size will be 8.
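The same calculation can be written as a small helper method (hypothetical, for illustration only; the real JDK code differs in its details):

public class SegmentSizeCalc {
    // Smallest power of two that is >= concurrencyLevel.
    static int segmentArraySize(int concurrencyLevel) {
        int size = 1;
        while (size < concurrencyLevel) {
            size <<= 1; // keep doubling: 1, 2, 4, 8, 16, ...
        }
        return size;
    }

    public static void main(String[] args) {
        System.out.println(segmentArraySize(10)); // 16
        System.out.println(segmentArraySize(8));  // 8
    }
}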
What is the HashEntry array and how is the size of HashEntry[] decided?
The default initial capacity of CHM is 16. This means CHM makes sure there is sufficient space to accommodate 16 key-value pairs once the CHM is created.
What is HashEntry[]?
We saw that each index of the Segment array is itself a complete HashMap.
In HashMap the bucket array is of type Entry[], while in CHM the per-Segment array is of type HashEntry[].
How is the HashEntry[] array size calculated?
HashEntry[] array size = 2^x >= (initialCapacity / concurrencyLevel)
E.g. ConcurrentHashMap(32, 0.75f, 4):
HashEntry[] array size = 2^1 = 2 >= 8 (32/4) (false)
HashEntry[] array size = 2^2 = 4 >= 8 (false)
HashEntry[] array size = 2^3 = 8 >= 8 (true)
HashEntry[] array size = 8.
This means each Segment will have capacity for 8 key-value pairs right after the CHM is created (4 Segments x 8 entries = the initial capacity of 32).
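The per-Segment HashEntry[] size calculation can be sketched the same way (hypothetical helper, mirroring the formula above rather than the exact JDK code):

public class HashEntrySizeCalc {
    // Smallest power of two that is >= initialCapacity / concurrencyLevel.
    static int hashEntryArraySize(int initialCapacity, int concurrencyLevel) {
        int perSegment = initialCapacity / concurrencyLevel; // e.g. 32 / 4 = 8
        int size = 1;
        while (size < perSegment) {
            size <<= 1;
        }
        return size;
    }

    public static void main(String[] args) {
        System.out.println(hashEntryArraySize(32, 4)); // 8
    }
}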
How does ConcurrentHashMap handle rehashing while another thread is still writing to another Segment/partition?
In CHM, every Segment is rehashed separately, so there is no conflict between Thread 1 writing to Segment index 2 and Thread 2 writing to Segment index 5.
Example: if Thread 1, which is putting data into Segment[] array index 2, finds that the HashEntry[] array needs to be rehashed because the load-factor threshold has been exceeded, it will rehash only the HashEntry[] array present at Segment[] array index 2.
The HashEntry[] arrays at the other Segment indexes remain intact and unaffected, and continue to serve put and get requests in parallel.
This covers the important concepts of ConcurrentHashMap and a few tricky questions. If you have a question or want to add anything, please comment below.