Integrating Lock-Free and Combining Techniques for a
Practical and Scalable FIFO Queue
ABSTRACT:
Concurrent FIFO queues can be generally classified into lock-free queues and
combining-based queues. Lock-free queues require manual parameter tuning to
control the contention level of parallel execution, while combining-based queues
encounter a bottleneck of single-threaded sequential combiner executions at a high
concurrency level. In this paper, we introduce a different approach using both lock-
free techniques and combining techniques synergistically to design a practical and
scalable concurrent queue algorithm. As a result, we have achieved high scalability
without any parameter tuning: in 80-thread average throughput in our
experimental results, our queue algorithm outperforms the most widely used
Michael and Scott queue by 14.3 times, the best-performing combining-based
queue by 1.6 times, and the best-performing x86-dependent lock-free queue by 1.7
percent. In addition, we designed our algorithm in such a way that the life cycle of
a node is the same as that of its element. This has huge advantages over prior work:
efficient implementation is possible without dedicated memory
management schemes, which are supported only in some languages, may cause a
performance bottleneck, or are patented. Moreover, the synchronized life cycle
between an element and its node enables application developers to further optimize
memory management.
EXISTING SYSTEM:
FIFO queues are among the most fundamental and highly studied concurrent
data structures. They are essential building blocks of libraries [1], [2], [3], runtimes
for pipeline parallelism [4], [5], and high performance tracing systems [6]. These
queues can be categorized according to whether they are based on static allocation
of a circular array or on dynamic allocation in a linked list, and whether or not they
support multiple enqueuers and dequeuers. This paper focuses on dynamically
allocated FIFO queues supporting multiple enqueuers and dequeuers. Although
extensive research has been performed to develop scalable and practical concurrent
queues, there are two remaining problems that limit wider practical use: (1)
scalability is still limited at a high level of concurrency, or it is difficult to achieve
without sophisticated parameter tuning; and (2) the use of dedicated memory
management schemes for the safe reclamation of removed nodes imposes
unnecessary overhead and limits the further optimization of memory management
[7], [8], [9]. In terms of scalability, avoiding the contended hot spots, Head and
Tail, is the fundamental principle in designing concurrent queues. In this regard,
there are two seemingly contradictory approaches. Lock-free approaches use
fine-grained synchronization to maximize the degree of parallelism and thus improve
performance.
The MS queue presented by Michael and Scott [10] is the most well-known
algorithm, and many works to improve the MS queue have been proposed [10],
[11], [12], [13], [14]. They use compare-and-swap (CAS) to update Head and Tail.
With CAS, however, only one of the contending threads will succeed, and all the
other threads will fail and retry until they succeed. Since the failing threads use not
only computational resources, but also the memory bus, which is a shared resource
in cache-coherent multiprocessors, they also slow down the succeeding thread. A
common way to reduce such contention is to use an exponential backoff scheme
[10], [15], which spreads out CAS retries over time. Unfortunately, this process
requires manual tuning of the backoff parameters for a particular workload and
machine combination. To avoid this disadvantage, Morrison and Afek [16]
proposed a lock-free queue based on the fetch-and-add (F&A) atomic instruction,
which always succeeds but is x86-dependent. Though the x86 architecture has a
large market share, industry and academia have, as core counts increase,
actively developed various processor architectures to achieve high scalability and
low power consumption, and thus more portable solutions are needed.
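To make the contention problem concrete, the sketch below (our illustration, not code from any of the cited papers) shows an MS-queue-style enqueue that updates Tail with CAS, retries on failure, and spreads retries out with exponential backoff; MIN_DELAY and MAX_DELAY stand for the backoff parameters that must be hand-tuned for each workload and machine combination.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Illustration only: a CAS retry loop with exponential backoff in the style
 * of the MS queue [10], [15]. MIN_DELAY and MAX_DELAY are the hand-tuned
 * backoff parameters referred to in the text. */

typedef struct node {
    void *value;
    _Atomic(struct node *) next;
} node_t;

typedef struct {
    _Atomic(node_t *) head;   /* dequeue side omitted in this sketch */
    _Atomic(node_t *) tail;
} msq_t;

#define MIN_DELAY 16          /* assumed, workload/machine-specific values */
#define MAX_DELAY 4096

static void backoff(unsigned *delay)
{
    for (volatile unsigned i = 0; i < *delay; i++)
        ;                      /* spin to spread retries out over time */
    if (*delay < MAX_DELAY)
        *delay <<= 1;          /* exponential growth up to the cap */
}

void msq_enqueue(msq_t *q, node_t *n)
{
    unsigned delay = MIN_DELAY;
    atomic_store(&n->next, NULL);
    for (;;) {
        node_t *tail = atomic_load(&q->tail);
        node_t *next = atomic_load(&tail->next);
        if (next != NULL) {    /* Tail is lagging: help advance it */
            atomic_compare_exchange_weak(&q->tail, &tail, next);
            continue;
        }
        node_t *expected = NULL;
        if (atomic_compare_exchange_weak(&tail->next, &expected, n)) {
            /* Linked successfully; try to swing Tail (failure is benign). */
            atomic_compare_exchange_weak(&q->tail, &tail, n);
            return;
        }
        backoff(&delay);       /* CAS lost the race: back off, then retry */
    }
}
```

The point is only that every failed CAS consumes memory-bus bandwidth shared with the succeeding thread, and that the backoff constants are tuning knobs with no portable, workload-independent setting.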
PROPOSED SYSTEM:
In this regard, we propose a linearizable concurrent queue algorithm: Lock-free
Enqueue and Combining Dequeue (LECD) queue. In our LECD queue, enqueue
operations are performed in a lock-free manner, and dequeue operations are
performed in a combining manner. We carefully designed the LECD queue so that
it requires neither retrying atomic instructions nor tuning parameters to limit
the contention level: the SWAP instruction used in the enqueue operation always
succeeds, and the CAS instruction used in the dequeue operation is designed not
to require a retry when it fails. Using the combining
techniques in the dequeue operation significantly improves scalability and has
additional advantages together with the lock-free enqueue operation: (1) by virtue
of the concurrent enqueue operations, we can prevent the combiner from becoming
a performance bottleneck, and (2) the higher combining degree incurred by the
concurrent enqueue operations makes the combining operation more efficient. To
our knowledge, the LECD queue is the first concurrent data structure that uses both
lock-free and combining techniques.
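As an illustration of the lock-free enqueue path, the sketch below is inferred from the description above rather than taken from the paper; names such as lecd_enqueue and lecd_tail_t are our own. A single SWAP (atomic exchange) on Tail always succeeds, so there is no retry loop and no backoff parameter. Only the Tail side is shown; the Head/combining side appears in the dequeue sketch later.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Sketch of a SWAP-based lock-free enqueue, inferred from the text above
 * (not the paper's exact code). atomic_exchange always succeeds, so no
 * retrying and no contention-limiting parameters are needed. */

typedef struct node {
    void *value;
    _Atomic(struct node *) next;
} node_t;

typedef struct {
    _Atomic(node_t *) tail;   /* last node in the linked list */
} lecd_tail_t;

void lecd_enqueue(lecd_tail_t *q, node_t *n, void *value)
{
    n->value = value;
    atomic_store(&n->next, NULL);
    /* SWAP: atomically make n the new Tail and receive a private copy of
     * the previous Tail; no other thread ever uses this local copy. */
    node_t *prev = atomic_exchange(&q->tail, n);
    /* Link the predecessor to n; the node becomes reachable from the Head
     * side once this store is visible. */
    atomic_store(&prev->next, n);
}
```

Because the previous Tail is returned into a private local variable, only the enqueuing thread ever dereferences that copy, which is the property the dequeue-side design described below relies on.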
 Using dedicated memory management schemes is another aspect that
hinders the wider practical use of prior concurrent queues. The fundamental
reason for using dedicated schemes lies in the end-of-life mismatch
between an element and its node. In this regard, we made two non-
traditional, but practical, design decisions. First, to fundamentally avoid
read/reclaim races, the LECD queue is designed so that only one thread accesses
Head and Tail at a time. Since dequeue operations are serialized by a single-
threaded combiner, only one thread accesses Head at a time. Also, a thread
always accesses Tail through a private copy obtained as the result of a SWAP
operation, so there is no contention on accessing the private copy. Second,
we introduce a permanent dummy node technique to synchronize the end of
life between an element and its node. We do not recycle the last dequeued
node as a dummy node. Instead, the initially allocated dummy node is
permanently used by updating Head’s next instead of Head in dequeue
operations. Our approach has huge advantages over prior work. Efficient
implementations of the LECD queue are possible, even in languages which
do not support GC, without using dedicated memory management schemes.
Also, synchronizing the end of life between an element and its node opens
up opportunities for further optimization, such as by embedding a node into
its element.
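The sketch below shows how these two decisions could fit together. It is a reconstruction under stated assumptions, not the paper's data structures: the slot-per-thread request layout, MAX_THREADS, and all names are ours. Only the combiner touches the Head side, Head itself is the permanent dummy whose next pointer is advanced, and the single CAS on Tail in the drain case is not retried when it fails.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_THREADS 80        /* assumed bound, matching the 80-thread testbed */

/* Reconstruction under stated assumptions: a slot-per-thread combining
 * dequeue over a permanent dummy node. Threads publish requests in their
 * own slots; whichever thread grabs the combiner role serves every pending
 * slot sequentially, so only one thread at a time touches the Head side,
 * and the dummy node is never replaced. */

typedef struct node {
    void *value;
    _Atomic(struct node *) next;
} node_t;

typedef struct {
    _Atomic(bool) pending;    /* request published, not yet served */
    void *result;             /* written by the combiner before clearing pending */
} deq_slot_t;

typedef struct {
    node_t dummy;             /* permanent dummy; initially dummy.next == NULL */
    _Atomic(node_t *) tail;   /* initially &dummy; updated by the SWAP enqueue */
    deq_slot_t slots[MAX_THREADS];
    atomic_flag combiner_lock;   /* initialize with ATOMIC_FLAG_INIT */
} lecd_queue_t;

/* Remove the node that follows the permanent dummy. Only the combiner calls
 * this, so the Head side is single-threaded by construction. */
static void *pop_front(lecd_queue_t *q)
{
    node_t *first = atomic_load(&q->dummy.next);
    if (first == NULL)
        return NULL;                                /* queue is empty */
    void *v = first->value;
    node_t *second = atomic_load(&first->next);
    if (second != NULL) {
        atomic_store(&q->dummy.next, second);       /* ordinary case */
    } else {
        /* `first` may still be Tail: clear dummy.next, then try once to
         * swing Tail back to the dummy so later enqueues link to it. */
        node_t *expected = first;
        atomic_store(&q->dummy.next, NULL);
        if (!atomic_compare_exchange_strong(&q->tail, &expected, &q->dummy)) {
            /* CAS failed: an enqueuer already swapped Tail and will set
             * first->next shortly; wait for that link, no CAS retry. */
            while ((second = atomic_load(&first->next)) == NULL)
                ;
            atomic_store(&q->dummy.next, second);
        }
    }
    return v;       /* the removed node's life ends with its element */
}

void *lecd_dequeue(lecd_queue_t *q, int tid)
{
    deq_slot_t *slot = &q->slots[tid];
    atomic_store(&slot->pending, true);             /* publish the request */

    while (atomic_load(&slot->pending)) {
        if (!atomic_flag_test_and_set(&q->combiner_lock)) {
            /* This thread became the combiner: serve all pending requests. */
            for (int i = 0; i < MAX_THREADS; i++) {
                deq_slot_t *s = &q->slots[i];
                if (atomic_load(&s->pending)) {
                    s->result = pop_front(q);
                    atomic_store(&s->pending, false);
                }
            }
            atomic_flag_clear(&q->combiner_lock);
        }
    }
    return slot->result;
}
```

The paper's actual combining protocol and linearizability argument may differ from this reconstruction, but the sketch shows why a removed node can be freed, or embedded in its element, as soon as the element is handed back to the caller: no other thread can still be reading it.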
 We compared our queues with the state-of-the-art lock-free queues [10], [16]
and combining-based queues [19] in a system with 80 hardware threads.
Experimental results show that, except for the queues in [16], which support
only the Intel x86 architecture, the LECD queue performs best. Even in comparison
with the x86-dependent queue [16], whose parameter is manually tuned
beforehand, the LECD queue outperforms it in some
benchmark configurations. Since the LECD queue requires neither
parameter tuning nor dedicated memory management schemes, we expect
that the LECD queue can be easily adopted and used in practice.
SOFTWARE IMPLEMENTATION:
 Modelsim 6.0
 Xilinx 14.2
HARDWARE IMPLEMENTATION:
 SPARTAN-III, SPARTAN-VI