ASSIGNMENT

Module Code: ESD 532
Module Name: Multi core Architecture and Programming
Course: M.Sc. [Engg.] in Real Time Embedded Systems
Department: Computer Engineering
Name of the Student: Bhargav Shah
Reg. No: CHB0910001
Batch: Full-Time 2011
Module Leader: Padma Priya Dharishini P.

M.S.Ramaiah School of Advanced Studies
Postgraduate Engineering and Management Programmes (PEMP)
#470-P Peenya Industrial Area, 4th Phase, Peenya, Bengaluru-560 058
Tel: 080 4906 5555, website: www.msrsas.org

POSTGRADUATE ENGINEERING AND MANAGEMENT PROGRAMME – (PEMP)

MSRSAS - Postgraduate Engineering and Management Programme - PEMP


Declaration Sheet

Student Name: Bhargav Shah
Reg. No: CHB0910001
Course: RTES
Batch: Full-Time 2011
Module Code: ESD 532
Module Title: Multi Core Architecture and Programming
Module Date: 06-02-2012 to 03-03-2012
Module Leader: Padma Priya Dharishini P.

Extension requests:
Extensions can only be granted by the Head of the Department in consultation with the module leader.
Extensions granted by any other person will not be accepted and hence the assignment will incur a penalty.
Extensions MUST be requested by using the ‘Extension Request Form’, which is available with the ARO.
A copy of the extension approval must be attached to the assignment submitted.

Penalty for late submission
Unless you have submitted proof of mitigating circumstances or have been granted an extension, the
penalties for a late submission of an assignment shall be as follows:
• Up to one week late:
Penalty of 5 marks
• One-Two weeks late:
Penalty of 10 marks
• More than Two weeks late:
Fail - 0% recorded (F)
All late assignments must be submitted to the Academic Records Office (ARO). It is your responsibility to
ensure that the receipt of a late assignment is recorded in the ARO. If an extension was agreed, the
authorization should be submitted to ARO during the submission of the assignment.
To ensure assignment reports are written concisely, the length should be restricted to a limit
indicated in the assignment problem statement. Assignment reports greater than this length may
incur a penalty of one grade (5 marks). Each delegate is required to retain a copy of the assignment
report.

Declaration
The assignment submitted herewith is a result of my own investigations and I have conformed to the
guidelines against plagiarism as laid out in the PEMP Student Handbook. All sections of the text and
results which have been obtained from other sources are fully referenced. I understand that cheating and
plagiarism constitute a breach of University regulations and will be dealt with accordingly.

Signature of the student

Date

Submission date stamp
(by ARO)

Signature of the Module Leader and date


Signature of Head of the Department and date


M. S. Ramaiah School of Advanced Studies
Postgraduate Engineering and Management Programme - Coventry University (UK)

Assessment Sheet

Department: Computer Engineering
Course: RTES
Module Code: ESD 532
Module Leader: Padma Priya Dharishini P.
Module Completion Date: 03-03-2012
Student Name: Bhargav Shah
ID Number: CHB0910001
Batch: Full-Time 2011
Module Title: Multi core Architecture and Programming
Attendance Details: Theory / Laboratory / Fine Paid
Remarks (if any, for shortage of attendance):

Written Examination - Marks Sheet (Assessor to Fill)
Q. No 1-6, parts a, b, c, d, Total, Remarks
Marks Scored for 100 / Marks Scored out of 50 / Result: PASS / FAIL

Assignment - Marks Sheet (Assessor to Fill)
Parts A, B, C, sub-parts a, b, c, d, Total, Remarks
Marks Scored for 100 / Marks Scored out of 50 / Result: PASS / FAIL

PMAR form completed for student feedback (Assessor has to mark): Yes / No

Components                                      Assessor      Reviewer
Written Examination (Max 50)                    Pass / Fail
Assignment (Max 50)                             Pass / Fail
Total Marks (Max 100) (Before Late Penalty)     Grade
Total Marks (Max 100) (After Late Penalty)      Grade
Overall Result

IMPORTANT
1. The assignment and examination marks have to be rounded off to the nearest integer and entered in the respective fields.
2. A minimum of 40% is required for a pass in both assignment and written test individually.
3. A student cannot fail on application of late penalty (i.e. on application of late penalty if the marks are below 40, cap at 40 marks).

Signature of Reviewer with date

Signature of Module Leader with date


Abstract
Multi-core processors may provide higher performance than current embedded processors to
support future embedded system functionalities. According to the Industrial Advisor Board,
embedded systems will benefit from multi-core processors, as these systems comprise
mixed applications, i.e. applications with and without hard real-time constraints,
that can be executed on the same processor.
Moreover, the Industrial Advisor Board also stated that memory operations represent one of
the main bottlenecks that current embedded applications must face, being even more important than
the performance of the core, which can suffer a degradation of 10-20% without really affecting overall
performance. We exploit this fact by studying the effect of running several threads per core,
that is, we make the core multithreaded. We also study the effect of caches, which are a well
known technique in high performance computing to reduce the memory bottleneck.
Chapter 1 discusses arbitration schemes for memory access in multicore systems: the
types of arbitration schemes that exist so far, which is the best of them, the
challenging factors for these arbitration schemes in the present situation, and finally a short note on
the factors that support the proposed arbitration scheme.
Chapter 2 discusses a multi-threaded concept of consumer and producer threads: how they
share a common queue, how to prioritize the threads if a common queue is shared,
and some test cases to test the scenarios.
Chapter 3 discusses a different situation having 4 producers with different queues and a
single consumer, and it discusses the changing priority levels of the consumer so that in
the conflicting condition with the consumer thread the producer will get the high priority to
execute.


Contents
Declaration Sheet .................................................................................................................................ii
Abstract .............................................................................................................................................. iv
List of Figures ...................................................................................................................................vii
Symbols .............................................................................................................................................vii
Nomenclature...................................................................................................................................viii
CHAPTER 1 ....................................................................................................................................... 9
Arbitration schemes of memory access in multi core ..................................................................... 9
1.1 Introduction ...........................................................................................................................9
1.2 Types of arbitration schemes .................................................................................................9
1.3 Challenges in arbitration schemes .......................................................................................10
1.4 Impact of the arbitration schemes on throughput and latency .................................................11
1.5 Proposal of better arbitration scheme with justification ..........................................................11
1.6 Conclusion ...............................................................................................................................12
CHAPTER 2 ..................................................................................................................................... 13
Development of Consumer Producer Application ........................................................................ 13
2.1 Introduction ..............................................................................................................................13
2.2 Sequence diagram ....................................................................................................................13
2.3 Development of parallelized program using Pthread/openMP ................................................14
2.4 Test cases and Testing results for scenario 1 ...........................................................................17
2.4.1 Test cases ........................................................................................................................................ 17
2.4.2 Testing results................................................................................................................................. 18
2.5 Sequence diagram.............................................................................................................................. 19

2.6 Development of paralleled program using pthread/openMP ...................................................20
2.7 Test cases and Testing results for scenario 2 ...........................................................23
2.7.1 Test cases ........................................................................................................................ 23
2.7.2 Testing results................................................................................................................. 24

2.8 Conclusion ...............................................................................................................25
CHAPTER 3 ..................................................................................................................................... 26
Development of Consumer Producer Application with extended priority concept ................... 26
3.1 Introduction ..............................................................................................................................26
3.2 Sequence diagram................................................................................................26
3.3 Development of designed application ......................................................................27
3.4 Test cases and testing results for scenario 3 ........................................................34
3.4.1 Test cases........................................................................................................................ 34
3.4.2 Documentation of the results ........................................................................................ 35

3.5 Conclusion ...............................................................................................................................36
CHAPTER 4 ..................................................................................................................................... 37
4.1 Module Learning Outcomes.....................................................................................................37
4.2 Conclusion ...............................................................................................................................37
References ......................................................................................................................................... 38
Appendix-1 ........................................................................................................................................ 39
Appendix-2 ........................................................................................................................................ 40


List of Tables
Table 2. 1 Test cases for single producer single consumer ................................................................17
Table 2. 2 Test cases for single producer single consumer ................................................................23

Table 3. 1 Test cases for higher priority consumer thread .................................................................34


List of Figures
Figure 2. 1 Sequence diagram for one producer and one consumer ..................................................14
Figure 2. 2 Including Libraries and files for scenario 1 .....................................................................14
Figure 2. 3 Declaration of mutex and structures for scenario 1 ......................................................14
Figure 2. 4 Function to create new list for scenario 1 ........................................................................15
Figure 2. 5 Main function for Application of scenario 1 ...................................................................15
Figure 2. 6 Body of producer thread for scenario 1 ...........................................................................16
Figure 2. 7 Body of consumer thread for scenario 1 ..........................................................................17
Figure 2. 8 Producer thread is waiting for value in critical region ....................................................18
Figure 2. 9 Consumer thread is printing the value inserted by producer thread ...............................18
Figure 2. 10 Sequence diagram of three producer one consumer ......................................................19
Figure 2. 11 Including Libraries and files for scenario 2 ...................................................................20
Figure 2. 12 Declaration of mutex and structures for scenario 2 ....................................................20
Figure 2. 13 Function to create new list for scenario 2 ......................................................................20
Figure 2. 14 Main function for application of scenario 2 ..................................................................21
Figure 2. 15 Body of producer thread for scenario 2 .........................................................................22
Figure 2. 16 Body of consumer thread for scenario 2 ........................................................................22
Figure 2. 17 Producer thread is waiting in critical region ..................................................................24
Figure 2. 18 Consumer thread is active after all the producer threads finish the critical region .......24
Figure3. 1 Sequence diagram for prioritized consumer thread ..........................................................27
Figure3. 2 Including library files for scenario 3 ................................................................................27
Figure3. 3 Declaration of constructive functions for scenario 3 ........................................................28
Figure3. 4 Declaration of list pointers and location pointers ...........................................................28
Figure3. 5 Definition of constructive functions .................................................................................28
Figure3. 6 Declaration of thread function and synchronization objects ............................................29
Figure3. 7 Main function for application of scenario 3 ....................................................29
Figure3. 8 First producer thread .........................................................................................................30
Figure3. 9 Second producer thread ....................................................................................................30
Figure3. 10 Third producer thread .....................................................................................................31
Figure3. 11 Fourth producer thread ...................................................................................................31
Figure3. 12 Consumer thread with highest priority queue.................................................................32
Figure3. 13 Continuation of consumer thread for second priority queue ..........................................33
Figure3. 14 Continuation of consumer thread for third priority queue .............................................33
Figure3. 15 Continuation of consumer thread for last priority queue ................................................34
Figure3. 16 Results of test cases ........................................................................................................35


Nomenclature

WRR    Weighted Round Robin
CMP    Chip Multiprocessors
SDRAM  Synchronous Dynamic Random Access Memory
DRR    Deficit Round Robin
SRR    Stratified Round Robin
PD     Priority Division
PBS    Priority Based Budget Scheduler
TDMA   Time Division Multiple Access
CCSP   Credit Controlled Static Priority

CHAPTER 1
Arbitration schemes of memory access in multi core
1.1 Introduction
The constraints of embedded systems in terms of power consumption, thermal dissipation, cost-efficiency
and performance can be met by using multi core processors (CMP or chip
multiprocessors). On typical medium size CMPs, the cores share a bus to the highest levels of the
memory hierarchy. In multi-core architectures, resources are often shared to reduce cost and
exchange information. An off-chip memory is one of the most common shared resources. SDRAM
is a popular off-chip memory currently used in cost sensitive and performance demanding
applications due to its low price, high data rate and large storage. An asynchronous refresh
operation and a dependence on the previous access make SDRAM access latency vary by an order
of magnitude. The main contribution of this report is to critically compare throughput and latency for
the available arbitration schemes of multi core. At the end, a justification for a better arbitration scheme is
derived from the analysis.

1.2 Types of arbitration schemes[1]
There have been many approaches to provide fairness, high throughput and worst case latency
bounds in the arbiter especially in the networks domain.
Weighted Round Robin (WRR) is a work conserving arbiter where cores are allocated a
number of slots within a round robin cycle depending on their bandwidth requirements. If a core
does not use its slot, the slot is immediately assigned to the next active core in the round robin cycle to
increase throughput. Cores producing bursty traffic benefit at the cost of cores which produce
uniform traffic.
Deficit Round Robin (DRR) assigns different slot sizes to each master according to its
bandwidth requirements and schedules them in a Round Robin (RR) fashion. The difference between
DRR and RR is that if a master cannot use its slot, or part of its slot, in the current cycle, the
remaining slot (deficit) is added to the next cycle. In the next cycle, the master can transfer up to
an amount of data equal to the sum of its slot size and the deficit. Thus, DRR tries to avoid the
unfairness caused to uniform traffic generators in the WRR.
Stratified Round Robin (SRR) groups masters with similar bandwidth requirements into one
class. After grouping masters into classes, two-step arbitration is applied: inter-class and
intra-class. The inter-class scheduler schedules each class Fk once in 2^k clock cycles. Hence, the
smaller the k, the more often the class is scheduled. The intra-class scheduler uses the WRR mechanism to
select the next master within the class. Due to a more uniform distribution of bandwidth, SRR
reduces the worst case latencies compared to the WRR. However, to achieve low worst case latency
for a class Fk, k must be minimized, which leads to over allocation.
Priority Division (PD) combines TDMA and static priorities to achieve guarantees and high
resource utilization. Instead of fixing TDMA slots statically, PD fixes priorities of each master
within the slot statically such that each master has at least one slot where it has the highest priority.
Thus, masters have guarantees equal to TDMA and unused slots are arbitrated based on static
priority to increase the resource utilization. This approach provides benefit over RR or WRR only if
the response time of the shared resource is fixed. In the case of variable response time (e.g.
SDRAM), this approach produces high worst case latencies.
In the Priority Based Budget Scheduler (PBS), masters are assigned fixed budgets of access in a
unit time (the replenishment period). Moreover, masters are also assigned fixed priorities to resolve
conflicts. The budget relates to a master's bandwidth requirements while the priority relates to its
latency requirements. Thus, the coupling between latency and bandwidth is removed. The shared
resource is granted to the active master with the highest priority which still has a budget left. At the
beginning of a replenishment period, each master gets its original budget back.
Akesson et al. introduced the Credit Controlled Static Priority (CCSP) arbiter. CCSP also uses
priorities and budgets within the replenishment period, but instead of using frame based
replenishment periods, masters are replenished incrementally for fine-grained bandwidth assignment.

1.3 Challenges in arbitration schemes
The traditional shared bus arbitration schemes such as TDMA and round robin exhibit
several defects such as bus starvation and low system performance. In strict priority scheduling the
higher priority packets can get most of the bandwidth, so the lower priority packets have to
wait longer for resource allocation. This causes starvation of the lower priority packets.
A drawback of WRR and LARD regarding power consumption is that both of them always keep their
servers turned on even though some of them do not serve any requests; therefore, they cannot
conserve any power. Weighted Round-Robin and Deficit Round-Robin are extensions that
guarantee each requestor a minimum service, proportional to an allocated rate, in a common
periodically repeating frame of fixed size. This type of frame-based rate regulation is similar to the
Deferrable Server, and suffers from an inherent coupling between allocation granularity and
latency, where allocation granularity is inversely proportional to the frame size. A larger frame size
results in finer allocation granularity, reducing over allocation, but at the cost of increased latencies
for all requestors. Another common example of frame-based scheduling disciplines is time-division
multiplexing, which suffers from the additional disadvantage that it requires a schedule to be stored for
each configuration, which is very costly if the frame size or the number of use cases is large[2].
The above arbitration algorithms cannot handle strict real-time requirements, so a two-level
arbitration algorithm, called RB_Lottery bus arbitration, has been developed, which solves
the impartiality, starvation and real-time problems that exist in the Lottery method
and reduces the average latency for bus requests[5]. In hardware verification, the proposed
arbiter achieves a higher operating frequency than the Lottery arbiter, although it pays more
in chip area and power consumption than the Lottery arbiter; it also has a lower
average latency of bus requests than the Lottery arbitration.

1.4 Impact of the arbitration schemes on throughput and latency[4]
In each approach to providing fairness, high throughput and worst case latency bounds,
optimization of one factor degrades the others. In Weighted Round Robin, to provide low worst
case latency to any core, it has to be assigned more slots in the round robin cycle, which leads to
over allocation. Deficit Round Robin (DRR) has very high latencies in the worst case. For example,
one master stays idle for a long time and gains a high deficit. Afterwards, it continuously requests
the shared resource. Since it has gained a high deficit, it will occupy the shared resource for a long
time, incurring very high latencies for the other masters.
Due to the presence of priorities, PBS is fair to high priority masters and unfair to low priority
masters. When all masters are executing HRTs (as outlined in the introduction), PBS results in large
WCETs for low priority masters. Credit Controlled Static Priority (CCSP) likewise produces large
worst case execution time bounds for lower priority masters due to the presence of priorities.

1.5 Proposal of better arbitration scheme with justification
Stratified Round Robin is better compared to the other arbitration schemes, since it is a fair-queuing
packet scheduler with good fairness and delay properties and low complexity. It is
unique among all other schedulers of comparable complexity in that it provides a single packet
delay bound that is independent of the number of flows. Importantly, it also enables a simple
hardware implementation, and thus fills a current gap between scheduling algorithms that have
provably good performance and those that are feasible and practical to implement in high-speed
routers. Interactive applications such as video and audio conferencing require the total delay
experienced by a packet in the network to be bounded on an end-to-end basis. The packet scheduler
decides the order in which packets are sent on the output link, and therefore determines the queuing
delay experienced by a packet at each intermediate router in the network. With
line rates increasing to 40 Gbps, it is critical that all packet processing tasks performed by routers,
including output scheduling, be able to operate in nanosecond time frames.

1.6 Conclusion
By critically comparing throughput and latency for the available arbitration schemes of multi core,
Stratified Round Robin emerges as the better choice, since it is a fair-queuing packet
scheduler with good fairness and delay properties and low complexity; although it still has
negative aspects, it is hoped that improved replacements will be developed in future.


CHAPTER 2
Development of Consumer Producer Application
2.1 Introduction
Today, the world of software development is presented with a new challenge. To fully
leverage this new class of multi-core hardware, software developers must change the way they
create applications. By turning their focus to multi-threaded applications, developers will be able to
take full advantage of multi-core devices and deliver software that meets the demands of the world.
But this paradigm of multi-threaded software development adds a new wrinkle of complexity for
those who care the utmost about software quality. Concurrency defects such as race conditions and
deadlocks are software defect types that are unique to multi-threaded applications. Complex and
hard-to-find, these defects can quickly derail a software project. To avoid catastrophic failures in
multithreaded applications, software development organizations must understand how to identify
and eliminate these deadly problems early in the application development lifecycle.
Here, as part of this work, a multi-threaded producer-consumer application is created using
the given linked list program. Two scenarios have been accommodated in this part of the document. In the
first case the producer inserts one value into the doubly linked list and at the other end the
consumer reads that value and deletes it. In the second case three producer threads
try to insert values into the linked list, and at the end one consumer thread tries to read and delete them.
A proper synchronization mechanism is developed.

2.2 Sequence diagram
A sequence diagram is a kind of interaction diagram that shows how processes operate with
one another and in what order. It is a construct of a Message Sequence Chart. A sequence diagram
shows object interactions arranged in time sequence. It depicts the objects and classes involved in
the scenario and the sequence of messages exchanged between the objects needed to carry out the
functionality of the scenario. Sequence diagrams typically are associated with use case realizations
in the Logical View of the system under development.
Figure 2.1 shows the sequence diagram for the one producer and one consumer. In the figure,
the y-axis represents time and the x-axis represents the resources. At top left one producer
thread and at top right one consumer thread is shown. At the start, the producer has to write the
data into the linked list. But the linked list is shared between the producer and consumer threads. To provide
synchronization between the producer and consumer threads a mutex is used. So, the producer locks the
mutex and writes data to the list. If at the same time the consumer tries to read the data, it tries
to acquire the mutex, which is held by the producer, and it fails. In this case the consumer thread has to
wait until the producer releases the mutex. This phenomenon is shown in Figure 2.1. In this application the
consumer cannot read the data until the producer produces it and stores it in the linked list. This
synchronization mechanism is achieved by using a mutex.

Figure 2. 1 Sequence diagram for one producer and one consumer

2.3 Development of parallelized program using Pthread/openMP
There are two approaches to developing threaded programs on Linux: one uses the
pthread APIs and the other uses the openMP APIs. For this scenario, the pthread APIs
were chosen to develop the single producer and single consumer threads.

Figure 2. 2 Including Libraries and files for scenario 1

Figure 2.2 shows all the preprocessor statements of the code segment. The first is the
"ll2.c" file, which contains the definitions of all the functions related to linked list operations. The
second is "pthread.h", which contains the declarations of all the threading related APIs. The last two
are the standard library files for common functions. In the last lines of the image, a function named
create is declared.

Figure 2. 3 Declaration of mutex and structures for scenario 1


To obtain synchronization in the application, a mutex is used. Here “lock” is defined as the
pthread mutex object. It is essential to initialize the mutex before using it; here initialization is
handled by assigning the macro “PTHREAD_MUTEX_INITIALIZER”.
A pointer to structure, *myList, is created to hold the starting address of the list, and a pointer
*p is created to point to the current position for accessing values. These
declarations are shown in Figure 2.3.
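The declarations above can be sketched as follows. Since the original code is only visible as a screenshot, the list types are assumed reconstructions of what "ll2.c" provides:

```c
#include <pthread.h>
#include <stddef.h>

/* Assumed reconstructions of the linked-list types from "ll2.c";
   the real definitions appear only in the report's screenshots. */
struct ll_node  { int val; struct ll_node *next; };
struct list_head { struct ll_node *first; int size; };

/* Statically initialised mutex guarding the shared list. */
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

struct list_head *myList = NULL; /* base address of the shared list */
struct ll_node   *p      = NULL; /* current position within the list */
```

The static initializer avoids a separate call to pthread_mutex_init() before the threads start.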

Figure 2. 4 Function to create new list for scenario 1

Figure 2.4 shows the definition of the creat function. Calling this function
creates a new list, which is pointed to by the myList pointer. list_create() is the function that
creates the new list and returns its address in the form of a list_head structure. On the
second line, the pointer p is created to hold the current position of the element in the list; at the
start, the current position is set to the first position by calling list_position_create().
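A minimal sketch of the creat() flow described above; list_create() and list_position_create() are assumed stand-ins for the helpers in "ll2.c", reconstructed here so the sketch compiles:

```c
#include <stdlib.h>

/* Assumed reconstructions of the list types and helpers from "ll2.c". */
struct ll_node  { int val; struct ll_node *next; };
struct list_head { struct ll_node *first; int size; };

struct list_head *myList; /* shared list */
struct ll_node   *p;      /* current position */

static struct list_head *list_create(void)
{
    struct list_head *h = malloc(sizeof *h);
    h->first = NULL;      /* freshly created list is empty */
    h->size  = 0;
    return h;
}

static struct ll_node *list_position_create(struct list_head *h)
{
    return h->first;      /* initially the first position */
}

void creat(void)
{
    myList = list_create();                /* allocate a fresh list */
    p = list_position_create(myList);      /* point p at its start */
}
```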

Figure 2. 5 Main function for Application of scenario 1

Figure 2.5 shows the main function for the single-producer, single-consumer application.
In the figure, two functions are declared with a void pointer argument and a void pointer return type.
The function named “ser” is called by the producer thread, while the
consumer thread calls the function “cli”. In main, one void pointer named “exit” is defined to
obtain the return value from the thread functions. Two thread objects are defined, named “t_ser”
and “t_cli”. On successful creation of the producer thread, its ID is stored in


“t_ser”, and the ID of the consumer thread is stored in “t_cli”. To create the threads, the “pthread_create”
API is used with the appropriate arguments. In this application two threads are created: a producer
thread and a consumer thread. The child threads die automatically if the main thread exits. To avoid
this situation, the main thread has to wait until the consumer thread
exits successfully. This mechanism is provided by the “pthread_join” API.
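The create/join structure described above can be sketched as follows; the thread bodies are stubs and the wrapper name run_main is a stand-in, so the sketch stays self-contained:

```c
#include <pthread.h>

/* Sketch of the main() described above; "ser" and "cli" stubs stand in
   for the producer and consumer bodies shown in the figures. */
static void *ser(void *arg) { (void)arg; return NULL; } /* producer stub */
static void *cli(void *arg) { (void)arg; return NULL; } /* consumer stub */

int run_main(void)
{
    pthread_t t_ser, t_cli;  /* IDs of producer and consumer threads */
    void *exit_val;          /* receives each thread's return value */

    if (pthread_create(&t_ser, NULL, ser, NULL) != 0) return -1;
    if (pthread_create(&t_cli, NULL, cli, NULL) != 0) return -1;

    /* main must stay alive until both children have exited, otherwise
       process termination would kill them; pthread_join provides this. */
    pthread_join(t_ser, &exit_val);
    pthread_join(t_cli, &exit_val);
    return 0;
}
```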

Figure 2. 6 Body of producer thread for scenario 1

Figure 2.6 shows the body of the producer thread. At the start of the producer thread,
creat() is called. This creates one new list and assigns the pointer p to the first location.
The consumer cannot get any value before the producer stores it in the list. To avoid such a race condition,
the mutex named “lock” is used. The function “pthread_mutex_lock” is used to take the mutex and
enter the critical region. After this, the producer thread takes a value from the user into the
variable “val”. The entered value is stored in the list and the position of the pointer p is
updated to the new current location. The storing mechanism is provided by the function
“list_insertLast”, with the list object (myList) and the value to be inserted as arguments. After
successful insertion of the value into the list, any thread can read it. So, to end
the critical region and release the mutex, the “pthread_mutex_unlock” function is used. While the producer
is in the critical region, if the consumer thread tries to take the mutex or access the critical section, it
has to wait until the producer releases it. So, after the producer unlocks the mutex, the
consumer thread can acquire the resource.
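The producer body can be sketched as follows, under the assumptions named earlier: the list helpers are reconstructions of "ll2.c", and the value arrives as an argument instead of via scanf so the sketch is self-contained:

```c
#include <pthread.h>
#include <stdlib.h>

/* Assumed reconstructions of the list types and insert helper. */
struct ll_node  { int val; struct ll_node *next; };
struct list_head { struct ll_node *first; int size; };

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static struct list_head shared = { NULL, 0 };

static void list_insertLast(struct list_head *h, int val)
{
    struct ll_node *n = malloc(sizeof *n);
    n->val = val;
    n->next = NULL;
    struct ll_node **pp = &h->first;
    while (*pp)                    /* walk to the tail of the list */
        pp = &(*pp)->next;
    *pp = n;                       /* append the new node */
    h->size++;
}

void *ser(void *arg)
{
    int val = *(int *)arg;         /* stands in for reading from the user */
    pthread_mutex_lock(&lock);     /* enter the critical region */
    list_insertLast(&shared, val); /* store the produced value */
    pthread_mutex_unlock(&lock);   /* leave: the consumer may now run */
    return NULL;
}

int shared_size(void) { return shared.size; }  /* helpers for inspection */
int shared_last(void)
{
    struct ll_node *n = shared.first;
    while (n && n->next) n = n->next;
    return n ? n->val : -1;
}
```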
Figure 2.7 shows the body of the consumer thread. At the start of the thread function, it
tries to take the mutex. After the producer thread unlocks the mutex, the consumer thread gets
access to the shared list. The value is displayed by passing the list object to the function
“list_display”. The consumer thread then has to remove the value. To do this, the function
“list_removeLast” is called with the list object. The return value of this function is the location of the
previous data. After removing the data, the mutex taken by the consumer thread is released.
This whole mechanism is shown by the code in Figure 2.7.

Figure 2. 7 Body of consumer thread for scenario 1
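The consumer body described above can be sketched as follows; the list helpers are again assumed reconstructions, and one value is pre-loaded so the sketch has data to consume:

```c
#include <pthread.h>
#include <stdio.h>

/* Assumed reconstructions of the list types and helpers. */
struct ll_node  { int val; struct ll_node *next; };
struct list_head { struct ll_node *first; int size; };

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static struct ll_node   preloaded = { 42, NULL };
static struct list_head shared = { &preloaded, 1 };

static void list_display(struct list_head *h)
{
    for (struct ll_node *n = h->first; n; n = n->next)
        printf("%d ", n->val);
    printf("\n");
}

static void list_removeLast(struct list_head *h)
{
    struct ll_node **pp = &h->first;
    if (!*pp) return;            /* nothing to remove */
    while ((*pp)->next)          /* find the pointer to the tail node */
        pp = &(*pp)->next;
    *pp = NULL;                  /* unlink the tail */
    h->size--;
}

void *cli(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);   /* blocks while the producer holds it */
    list_display(&shared);       /* read the produced value(s) */
    list_removeLast(&shared);    /* then delete what was just read */
    pthread_mutex_unlock(&lock);
    return NULL;
}

int shared_size(void) { return shared.size; }  /* for inspection */
```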

2.4 Test cases and testing results for scenario 1
2.4.1 Test cases
In this section, test cases are designed for the producer-consumer system. The table below
describes the test cases to be performed. They are meant to validate the functionality of the system
with corner cases of input.
Table 2. 1 Test cases for single producer single consumer

TCN  | Test case                                                        | Test data | Expected result                                                                               | Output obtained
TC_1 | Producer thread will insert a value                              | Int       | Consumer should read the value inserted by the producer                                       | Yes
TC_2 | Consumer should acquire the resource only after the producer thread unlocks it | Any | Proper synchronization should be maintained by the producer and consumer threads    | Yes
TC_3 | Main thread should wait until all child threads exit             | Any       | The main thread has to stay alive until all threads execute completely                        | Yes
TC_4 | No deadlock should occur                                         | Any       | All functions of the program should execute; resource locking must not create a deadlock      | Yes
TC_5 | After reading the data, the consumer thread should delete it     | Any       | After reading the data entered by the producer, the consumer thread has to delete it properly | Yes

2.4.2 Testing results
Figure 2.8 shows the testing results of TC_1, TC_2 and TC_4. Here the server (producer) thread is
waiting for a value from the user. The server thread holds the critical region until it stores the
value in the shared list; by this time the client (consumer) thread is waiting to acquire the resource.

Figure 2. 8 Producer thread is waiting for value in critical region

Figure 2.9 shows the results of TC_3 and TC_4. By the time the producer thread leaves the
critical region, the consumer thread enters it to read the value entered by the
producer. After reading the value, the consumer thread deletes it, as shown in the figure.

Figure 2. 9 Consumer thread printing the value inserted by the producer thread


2.5 Sequence diagram
Figure 2.10 shows the sequence diagram for the three producers and one consumer. In the
figure, the y-axis represents time and the x-axis represents the resources. The left side of the figure
shows the three producer threads and the right side shows the one consumer thread.

Figure 2. 10 Sequence diagram of three producer one consumer

At the start, every producer has to write data into the linked list. But the linked list is shared
between the producer and consumer threads. To provide synchronization between the producer and
consumer threads, a mutex is used. So, each producer locks the mutex and writes data to the list. If
the consumer tries to read the data at the same time, it attempts to acquire the mutex held by a
producer and fails. In this case the consumer thread has to wait until the producer releases the
mutex. This phenomenon is shown in Figure 2.10. In this application the consumer cannot read the data
until the producers have produced it and stored it in the linked list. This synchronization mechanism is
achieved using the mutex.


2.6 Development of parallelized program using pthread/OpenMP
There are two approaches to developing threaded programs in Linux: one uses the
pthread APIs and the other uses the OpenMP APIs. For this part of the scenario, the pthread APIs
are chosen to develop the three-producer, single-consumer application. The definitions for
both scenarios are the same; the only difference is in the main body of the application code.

Figure 2. 11 Including Libraries and files for scenario 2

Figure 2.11 shows all the preprocessor statements of the code segment. The first is the
file “ll2.c”, which contains the definitions of all the linked-list functions. The
second is “pthread.h”, which declares all the threading-related APIs. The last two
are the standard library headers for common functions. In the last lines of the image, a function named
create is declared.

Figure 2. 12 Declaration of mutex and structures for scenario 2

To obtain synchronization in the application, a mutex is used. Here “lock” is defined as the
pthread mutex object. It is essential to initialize the mutex before using it; here initialization is
handled by assigning the macro “PTHREAD_MUTEX_INITIALIZER”.
A pointer to structure, *myList, is created to hold the starting address of the list, and a pointer
*p is created to point to the current position for accessing values. These
declarations are shown in Figure 2.12.

Figure 2. 13 Function to create new list for scenario 2

Figure 2.13 shows the definition of the creat function. Calling this function
creates a new list, which is pointed to by the myList pointer. list_create() is the function that
creates the new list and returns its address in the form of a list_head structure. On the
second line, the pointer p is created to hold the current position of the element in the list; at the
start, the current position is set to the first position by calling list_position_create().
Figure 2.14 shows the main function for the multiple-producer, single-consumer application.
In the figure, two functions are declared with a void pointer argument and a void pointer return type. The
function named “ser” is called by the producer threads; there are three
producer threads, which call the same function three times. The
consumer thread calls the function “cli”. In main, one void pointer named “exit” is defined to obtain the
return value from the thread functions.

Figure 2. 14 Main function for application of scenario 2

Here, five thread objects are defined, named “t_ser”, “t_ser1”, “t_ser2”, “t_ser3” and “t_cli”.
On successful creation of each producer thread, its ID is stored in the corresponding thread
object, and the ID of the consumer thread is stored in “t_cli”. Before creating the threads, the
creat() function is called to generate the list and assign the current location to the pointer p. In the case of
the single producer and single consumer, this function is called in the producer thread function because
both threads run only once. In this case the producer function executes three times, so
there is no need to create a new list each time. Once the list is created, all the threads have to insert a value
and advance the location pointer. To create the threads, the “pthread_create” API is used with the
appropriate arguments. In this application four threads are created: three producer threads
and one consumer thread. The child threads die automatically if the main thread exits. To avoid
this situation, the main thread has to wait until the consumer thread exits successfully. This mechanism
is provided by the “pthread_join” API.
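The thread setup described above can be sketched as follows; the thread bodies are stubs and run_scenario2 is a stand-in name for main, so the sketch stays self-contained:

```c
#include <pthread.h>

/* Sketch of the scenario-2 main: the list would be created once, then
   three producers and one consumer are spawned and joined. */
static void *ser(void *arg) { (void)arg; return NULL; } /* shared producer body */
static void *cli(void *arg) { (void)arg; return NULL; } /* consumer body */

int run_scenario2(void)
{
    pthread_t t_ser1, t_ser2, t_ser3, t_cli;

    /* creat() would run here, once, before any thread starts. */

    if (pthread_create(&t_ser1, NULL, ser, NULL) != 0) return -1;
    if (pthread_create(&t_ser2, NULL, ser, NULL) != 0) return -1;
    if (pthread_create(&t_ser3, NULL, ser, NULL) != 0) return -1;
    if (pthread_create(&t_cli,  NULL, cli, NULL) != 0) return -1;

    pthread_join(t_ser1, NULL);
    pthread_join(t_ser2, NULL);
    pthread_join(t_ser3, NULL);
    pthread_join(t_cli,  NULL); /* keep main alive until all threads exit */
    return 0;
}
```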
The consumer cannot get any value before the producers store it in the list; indeed, the consumer has to
wait until all the producers have stored their values. On the other side, no producer thread can
insert a value while another producer thread is in the critical region. To achieve such synchronization, the mutex
named “lock” is used. The function “pthread_mutex_lock” is used to take the mutex and enter
the critical region. After this, each producer thread takes a value from the user into the local
variable “val”. The entered value is stored in the list and the position of the pointer p is
updated by every producer thread. The storing mechanism is provided by the function
“list_insertLast”, with the list object (myList) and the value to be inserted at the end as arguments.
After successful insertion of the values by all the producer threads, the consumer thread can
read them. So, to end the critical region and release the mutex, the
“pthread_mutex_unlock” function is used. The body of the producer threads is shown in Figure
2.15.

Figure 2. 15 Body of producer thread for scenario 2

Figure 2. 16 Body of consumer thread for scenario 2


Figure 2.16 shows the body of the consumer thread. At the start of the thread function, it
tries to take the mutex. After the producer threads unlock the mutex, the consumer thread gets
access to the shared list. The values are displayed by passing the list object to the function
“list_display”. The consumer thread then has to remove the values. To do this, the function
“list_removeLast” is called to remove a single value from the list. In this scenario there are three
values in the list, so the reading and deletion procedure is repeated three times. The return value
of this function is the location of the previous data. After removing all the data, the mutex taken by
the consumer thread is released.
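The repeated read-and-delete step can be sketched as follows; a plain array stands in for the linked list (an assumption, since the real code uses the "ll2.c" helpers) to keep the sketch short:

```c
#include <stdio.h>

/* Sketch of the scenario-2 consumer loop: three values were produced,
   so the display/remove step repeats three times. */
int list[3] = { 10, 20, 30 };
int list_size = 3;

void consume_all(void)
{
    for (int i = 0; i < 3; i++) {                  /* one pass per value */
        printf("read %d\n", list[list_size - 1]);  /* list_display step */
        list_size--;            /* stands in for list_removeLast(myList) */
    }
}
```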

2.7 Test cases and testing results for scenario 2
2.7.1 Test cases
In this section, test cases are designed for the three-producer, one-consumer system. The table
below describes the test cases to be performed. They are meant to validate the functionality of the
system with corner cases of input.
Table 2. 2 Test cases for three producers and single consumer

TCN  | Test case                                                             | Test data | Expected result                                                                                     | Output obtained
TC_1 | All producer threads should insert a value into the list              | Int       | Consumer should read the values inserted by the producer threads                                    | Yes
TC_2 | Consumer thread should read the values in the appropriate priority    | Any       | Priority is assigned to all the producer threads; the consumer should read in the proper priority order | Yes
TC_3 | Main thread should wait until all child threads exit                  | Any       | The main thread has to stay alive until all threads execute completely                              | Yes
TC_4 | Two producer threads should not insert a value at the same time       | Any       | A proper synchronization mechanism should be maintained by the producer threads when inserting values | Yes
TC_5 | After reading the data, the consumer thread should delete it one by one | Any     | After reading the data entered by the producers, the consumer thread has to delete it properly      | Yes

2.7.2 Testing results
Figure 2.17 shows the testing results of the developed producer-consumer application. Here
the consumer thread waits until all the producer threads leave the critical region; the first
priority is assigned to the first producer thread. Results of TC_1, TC_2 and TC_4 are shown in the
figure below.

Figure 2. 17 Producer thread is waiting in critical region

Figure 2.18 shows the results of test cases TC_3 and TC_4. Only after all producer threads
leave the critical region can the consumer enter it to read the values from the list. As per the given
priority, the consumer thread reads the values.

Figure 2. 18 Consumer thread is active after all the producer threads finish the critical region

NOTE: In this document all results are documented for a single iteration of the application, to
provide a clear understanding.


2.8 Conclusion
Multi-core hardware is clearly increasing software complexity by driving the need for multi-threaded
applications. Given the rising rate of multi-core hardware adoption in both enterprise
and consumer devices, the challenge of creating multi-threaded applications is here to stay for
software developers. In the coming years, multi-threaded application development will most likely
become the dominant paradigm in software. As this shift continues, many development
organizations will transition to multi-threaded application development on the fly.
In view of this, a producer-consumer application was successfully created using the pthread APIs.
Both kinds of thread share the same linked list, and synchronization is provided by a mutex. The test cases
were developed by critically analyzing the application code and the assignment requirements. All the test
cases passed successfully.


CHAPTER 3
Development of Consumer-Producer Application with Extended
Priority Concept
3.1 Introduction
All modern operating systems divide CPU cycles, in the form of time quanta, among various
processes and threads (or Linux tasks) in accordance with their policies and priorities. Thread
scheduling is one of the most important and fundamental services offered by an operating system
kernel. Some of the metrics an operating system scheduler seeks to optimize are fairness,
throughput, turnaround time, response time and efficiency. Multiprocessor operating systems
assume that all cores are identical and offer the same performance.

3.2 Sequence diagram
Figure 3.1 shows the sequence diagram in which the message queues are prioritized. In the figure, the
y-axis represents time and the x-axis represents the resources. The producer threads are shown on the left
side of the image: the vertical thin line shows the main thread and the thick overlapped
lines show the producer threads. Each producer thread maintains one queue to store data. On the
right side of the image, one consumer thread is shown. Before spawning the producer and consumer
threads, the main thread locks four semaphores. After locking the semaphores, main creates four
producers and one consumer. At the start, the consumer thread tries to acquire the semaphores in
proper priority order. Each producer thread accesses its own message queue and inserts data. At
the end, each producer thread unlocks its semaphore so the consumer thread can gain access to that
particular semaphore.
In the figure, the ascending priority order for the producer threads/queues is thread 4, thread 3, thread 2,
thread 1. At its end, thread 1 releases semaphore 1. The consumer thread continuously
checks the sizes of all the lists associated with the queues. Since the priority assigned to
thread 3 is higher, the consumer thread looks at the size of the third queue first. If thread 3
does not have any data in its queue, the consumer thread looks at the lower-priority queues.
As a result of this mechanism, by this time only thread 1 has entered an element in its queue and
released its semaphore. At the next moment the consumer thread looks for an available
element in queue 3 but fails. Since no higher-priority thread has data in its queue, rather than
waiting for a higher-priority thread, the consumer thread reads and deletes the data in
the lower-priority queue. As soon as a higher-priority producer thread enters a value in its
queue, the consumer thread immediately reads and deletes it.


When the consumer and a producer thread try to acquire a resource at the same
time, the consumer thread is given priority to access the resource.

Figure3. 1 Sequence diagram for prioritized consumer thread

3.3 Development of designed application
There are two approaches to developing threaded programs in Linux: one uses the
pthread APIs and the other uses the OpenMP APIs. For this part of the scenario, the pthread APIs
are chosen to develop the four-producer, single-consumer application.

Figure3. 2 Including library files for scenario 3


In this scenario the pthread APIs are used. The definitions of the pthread APIs are included via
pthread.h. To provide the appropriate synchronization, semaphores are used. The definitions of the
semaphore APIs and the declaration of the semaphore-type object are included via semaphore.h. Figure 3.2
shows these files being included in the application.

Figure3. 3 Declaration of constructive functions for scenario 3

Figure 3.3 shows the declaration of the constructive functions. In this scenario, four threads
will create four different lists. To fulfill this requirement, one function per thread is declared.

Figure3. 4 Declaration of list pointers and location pointers

A pointer to structure, *myList, is created to hold the starting address of the list. Here we
have four different queues. To hold these queues (i.e., their base addresses), four different pointers to the
structure list_head are created. Likewise, to hold the current location in each of the four lists, four
ll_node pointers are created. The declarations of all these objects are shown in Figure 3.4.

Figure3. 5 Definition of constructive functions

Figure 3.5 shows the definitions of the constructive functions. Calling these functions
creates the new lists, pointed to by the myList series of pointers. list_create() is the
function that creates a new list and returns its address in the form of a list_head
structure. On the second line, the pointers p, q, r and s are created to hold the current position of
the element in the lists. At the start, the current position is set to the first position by calling
list_position_create().

Figure3. 6 Declaration of thread function and synchronization objects

Here, the four producer threads call four different functions. The declarations of these
functions are shown in Figure 3.6. As part of the synchronization mechanism, four semaphores
and a mutex are used. The reason for using semaphores is that a semaphore can be taken by one thread and
released by another, which is not possible with a mutex. The declarations of these
objects are shown in Figure 3.6.

Figure3. 7 Main function for application of scenario 3

Figure 3.7 shows the code for the main function. The semaphores are initialized at the start of
the main function. To initialize a semaphore, the function sem_init is used with three arguments.
The first argument is the address of the sem_t (semaphore) object. The second parameter
indicates that the semaphore is shared between all the threads of the process. The third parameter is the
initial value of the semaphore; here the initial value is 1, so each semaphore is a binary
semaphore. After initializing all the synchronization tools, the threads are created after locking the
semaphores. So, in the end, four producer threads and one consumer thread are created with
four locked semaphores, and the main thread waits for the client thread to finish execution.
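The semaphore setup described above can be sketched as follows. The names l_th..l_th3 follow the text; everything else is an assumption, since the original code is only visible as a screenshot:

```c
#include <semaphore.h>

sem_t l_th, l_th1, l_th2, l_th3;  /* one semaphore per producer queue */

int setup_semaphores(void)
{
    /* sem_init(sem, pshared, value): pshared = 0 shares the semaphore
       between the threads of this process; value = 1 makes it binary. */
    if (sem_init(&l_th,  0, 1) != 0) return -1;
    if (sem_init(&l_th1, 0, 1) != 0) return -1;
    if (sem_init(&l_th2, 0, 1) != 0) return -1;
    if (sem_init(&l_th3, 0, 1) != 0) return -1;

    /* main locks all four before creating the threads, so the consumer
       must block until each producer posts its own semaphore. */
    sem_wait(&l_th);  sem_wait(&l_th1);
    sem_wait(&l_th2); sem_wait(&l_th3);
    return 0;
}

int sem_value(sem_t *s)   /* helper so the state can be inspected */
{
    int v = 0;
    sem_getvalue(s, &v);
    return v;
}
```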

Figure3. 8 First producer thread

Figure 3.8 shows the first producer thread. The first thread enters a value into the list
named myList. At the end of the function, thread 1 unlocks the semaphore named l_th, which was
taken by the main function before creating the thread. At the same time, the consumer thread is
waiting to take the highest-priority semaphore, l_th3, which is associated with the highest-priority
queue (myList3). The mutex is used to prevent multiple threads from seeking data at the same time.
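The producer pattern described above can be sketched as follows, with a counter standing in for the list insertion (an assumption, since only a screenshot of the original code is available):

```c
#include <pthread.h>
#include <semaphore.h>

/* Sketch of producer thread 1: insert into its own queue under the
   mutex, then post the semaphore that main locked before spawning. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static sem_t l_th;           /* semaphore associated with this producer */
static int queue_size;       /* stands in for myList->size */

void producer1_setup(void)   /* what main does before pthread_create */
{
    sem_init(&l_th, 0, 1);   /* binary semaphore */
    sem_wait(&l_th);         /* main locks it up front */
}

void *producer1(void *arg)
{
    (void)arg;               /* the real thread reads a value from the user */

    pthread_mutex_lock(&lock);   /* only one thread touches the queue */
    queue_size++;                /* stands in for list_insertLast(...) */
    pthread_mutex_unlock(&lock);

    sem_post(&l_th);         /* signal the consumer: this queue has data */
    return NULL;
}

int producer1_sem_value(void)    /* helpers so state can be inspected */
{
    int v = 0;
    sem_getvalue(&l_th, &v);
    return v;
}
int producer1_queue_size(void) { return queue_size; }
```

Because a semaphore, unlike a mutex, may be posted by a thread other than the one that waited on it, this hand-off from main to producer to consumer is legal.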

Figure3. 9 Second producer thread

Figure 3.9 shows the second producer thread. The second thread enters a value into
the list named myList1. At the end of the function, thread 2 unlocks the semaphore named l_th1,
which was taken by the main function before creating the thread. At the same time, the consumer
thread is waiting to take the highest-priority semaphore, l_th3, associated with the highest-priority
queue (myList3). The mutex is used to prevent multiple threads from seeking data at the same time.

Figure3. 10 Third producer thread

Figure 3.10 shows the third producer thread. The third thread enters a value into the
list named myList2. At the end of the function, thread 3 unlocks the semaphore named l_th2, which
was taken by the main function before creating the thread. At the same time, the consumer thread is
waiting to lock semaphore l_th3, which is still locked by the main function. The mutex is used to
prevent multiple threads from seeking data at the same time.

Figure3. 11 Fourth producer thread

Figure 3.11 shows the fourth producer thread. The fourth thread enters a value into the
list named myList3. At the end of the function, thread 4 unlocks the semaphore named l_th3, which
was taken by the main function before creating the thread. This is the highest-priority thread, for
which the consumer thread is waiting. The moment thread 4 releases the semaphore, the
consumer thread becomes active. The consumer thread then reads the data from the highest-priority
queue down to the lowest-priority queue.

Figure3. 12 Consumer thread with highest priority queue

Figure 3.12 shows the top half of the consumer thread. As per the requirement, when the
producer and consumer threads contend at the same time, the consumer should get the higher priority to
access the queues. To obtain this, one instance of the structure sched_param is created. Two APIs are
used, named pthread_setschedparam() and pthread_setschedprio(). The first API is used to change the
scheduling policy of the current thread; for the consumer thread the scheduling policy is set to
SCHED_FIFO. Basically, FIFO is the scheduling policy in which the thread that enters the ready state
first gets the chance to execute, and no equal-priority thread can preempt the current execution. In our
case, due to FIFO scheduling and the higher priority, no producer thread can preempt the consumer thread.
On the other side, the requirement is that when the consumer and a producer arrive at the same
time, the consumer should get the higher priority in that situation. To fulfill this requirement, the
priority of the client (consumer) thread is set high, while the server (producer) threads run with normal
priority. To assign a priority to a thread, pthread_setschedprio() is used with the thread ID and the priority
value as arguments. In SCHED_FIFO scheduling, 1 is the lowest priority and 99 is the highest. After setting
the priority, the consumer thread continuously monitors the size variable in every list


structure of the producer threads. If a producer stores some data in its list, the consumer reads and
removes it.
Here thread 3's queue has the highest priority, so the consumer thread checks the size of the queue
associated with thread 3 first. If the size is not equal to zero, it means that some data is available in
the queue, so it has to be deleted as the first priority.
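The priority-raising step described above can be sketched as follows. On Linux, SCHED_FIFO at priority 99 normally needs root or CAP_SYS_NICE, so the call may legitimately fail with EPERM; the sketch reports the error instead of assuming success:

```c
#include <errno.h>
#include <pthread.h>
#include <sched.h>

/* Switch the calling (consumer) thread to FIFO scheduling at high
   priority; pthread_setschedprio() could later adjust the priority. */
int raise_consumer_priority(void)
{
    struct sched_param sp;
    sp.sched_priority = 99;   /* SCHED_FIFO range on Linux is 1..99 */

    return pthread_setschedparam(pthread_self(), SCHED_FIFO, &sp);
}
```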

Figure3. 13 Continuation of consumer thread for second priority queue

If the highest-priority queue has no data, it is not worth the consumer
thread waiting until the highest-priority thread stores data, because the consumer has to serve
four producer threads. To achieve this, if the consumer thread does not find data in the
highest-priority queue (myList3), it jumps to check for data in the second-priority queue. Here
the second priority is assigned to thread 2, and the queue associated with it is myList2. This
mechanism can be seen in Figure 3.13: the consumer thread checks myList2, and if some
data is available there, the consumer prints and deletes it.

Figure3. 14 Continuation of consumer thread for third priority queue

If the first two priority queues have no data, it is not worth the
consumer thread waiting until either of those threads stores data, because the consumer has to
serve four producer threads. To achieve this, if the consumer thread does not find data in
the first two priority queues (myList3 and myList2), it jumps to check for data in the third-priority
queue. Here the third priority is assigned to thread 1, and the queue associated with it
is myList1. This mechanism can be seen in Figure 3.14: the consumer thread checks
myList1, and if some data is available there, the consumer prints and deletes it.

Figure3. 15 Continuation of consumer thread for last priority queue

If the first three priority queues have no data, it is not worth the
consumer thread waiting until any of those threads stores data, because the consumer has to
serve four producer threads. To achieve this, if the consumer thread does not find data in the
first three priority queues (myList3, myList2 and myList1), it jumps to check for data in the
last-priority queue. Here the last priority is assigned to the remaining thread, and the queue
associated with it is myList. This mechanism can be seen in Figure 3.15: the consumer thread
checks myList, and if some data is available there, the consumer prints and deletes it.
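One pass of the consumer's priority scan described above can be sketched as follows; plain counters stand in for the myList* queue sizes, which is an assumption made so the sketch is self-contained:

```c
/* Check the queues from highest to lowest priority and service the
   first non-empty one. */
int size3, size2, size1, size0;   /* sizes of the four queues */

/* Returns the priority level serviced, or -1 if every queue was empty
   (in which case the real consumer simply scans again). */
int consumer_scan(void)
{
    if (size3 > 0) { size3--; return 3; }  /* highest-priority queue first */
    if (size2 > 0) { size2--; return 2; }
    if (size1 > 0) { size1--; return 1; }
    if (size0 > 0) { size0--; return 0; }  /* lowest-priority queue last */
    return -1;
}
```

This ordering gives the behavior the text describes: a lower-priority queue is serviced only when every higher-priority queue is empty.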

3.4 Test cases and testing results for scenario 3
3.4.1 Test cases
In this section, test cases are designed for the four-producer, one-consumer system. The table
below describes the test cases to be performed. They are meant to validate the functionality of the
system with corner cases of input.
Table 3. 1 Test cases for higher priority consumer thread

TCN  | Test case                                                                          | Test data | Expected result                                                                                                     | Output obtained
TC_1 | Consumer should acquire the higher priority and run first                          | NA        | At the start of the program, the consumer should run and show that the lists are empty                              | Yes
TC_2 | If the consumer and a producer try to access a resource together, the consumer should get access first | Any | When shared resources are accessed, the consumer should get the higher priority                   | Yes
TC_3 | Consumer should not wait for the higher-priority thread to enter a value           | Any       | If the higher-priority thread does not enter a value, the consumer should check the other, lower-priority queues    | Yes
TC_4 | If two values are entered by any producer thread, the consumer should respond to both | Any    | When the consumer is busy printing values and, at the same time, a thread enters two values in its queue, the consumer should read and delete both values | Yes

3.4.2 Documentation of the results
Figure 3.16 shows the results of the test cases developed in the section above. It can be
seen from the image that the consumer thread responds to all the producer queues that
contain values. The attached callouts give a better understanding of the results.

Figure3. 16 Results of test cases (callouts: the consumer thread executes first as it has the highest priority; thread 3's higher-priority queue is empty, so rather than waiting, the consumer services lower-priority thread 1, which has a value in its queue)

3.5 Conclusion
The consumer-producer application with the extended priority concept has been explained with a sequence diagram showing the relation between four static-priority producer threads, one consumer thread, four queues and their synchronization mechanism, together with a parallelized program using pthreads and test cases to verify the functionality.


CHAPTER 4
4.1 Module Learning Outcomes
This module helped to build expertise in parallel programming for multi-core architectures. Multi-core processors were covered along with their performance quantification and usability techniques; single-core and multi-core optimization techniques and the development of multi-threaded parallel programs were explained clearly. Virtualization and partitioning techniques were explained in detail, along with their specific challenges and solutions, and parallel programming of multi-core processors was introduced through appropriate case studies using OpenMP and pthreads.
After this module I am able to analyze multi-core architectures, the optimization process for single-core and multi-core processors, and parallel programming for multi-core processors proficiently, using the OpenMP library and the GCC compiler. Applying parallel programming concepts to develop applications on multi-core processors was well taught through the lab programs, which proved an efficient way of learning pthreads, OpenMP and the various synchronization techniques for eliminating deadlock situations.

4.2 Conclusion
In Chapter 1, by critically comparing the throughput and latency of the available multi-core arbitration schemes, round robin was found to be better than the other schemes, since it is a fair-queuing packet scheduler with good fairness and delay properties and low complexity. Even so, round robin still has significant drawbacks, and replacements may well appear in future.
In Chapter 2, it was seen that multi-core hardware is clearly increasing software complexity by driving the need for multi-threaded applications. Given the rising rate of multi-core hardware adoption in both enterprise and consumer devices, the challenge of creating multi-threaded applications is here to stay for software developers.
In view of this, a producer-consumer application was successfully created using the pthread APIs. Both threads share the same linked list, and synchronization is provided by a mutex. The test cases were developed by critically analyzing the application code and the assignment requirements, and all the test cases passed.
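A minimal sketch of that Chapter 2 design, one producer and one consumer sharing a linked list guarded by a single mutex, might look as follows. This is an illustration under stated assumptions (a fixed item count, polling consumer), not the assignment's listing.

```c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define ITEMS 5

typedef struct node { int value; struct node *next; } node_t;

static node_t *list;                                /* shared linked list */
static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;

static void *producer(void *arg)
{
    for (int i = 0; i < ITEMS; i++) {
        node_t *n = malloc(sizeof *n);
        n->value = i;
        pthread_mutex_lock(&m);                     /* enter critical region */
        n->next = list;
        list = n;
        pthread_mutex_unlock(&m);
    }
    return NULL;
}

static void *consumer(void *arg)
{
    int *consumed = arg;
    while (*consumed < ITEMS) {                     /* poll until all items seen */
        pthread_mutex_lock(&m);
        if (list) {
            node_t *n = list;
            list = n->next;
            printf("consumed %d\n", n->value);
            free(n);
            (*consumed)++;
        }
        pthread_mutex_unlock(&m);
    }
    return NULL;
}

/* Run one producer and one consumer; return how many items were consumed. */
static int run_demo(void)
{
    pthread_t p, c;
    int consumed = 0;
    pthread_create(&p, NULL, producer, NULL);
    pthread_create(&c, NULL, consumer, &consumed);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return consumed;
}
```

The mutex guarantees that the list head is never updated by both threads at once; a condition variable could replace the polling loop, at the cost of slightly more setup.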
In Chapter 3, the consumer-producer application with the extended priority concept was explained with a sequence diagram showing the relation between four static-priority producer threads, one consumer thread, four queues and their synchronization mechanism, together with a parallelized program using pthreads and test cases to test the functionality.


References
[1] http://www.coverity.com/library/pdf/coverity_multi-threaded_whitepaper.pdf
[2] www.irit.fr/publis/TRACES/12619_etfa2011.pd
[3] www.cs.fsu.edu/research/reports/TR-100401.pd
[4] paper.ijcsns.org/07_book/200809/20080936.pdf
[5] www.sti.uniurb.it/bogliolo/e-publ/KLUWERjdaes03.pdf


Appendix-1


Appendix-2


Contenu connexe

Tendances

Computer hardware servicing ncii
Computer hardware servicing nciiComputer hardware servicing ncii
Computer hardware servicing ncii
Nathan Bud
 
CBC Automotive Servicing NC II
CBC Automotive Servicing NC IICBC Automotive Servicing NC II
CBC Automotive Servicing NC II
Christopher Birung
 
Executive Summary and Design Document
Executive Summary and Design DocumentExecutive Summary and Design Document
Executive Summary and Design Document
Theresa Cline
 

Tendances (20)

FYP 2 REPORT AMIRUL ARIFF
FYP 2 REPORT AMIRUL ARIFFFYP 2 REPORT AMIRUL ARIFF
FYP 2 REPORT AMIRUL ARIFF
 
Assignment 5
Assignment 5Assignment 5
Assignment 5
 
K-12 Teacher's Guide on Computer Hardware Servicing
K-12 Teacher's Guide on Computer Hardware ServicingK-12 Teacher's Guide on Computer Hardware Servicing
K-12 Teacher's Guide on Computer Hardware Servicing
 
Parking allocation system
Parking allocation systemParking allocation system
Parking allocation system
 
.Net cbc
.Net cbc.Net cbc
.Net cbc
 
Computer Hardware-servicing-learning-module
Computer Hardware-servicing-learning-moduleComputer Hardware-servicing-learning-module
Computer Hardware-servicing-learning-module
 
Activity #3 pacifico
Activity #3 pacificoActivity #3 pacifico
Activity #3 pacifico
 
Computer hardware servicing ncii
Computer hardware servicing nciiComputer hardware servicing ncii
Computer hardware servicing ncii
 
CBC Automotive Servicing NC II
CBC Automotive Servicing NC IICBC Automotive Servicing NC II
CBC Automotive Servicing NC II
 
Final project report format
Final project report formatFinal project report format
Final project report format
 
Fls presentation
Fls presentationFls presentation
Fls presentation
 
TEACHING METHODOLOGY 1
TEACHING METHODOLOGY 1TEACHING METHODOLOGY 1
TEACHING METHODOLOGY 1
 
Performing mensuration-and-calculations-common
Performing mensuration-and-calculations-commonPerforming mensuration-and-calculations-common
Performing mensuration-and-calculations-common
 
TR AUTOMOTIVE SERVICING NC II
TR AUTOMOTIVE SERVICING NC IITR AUTOMOTIVE SERVICING NC II
TR AUTOMOTIVE SERVICING NC II
 
Executive Summary and Design Document
Executive Summary and Design DocumentExecutive Summary and Design Document
Executive Summary and Design Document
 
Plc 2 12 ed
Plc 2   12 edPlc 2   12 ed
Plc 2 12 ed
 
M.Tech : Advanced DBMS Assignment I
M.Tech : Advanced DBMS Assignment IM.Tech : Advanced DBMS Assignment I
M.Tech : Advanced DBMS Assignment I
 
C chs tg-module1-4_dec
C chs tg-module1-4_decC chs tg-module1-4_dec
C chs tg-module1-4_dec
 
K to 12 computer hardware servicing NCII
K to 12 computer hardware servicing NCIIK to 12 computer hardware servicing NCII
K to 12 computer hardware servicing NCII
 
pc operation
pc operationpc operation
pc operation
 

En vedette

Ls 5 f12orientation
Ls 5 f12orientationLs 5 f12orientation
Ls 5 f12orientation
Jim Walker
 
Assignment 4 - Certification in Dispute Management
Assignment 4 - Certification in Dispute ManagementAssignment 4 - Certification in Dispute Management
Assignment 4 - Certification in Dispute Management
Jyotpreet Kaur
 
Interview met een selectie van krottegemnaren
Interview met een selectie van krottegemnarenInterview met een selectie van krottegemnaren
Interview met een selectie van krottegemnaren
broederschoolkrottegem
 
Ayoade,j.o. introdução à climatologia para os trópicos cópia
Ayoade,j.o. introdução à climatologia para os trópicos   cópiaAyoade,j.o. introdução à climatologia para os trópicos   cópia
Ayoade,j.o. introdução à climatologia para os trópicos cópia
LCGRH UFC
 
Edwards_Assignment 4_Arbitration
Edwards_Assignment 4_ArbitrationEdwards_Assignment 4_Arbitration
Edwards_Assignment 4_Arbitration
Adam Edwards
 

En vedette (20)

Ls 5 f12orientation
Ls 5 f12orientationLs 5 f12orientation
Ls 5 f12orientation
 
Assignment 4 - Certification in Dispute Management
Assignment 4 - Certification in Dispute ManagementAssignment 4 - Certification in Dispute Management
Assignment 4 - Certification in Dispute Management
 
Suit from ixmation against Switch Lighting, October 23, 2014
Suit from ixmation against Switch Lighting, October 23, 2014Suit from ixmation against Switch Lighting, October 23, 2014
Suit from ixmation against Switch Lighting, October 23, 2014
 
Chapter 7
Chapter 7Chapter 7
Chapter 7
 
Presentatie krottegem
Presentatie krottegemPresentatie krottegem
Presentatie krottegem
 
Assignment 1
Assignment 1Assignment 1
Assignment 1
 
Assignment 3
Assignment 3Assignment 3
Assignment 3
 
nerve cells
nerve cellsnerve cells
nerve cells
 
Interview met een selectie van krottegemnaren
Interview met een selectie van krottegemnarenInterview met een selectie van krottegemnaren
Interview met een selectie van krottegemnaren
 
Deliver Value:Lean-Kanban for Portfolio Prioritization
Deliver Value:Lean-Kanban for Portfolio PrioritizationDeliver Value:Lean-Kanban for Portfolio Prioritization
Deliver Value:Lean-Kanban for Portfolio Prioritization
 
Assignment 7
Assignment 7Assignment 7
Assignment 7
 
The Conflict Paradox
The Conflict ParadoxThe Conflict Paradox
The Conflict Paradox
 
Facilitating Meetings -The Forgotten Skill in the Software World
Facilitating Meetings -The Forgotten Skill in the Software WorldFacilitating Meetings -The Forgotten Skill in the Software World
Facilitating Meetings -The Forgotten Skill in the Software World
 
Collaboration Through Gamification
Collaboration Through GamificationCollaboration Through Gamification
Collaboration Through Gamification
 
Economische analyse
Economische analyseEconomische analyse
Economische analyse
 
The Lean Startup Game
The Lean Startup GameThe Lean Startup Game
The Lean Startup Game
 
Nutrent efficiency
Nutrent efficiencyNutrent efficiency
Nutrent efficiency
 
Ayoade,j.o. introdução à climatologia para os trópicos cópia
Ayoade,j.o. introdução à climatologia para os trópicos   cópiaAyoade,j.o. introdução à climatologia para os trópicos   cópia
Ayoade,j.o. introdução à climatologia para os trópicos cópia
 
Grievance
GrievanceGrievance
Grievance
 
Edwards_Assignment 4_Arbitration
Edwards_Assignment 4_ArbitrationEdwards_Assignment 4_Arbitration
Edwards_Assignment 4_Arbitration
 

Similaire à Assignment 9

This document is for Coventry University students for their ow.docx
This document is for Coventry University students for their ow.docxThis document is for Coventry University students for their ow.docx
This document is for Coventry University students for their ow.docx
jwilliam16
 
EG87 MSc Motorsport Engineering programme specification
EG87 MSc Motorsport Engineering programme specificationEG87 MSc Motorsport Engineering programme specification
EG87 MSc Motorsport Engineering programme specification
Dhanaprasanth K S
 
4.74 s.e. computer engineering (1)
4.74 s.e. computer engineering (1)4.74 s.e. computer engineering (1)
4.74 s.e. computer engineering (1)
Aditya66086
 
e3f55595181f7cad006f26db820fb78ec146e00e-1646623528083 (1).pdf
e3f55595181f7cad006f26db820fb78ec146e00e-1646623528083 (1).pdfe3f55595181f7cad006f26db820fb78ec146e00e-1646623528083 (1).pdf
e3f55595181f7cad006f26db820fb78ec146e00e-1646623528083 (1).pdf
SILVIUSyt
 
B2 2006 sizing_benchmarking (1)
B2 2006 sizing_benchmarking (1)B2 2006 sizing_benchmarking (1)
B2 2006 sizing_benchmarking (1)
Steve Feldman
 
B2 2006 sizing_benchmarking
B2 2006 sizing_benchmarkingB2 2006 sizing_benchmarking
B2 2006 sizing_benchmarking
Steve Feldman
 
B. Tech. Course Specification ME
B. Tech. Course Specification MEB. Tech. Course Specification ME
B. Tech. Course Specification ME
smb2015
 
22nd August Final - COA Handout Microprocessor.docx
22nd August Final - COA Handout Microprocessor.docx22nd August Final - COA Handout Microprocessor.docx
22nd August Final - COA Handout Microprocessor.docx
SZahidNabiDar
 

Similaire à Assignment 9 (20)

This document is for Coventry University students for their ow.docx
This document is for Coventry University students for their ow.docxThis document is for Coventry University students for their ow.docx
This document is for Coventry University students for their ow.docx
 
Standard dme sop
Standard dme sopStandard dme sop
Standard dme sop
 
Leave management System
Leave management SystemLeave management System
Leave management System
 
Syllabus for fourth year of engineering
Syllabus for fourth year of engineeringSyllabus for fourth year of engineering
Syllabus for fourth year of engineering
 
PetroSync - ASME B31.3 Process Piping Code Design Requirements
PetroSync - ASME B31.3 Process Piping Code Design RequirementsPetroSync - ASME B31.3 Process Piping Code Design Requirements
PetroSync - ASME B31.3 Process Piping Code Design Requirements
 
EG87 MSc Motorsport Engineering programme specification
EG87 MSc Motorsport Engineering programme specificationEG87 MSc Motorsport Engineering programme specification
EG87 MSc Motorsport Engineering programme specification
 
IRJET- Course outcome Attainment Estimation System
IRJET-  	  Course outcome Attainment Estimation SystemIRJET-  	  Course outcome Attainment Estimation System
IRJET- Course outcome Attainment Estimation System
 
4.74 s.e. computer engineering (1)
4.74 s.e. computer engineering (1)4.74 s.e. computer engineering (1)
4.74 s.e. computer engineering (1)
 
Online course reservation system
Online course reservation systemOnline course reservation system
Online course reservation system
 
Student portal system application -Project Book
Student portal system application -Project BookStudent portal system application -Project Book
Student portal system application -Project Book
 
e3f55595181f7cad006f26db820fb78ec146e00e-1646623528083 (1).pdf
e3f55595181f7cad006f26db820fb78ec146e00e-1646623528083 (1).pdfe3f55595181f7cad006f26db820fb78ec146e00e-1646623528083 (1).pdf
e3f55595181f7cad006f26db820fb78ec146e00e-1646623528083 (1).pdf
 
216328327 nilesh-and-teams-project
216328327 nilesh-and-teams-project216328327 nilesh-and-teams-project
216328327 nilesh-and-teams-project
 
B2 2006 sizing_benchmarking (1)
B2 2006 sizing_benchmarking (1)B2 2006 sizing_benchmarking (1)
B2 2006 sizing_benchmarking (1)
 
B2 2006 sizing_benchmarking
B2 2006 sizing_benchmarkingB2 2006 sizing_benchmarking
B2 2006 sizing_benchmarking
 
ireb_cpre_syllabus_requirements-modeling_advanced_level_en_v2.2.pdf
ireb_cpre_syllabus_requirements-modeling_advanced_level_en_v2.2.pdfireb_cpre_syllabus_requirements-modeling_advanced_level_en_v2.2.pdf
ireb_cpre_syllabus_requirements-modeling_advanced_level_en_v2.2.pdf
 
A Survey on Design of Online Judge System
A Survey on Design of Online Judge SystemA Survey on Design of Online Judge System
A Survey on Design of Online Judge System
 
B. Tech. Course Specification ME
B. Tech. Course Specification MEB. Tech. Course Specification ME
B. Tech. Course Specification ME
 
B. Tech. Course Specification ME
B. Tech. Course Specification MEB. Tech. Course Specification ME
B. Tech. Course Specification ME
 
22nd August Final - COA Handout Microprocessor.docx
22nd August Final - COA Handout Microprocessor.docx22nd August Final - COA Handout Microprocessor.docx
22nd August Final - COA Handout Microprocessor.docx
 
An Adjacent Analysis of the Parallel Programming Model Perspective: A Survey
 An Adjacent Analysis of the Parallel Programming Model Perspective: A Survey An Adjacent Analysis of the Parallel Programming Model Perspective: A Survey
An Adjacent Analysis of the Parallel Programming Model Perspective: A Survey
 

Dernier

Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 

Dernier (20)

The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 

Assignment 9

  • 1. ASSIGNMENT Module Code Module Name Course Department ESD 532 Multi core Architecture and Programming M.Sc. [Engg.] in Real Time Embedded Systems Computer Engineering Name of the Student Bhargav Shah Reg. No CHB0910001 Batch Full-Time 2011. Module Leader Padma Priya Dharishini P. M.S.Ramaiah School of Advanced Studies Postgraduate Engineering and Management Programmes(PEMP) #470-P Peenya Industrial Area, 4th Phase, Peenya, Bengaluru-560 058 Tel; 080 4906 5555, website: www.msrsas.org POSTGRADUATE ENGINEERING AND MANAGEMENT PROGRAMME – (PEMP) MSRSAS - Postgraduate Engineering and Management Programme - PEMP i
  • 2. MSRSAS - Postgraduate Engineering and Management Programme - PEMP Declaration Sheet Student Name Bhargav Shah Reg. No CHB0910001 Course RTES Batch Full-Time 2011 Module Code ESD 532 Module Title Module Date Multi Core Architecture and Programming to 06-02-2012 03-03-2012 Module Leader Padma Priya Dharishini P. Batch Full-Time 2011. Extension requests: Extensions can only be granted by the Head of the Department in consultation with the module leader. Extensions granted by any other person will not be accepted and hence the assignment will incur a penalty. Extensions MUST be requested by using the ‘Extension Request Form’, which is available with the ARO. A copy of the extension approval must be attached to the assignment submitted. Penalty for late submission Unless you have submitted proof of mitigating circumstances or have been granted an extension, the penalties for a late submission of an assignment shall be as follows: • Up to one week late: Penalty of 5 marks • One-Two weeks late: Penalty of 10 marks • More than Two weeks late: Fail - 0% recorded (F) All late assignments: must be submitted to Academic Records Office (ARO). It is your responsibility to ensure that the receipt of a late assignment is recorded in the ARO. If an extension was agreed, the authorization should be submitted to ARO during the submission of assignment. To ensure assignment reports are written concisely, the length should be restricted to a limit indicated in the assignment problem statement. Assignment reports greater than this length may incur a penalty of one grade (5 marks). Each delegate is required to retain a copy of the assignment report. Declaration The assignment submitted herewith is a result of my own investigations and that I have conformed to the guidelines against plagiarism as laid out in the PEMP Student Handbook. All sections of the text and results, which have been obtained from other sources, are fully referenced. 
I understand that cheating and plagiarism constitute a breach of University regulations and will be dealt with accordingly. Signature of the student Date Submission date stamp (by ARO) Signature of the Module Leader and date Multi core Architecture and Programming Signature of Head of the Department and date ii
  • 3. MSRSAS - Postgraduate Engineering and Management Programme - PEMP M. S. Ramaiah School of Advanced Studies Postgraduate Engineering and Management Programme- Coventry University (UK) Assessment Sheet Department Computer Engineering Course RTES Module Code ESD 532 Module Leader Padma Priya Dharishini P. Module Completion Date 03-03-2012 Student Name Bhargav Shah ID Number CHB0910001 Attendance Details Batch Module Title Theory Full-Time 2011 Multi core Architecture and Programming Laboratory Fine Paid Remarks (if any for shortage of attendance) Q. No a Written Examination – Marks – Sheet (Assessor to Fill) C d Total Remarks b 1 2 3 4 5 6 Marks Scored for 100 Part a Marks Scored out of 50 Result PASS Assignment – Marks-Sheet (Assessor to Fill) C d Total b FAIL Remarks A B C Marks Scored for 100 Result Marks Scored out of 50 PASS FAIL PMAR- form completed for student feedback (Assessor has to mark) Yes Overall-Result Components Assessor Reviewer No Written Examination (Max 50) Pass / Fail Assignment (Max 50) Pass / Fail Total Marks (Max 100) (Before Late Penalty) Grade Total Marks (Max 100) (After Late Penalty) Grade IMPORTANT 1. The assignment and examination marks have to be rounded off to the nearest integer and entered in the respective fields 2. A minimum of 40% required for a pass in both assignment and written test individually 3. A student cannot fail on application of late penalty (i.e. on application of late penalty if the marks are below 40, cap at 40 marks) Signature of Reviewer with date Multi core Architecture and Programming Signature of Module Leader with date iii
  • 4. MSRSAS - Postgraduate Engineering and Management Programme - PEMP Abstract Multi-core processors may provide higher performance to current embedded processors to support future embedded systems functionalities. According to the Industrial Advisor Board, embedded systems will benefit from multi-core processors, as these systems are comprised by mixed applications, i.e. applications with hard real-time constrains and without real-time constrains, that can be executed into the same processor. Moreover, the Industrial Advisor Board also stated that memory operations represent one of the main bottlenecks that current embedded applications must face, being even more important than the performance of the core that can suffer a degradation of 10-20% without really affecting overall performance. We take profit of this fact by studying the effect of running several threads per core, that is, we make the core multithreaded. And we also studied the effect of caches, which are a well known technique in high performance computing to reduce the memory bottleneck. Chapter 1 discuss on Arbitration schemes of Memory Access in Multicore Systems , what are the types of arbitration schemes existed up to now which is the best one of them, what are the challenging factors for these arbitration schemes in the present situation and finally short note on the factors that support the proposed arbitration schemes. Chapter 2 discuss about a multi-threaded concept of consumer and producer threads how are going to share a common queue, how to prioritize the threads if we are sharing a common thread and some test cases to test the scenarios. Chapter 3 discuss about a different situation having 4 of producers with different queues and single consumers and it will discuss about the changing priority levels of the consumer so that in the conflicting condition with the consumer thread the producer will get the high priority to execute. Multi core Architecture and Programming iv
  • 5. MSRSAS - Postgraduate Engineering and Management Programme - PEMP Contents Declaration Sheet .................................................................................................................................ii Abstract .............................................................................................................................................. iv List of Figures ...................................................................................................................................vii Symbols .............................................................................................................................................vii Nomenclature...................................................................................................................................viii CHAPTER 1 ....................................................................................................................................... 9 Arbitration schemes of memory access in multi core ..................................................................... 
9 1.1 Introduction ...........................................................................................................................9 1.2 Types of arbitration schemes .................................................................................................9 1.3 Challenges in arbitration schemes .......................................................................................10 1.4 Impact of the arbitration schemes on throughput and latency .................................................11 1.5 Proposal of better arbitration scheme with justification ..........................................................11 1.6 Conclusion ...............................................................................................................................12 CHAPTER 2 ..................................................................................................................................... 13 Development of Consumer Producer Application ........................................................................ 13 2.1 Introduction ..............................................................................................................................13 2.2 Sequence diagram ....................................................................................................................13 2.3 Development of parallelized program using Pthread/openMP ................................................14 2.4 Test cases and Testing results for scenario 1 ...........................................................................17 2.4.1 Test cases ........................................................................................................................................ 17 2.4.2 Testing results................................................................................................................................. 
18 2.5 Sequence diagram.............................................................................................................................. 19 2.6 Development of paralleled program using pthread/openMP ...................................................20 2.4 Test cases and Testing results for scenario 2 ...........................................................................23 2.4.1 Test cases ........................................................................................................................................ 23 2.4.2 Testing results................................................................................................................................. 24 2.5 Conclusion ...............................................................................................................................25 CHAPTER 3 ..................................................................................................................................... 26 Development of Consumer Producer Application with extended priority concept ................... 26 3.1 Introduction ..............................................................................................................................26 3.2 Sequence diagram................................................................................................................26 3.2 Development of designed application ......................................................................................27 3.3 Test cases and testing results for scenario 3 ........................................................................34 3.3.1 Test cases........................................................................................................................................ 34 3.3.2 Documentation of the results ........................................................................................................ 
35 3.5 Conclusion ...............................................................................................................................36 CHAPTER 4 ..................................................................................................................................... 37 4.1 Module Learning Outcomes.....................................................................................................37 4.2 Conclusion ...............................................................................................................................37 References ......................................................................................................................................... 38 Appendix-1 ........................................................................................................................................ 39 Appendix-2 ........................................................................................................................................ 40 Multi core Architecture and Programming v
List of Tables
Table 2.1 Test cases for single producer single consumer ................................................ 17
Table 2.2 Test cases for three producers single consumer ............................................... 23
Table 3.1 Test cases for higher priority consumer thread ................................................. 34
List of Figures
Figure 2.1 Sequence diagram for one producer and one consumer .................................. 14
Figure 2.2 Including libraries and files for scenario 1 ........................................................ 14
Figure 2.3 Declaration of mutex and structures for scenario 1 ......................................... 14
Figure 2.4 Function to create new list for scenario 1 ........................................................ 15
Figure 2.5 Main function for application of scenario 1 ...................................................... 15
Figure 2.6 Body of producer thread for scenario 1 ........................................................... 16
Figure 2.7 Body of consumer thread for scenario 1 .......................................................... 17
Figure 2.8 Producer thread is waiting for value in critical region ...................................... 18
Figure 2.9 Consumer thread printing the value inserted by producer thread ................... 18
Figure 2.10 Sequence diagram of three producers one consumer ................................... 19
Figure 2.11 Including libraries and files for scenario 2 ...................................................... 20
Figure 2.12 Declaration of mutex and structures for scenario 2 ....................................... 20
Figure 2.13 Function to create new list for scenario 2 ...................................................... 20
Figure 2.14 Main function for application of scenario 2 .................................................... 21
Figure 2.15 Body of producer thread for scenario 2 ......................................................... 22
Figure 2.16 Body of consumer thread for scenario 2 ........................................................ 22
Figure 2.17 Producer thread is waiting in critical region ................................................... 24
Figure 2.18 Consumer thread is active after all the producer threads finish the critical region ... 24
Figure 3.1 Sequence diagram for prioritized consumer thread ......................................... 27
Figure 3.2 Including library files for scenario 3 ................................................................. 27
Figure 3.3 Declaration of constructive functions for scenario 3 ........................................ 28
Figure 3.4 Declaration of list pointers and location pointers ............................................. 28
Figure 3.5 Definition of constructive functions ................................................................. 28
Figure 3.6 Declaration of thread function and synchronization objects ............................ 29
Figure 3.7 Main function for application of scenario 3 ...................................................... 29
Figure 3.8 First producer thread ....................................................................................... 30
Figure 3.9 Second producer thread .................................................................................. 30
Figure 3.10 Third producer thread .................................................................................... 31
Figure 3.11 Fourth producer thread .................................................................................. 31
Figure 3.12 Consumer thread with highest priority queue ................................................ 32
Figure 3.13 Continuation of consumer thread for second priority queue .......................... 33
Figure 3.14 Continuation of consumer thread for third priority queue .............................. 33
Figure 3.15 Continuation of consumer thread for last priority queue ................................ 34
Figure 3.16 Results of test cases ..................................................................................... 35
Nomenclature
WRR - Weighted Round Robin
CMP - Chip Multiprocessor
SDRAM - Synchronous Dynamic Random Access Memory
DRR - Deficit Round Robin
SRR - Stratified Round Robin
PD - Priority Division
PBS - Priority Based Budget Scheduler
TDMA - Time Division Multiple Access
CCSP - Credit Controlled Static Priority
CHAPTER 1
Arbitration schemes of memory access in multi core

1.1 Introduction
The constraints of embedded systems in terms of power consumption, thermal dissipation, cost-efficiency and performance can be met by using multi-core processors (CMPs, or chip multiprocessors). On typical medium-size CMPs, the cores share a bus to the highest levels of the memory hierarchy. In multi-core architectures, resources are often shared to reduce cost and to exchange information. An off-chip memory is one of the most common shared resources. SDRAM is a popular off-chip memory in cost-sensitive and performance-demanding applications due to its low price, high data rate and large storage. However, asynchronous refresh operations and the dependence of each access on the previous one make SDRAM access latency vary by an order of magnitude. The main contribution of this report is to critically compare the throughput and latency of the available multi-core arbitration schemes. From this analysis, a justification for a better arbitration scheme is derived.

1.2 Types of arbitration schemes [1]
There have been many approaches, especially in the networking domain, to providing fairness, high throughput and worst-case latency bounds in an arbiter. Weighted Round Robin (WRR) is a work-conserving arbiter in which cores are allocated a number of slots within a round-robin cycle according to their bandwidth requirements. If a core does not use its slot, the next active core in the round-robin cycle is served immediately to increase throughput. Cores producing bursty traffic therefore benefit at the cost of cores that produce uniform traffic. Deficit Round Robin (DRR) assigns a different slot size to each master according to its bandwidth requirements and schedules the masters in round-robin (RR) fashion. The difference between DRR and RR is that if a master cannot use its slot, or part of its slot, in the current cycle, the remaining slot (the deficit) is carried over into the next cycle.
In the next cycle, the master can transfer up to an amount of data equal to the sum of its slot size and the deficit. Thus DRR tries to avoid the unfairness that WRR causes to uniform traffic generators. Stratified Round Robin (SRR) groups masters with similar bandwidth requirements into one class. After grouping the masters into classes, a two-step arbitration is applied: inter-class and intra-class. The inter-class scheduler schedules each class Fk once every 2^k clock cycles; hence, the smaller the k, the more often the class is scheduled. The intra-class scheduler uses the WRR mechanism to
select the next master within the class. Due to its more uniform distribution of bandwidth, SRR reduces worst-case latencies compared to WRR. However, to achieve a low worst-case latency for a class Fk, k must be minimized, which leads to over-allocation. Priority Division (PD) combines TDMA and static priorities to achieve guarantees and high resource utilization. Instead of fixing TDMA slots statically, PD statically fixes the priority of each master within a slot, such that each master has at least one slot in which it has the highest priority. Masters thus have guarantees equal to TDMA, and unused slots are arbitrated by static priority to increase resource utilization. This approach provides a benefit over RR or WRR only if the response time of the shared resource is fixed; with a variable response time (e.g. SDRAM), it produces high worst-case latencies. In the Priority Based Budget Scheduler (PBS), masters are assigned fixed budgets of accesses per unit time (the replenishment period). Masters are also assigned fixed priorities to resolve conflicts. The budget relates to a master's bandwidth requirements, while the priority relates to its latency requirements; the coupling between latency and bandwidth is thereby removed. The shared resource is granted to the active master with the highest priority that still has budget left. At the beginning of each replenishment period, every master gets its original budget back. Akesson et al. introduce the Credit Controlled Static Priority (CCSP) arbiter. CCSP also uses priorities and budgets within the replenishment period, but instead of frame-based replenishment periods, masters are replenished incrementally for fine-grained bandwidth assignment.
1.3 Challenges in arbitration schemes
Traditional shared-bus arbitration schemes such as TDMA and round robin exhibit several defects, including bus starvation and low system performance. In strict priority scheduling, higher-priority packets can take most of the bandwidth, so lower-priority packets must wait longer for resource allocation; this causes starvation of the lower-priority packets. A drawback of WRR and LARD with respect to power consumption is that both always keep their servers turned on even when some of them serve no requests; therefore, they cannot conserve any power. Weighted Round Robin and Deficit Round Robin are extensions that guarantee each requestor a minimum service, proportional to an allocated rate, within a common, periodically repeating frame of fixed size. This type of frame-based rate regulation is similar to the Deferrable Server, and it suffers from an inherent coupling between allocation granularity and latency, where allocation granularity is inversely proportional to the frame size. A larger frame size results in finer allocation granularity, reducing over-allocation, but at the cost of increased latencies for all requestors. Another common example of a frame-based scheduling discipline is time-division
multiplexing, which suffers from the additional disadvantage that it requires a schedule to be stored for each configuration; this is very costly if the frame size or the number of use cases is large [2]. The above arbitration algorithms cannot handle strict real-time requirements, so a two-level arbitration algorithm called RB_Lottery bus arbitration has been developed. It solves the impartiality, starvation and real-time problems that exist in the Lottery method, and it reduces the average latency of bus requests [5]. In hardware verification, the proposed arbiter achieves a higher operating frequency than the Lottery arbiter; it also gives more consideration to chip area and power consumption than the Lottery arbiter, and it has a lower average latency of bus requests than Lottery arbitration.

1.4 Impact of the arbitration schemes on throughput and latency [4]
In each approach to providing fairness, high throughput and worst-case latency bounds, optimizing one factor degrades the others. In Weighted Round Robin, to give any core a low worst-case latency it must be assigned more slots in the round-robin cycle, which leads to over-allocation. Deficit Round Robin (DRR) has very high latencies in the worst case: for example, one master may stay idle for a long time and build up a high deficit; if it then continuously requests the shared resource, it will occupy it for a long time, incurring very high latencies for the other masters. Due to the presence of priorities, PBS is fair to high-priority masters and unfair to low-priority masters. When all masters are executing HRTs (as outlined in the introduction), PBS results in large WCETs for the low-priority masters.
Credit Controlled Static Priority (CCSP), also due to the presence of priorities, produces large worst-case execution time bounds for the lower-priority masters.

1.5 Proposal of better arbitration scheme with justification
Stratified Round Robin is better than the other arbitration schemes, since it is a fair-queuing packet scheduler with good fairness and delay properties and low complexity. It is unique among schedulers of comparable complexity in that it provides a single packet delay bound that is independent of the number of flows. Importantly, it also enables a simple hardware implementation, and thus fills a current gap between scheduling algorithms that have provably good performance and those that are feasible and practical to implement in high-speed routers. Interactive applications such as video and audio conferencing require the total delay experienced by a packet in the network to be bounded on an end-to-end basis. The packet scheduler decides the order in which packets are sent on the output link, and therefore determines the queuing delay experienced by a packet at each intermediate router in the network. Low complexity is critical because, with
line rates increasing to 40 Gbps, all packet-processing tasks performed by routers, including output scheduling, must be able to operate in nanosecond time frames.

1.6 Conclusion
By critically comparing the throughput and latency of the available multi-core arbitration schemes, Stratified Round Robin emerges as better than the other arbitration schemes, since it is a fair-queuing packet scheduler with good fairness and delay properties and low complexity. Even so, round-robin schemes retain several negative aspects, and improved replacements can be expected in the future.
CHAPTER 2
Development of Consumer Producer Application

2.1 Introduction
Today, the world of software development is presented with a new challenge. To fully leverage the new class of multi-core hardware, software developers must change the way they create applications. By turning their focus to multi-threaded applications, developers can take full advantage of multi-core devices and deliver software that meets today's demands. But this paradigm of multi-threaded software development adds a new wrinkle of complexity for those who care most about software quality. Concurrency defects such as race conditions and deadlocks are defect types unique to multi-threaded applications. Complex and hard to find, these defects can quickly derail a software project. To avoid catastrophic failures in multi-threaded applications, software development organizations must understand how to identify and eliminate these deadly problems early in the application development lifecycle. As part of this work, a multi-threaded producer-consumer application is created using the given linked-list program. Two scenarios are accommodated in this part of the document. In the first, a producer inserts one value into the doubly linked list, and at the other end the consumer reads that value and deletes it. In the second, three producer threads insert values into the linked list, and at the end one consumer thread reads and deletes them. A proper synchronization mechanism is developed.

2.2 Sequence diagram
A sequence diagram is a kind of interaction diagram that shows how processes operate with one another and in what order. It is a construct of a Message Sequence Chart. A sequence diagram shows object interactions arranged in time sequence.
It depicts the objects and classes involved in the scenario and the sequence of messages exchanged between the objects needed to carry out the functionality of the scenario. Sequence diagrams are typically associated with use-case realizations in the Logical View of the system under development. Figure 2.1 shows the sequence diagram for one producer and one consumer. In the figure, the y-axis represents time and the x-axis represents the resources: the producer thread is shown at the top left and the consumer thread at the top right. At the start, the producer has to write data into the linked list, but the linked list is shared between the producer and consumer threads. To provide synchronization between the producer and consumer threads, a mutex is used. The producer locks the mutex and writes data to the list. If the consumer tries to read the data at the same time, it tries to acquire the mutex held by the producer and fails. In this case the consumer thread has to
wait until the producer releases the mutex. This phenomenon is shown in Figure 2.1, whose annotations read: trying to obtain mutex but fails; critical region; consumer has to wait until the resource is freed by the producer. In this application the consumer cannot read the data until the producer has produced it and stored it in the linked list; this synchronization is achieved using the mutex.

Figure 2.1 Sequence diagram for one producer and one consumer

2.3 Development of parallelized program using Pthread/openMP
There are two approaches to developing threaded programs on Linux: using the pthread APIs, or using the openMP APIs. For this scenario, the pthread APIs were chosen to develop the single-producer, single-consumer application.

Figure 2.2 Including libraries and files for scenario 1

Figure 2.2 shows all the preprocessor statements of the code segment. The first is the file "ll2.c", which contains the definitions of all the functions related to the linked-list operations. The second is "pthread.h", which contains the declarations of all the threading-related APIs. The last two are the standard library headers for common functions. In the last lines of the figure, a function named create is declared.

Figure 2.3 Declaration of mutex and structures for scenario 1
To obtain synchronization in the application, a mutex is used. Here "lock" is defined as the pthread mutex object. It is essential to initialize a mutex before using it; here the initialization is handled by assigning the macro "PTHREAD_MUTEX_INITIALIZER". A structure pointer *myList is created to hold the starting address of the list, and a structure pointer *p is created to point at the current position for accessing values. All these declarations are shown in Figure 2.3.

Figure 2.4 Function to create new list for scenario 1

Figure 2.4 shows the definition of the create function. When called, it creates a new list, which is pointed to by the myList pointer. list_crate() is the function that creates the new list and returns its address in the form of a list_head structure. On the second line, the pointer p is created to hold the current position of the element in the list; initially the current position is set to the first position by calling list_position_creat().

Figure 2.5 Main function for Application of scenario 1

Figure 2.5 shows the main function of the single-producer, single-consumer application. In the figure, two functions are declared, each taking a void pointer argument and returning a void pointer. The function named "ser" is called by the producer thread; on the other side, the consumer thread calls the function "cli". In main, one void pointer named "exit" is defined to obtain the return value from the thread functions. Two thread objects are defined, named "t_ser" and "t_cli". On successful creation of the producer thread, the ID of the thread is stored in
"t_ser", and the ID of the consumer thread is stored in "t_cli". To create the threads, the "pthread_create" API is used with the appropriate arguments. In this application two threads are created: the producer thread and the consumer thread. The child threads die automatically if the main thread exits. To avoid this situation, the main thread has to wait until the consumer thread exits successfully; this mechanism is provided by the "pthread_join" API.

Figure 2.6 Body of producer thread for scenario 1

Figure 2.6 shows the body of the producer thread. At the start of the producer thread, create() is called; this creates one new list and sets the pointer p to the first location. The consumer cannot get any value before the producer stores it in the list. To avoid such a race condition, the mutex named "lock" is used: the function "pthread_mutex_lock" is called to take the mutex and enter the critical region. The producer thread then reads a value from the user into the variable "val". The entered value is stored in the list and the position pointer p is updated with the new current location; the storing is done by the function "list_inserLast", which takes the list object (myList) and the value to be inserted. After successful insertion of the value, any thread may read it, so to end the critical region the obtained mutex is released with "pthread_mutex_unlock". While the producer is in the critical region, if the consumer thread tries to take the mutex or to access the critical section it has to wait until the producer releases the mutex; only after the producer unlocks it does the consumer thread acquire the resource. Figure 2.7 shows the body of the consumer thread. At the start of the thread function it tries to take the mutex.
After the producer thread unlocks the mutex, the consumer thread gets access to the shared list. The value is displayed by passing the list object to the function named "list_display". The consumer thread then has to remove the value; to do this, the function
"list_removeLast" is called with the list object; it returns the location of the previous data. After removing the data, the mutex held by the consumer thread is released. This whole sequence is shown by the code in Figure 2.7.

Figure 2.7 Body of consumer thread for scenario 1

2.4 Test cases and Testing results for scenario 1
2.4.1 Test cases
In this section, test cases are designed for the producer-consumer system. The table below describes the test cases to be performed; they validate the functionality of the system against corner-case inputs.

Table 2.1 Test cases for single producer single consumer

TCN  | Test case                                                                 | Test data | Expected output                                                                           | Result obtained
TC_1 | Producer thread inserts a value                                           | Int       | Consumer should read the value inserted by the producer                                   | Yes
TC_2 | Consumer acquires the resource only after the producer thread unlocks it  | Any       | Proper synchronization should be maintained by the producer and consumer threads          | Yes
TC_3 | Main thread waits until all child threads exit                            | Any       | The main thread stays alive until all threads have executed completely                    | Yes
TC_4 | No deadlock of any kind should occur                                      | Any       | All functions of the program should execute; resource locking should not create a deadlock | Yes
TC_5 | After reading the data, the consumer thread deletes it                    | Any       | After reading the data entered by the producer, the consumer thread deletes it properly   | Yes

2.4.2 Testing results
Figure 2.8 shows the testing results for TC_1, TC_2 and TC_4. Here the producer (server) thread is waiting for a value from the user; it holds the critical region until it has stored the value in the shared list. During this time the consumer (client) thread is waiting to acquire the resource.

Figure 2.8 Producer thread is waiting for value in critical region

Figure 2.9 shows the results of TC_3 and TC_5. As soon as the producer thread leaves the critical region, the consumer thread enters it and reads the value entered by the producer; after reading the value, the consumer thread deletes it, as the figure shows.

Figure 2.9 Consumer thread printing the value inserted by producer thread
2.5 Sequence diagram
Figure 2.10 shows the sequence diagram for three producers and one consumer. In the figure, the y-axis represents time and the x-axis represents the resources: the three producer threads are shown on the left and the consumer thread on the right, with the consumer in the wait state while the resource is held by the producer threads. The diagram's annotations read: producer thread 1, 2 and 3 are each in the critical region in turn; the consumer tries to obtain the mutex but fails.

Figure 2.10 Sequence diagram of three producers one consumer

At the start, every producer has to write data into the linked list, but the linked list is shared between the producer threads and the consumer thread. To provide synchronization between the producers and the consumer, a mutex is used: each producer locks the mutex and writes its data to the list. If the consumer tries to read the data at the same time, it tries to acquire the mutex held by a producer and fails; in this case the consumer thread has to wait until the producer releases the mutex. This is shown in Figure 2.10. In this application the consumer cannot read the data until the producers have produced it and stored it in the linked list; this synchronization is achieved using the mutex.
2.6 Development of parallelized program using pthread/openMP
There are two approaches to developing threaded programs on Linux: using the pthread APIs, or using the openMP APIs. For this scenario, the pthread APIs were chosen to develop the three-producer, single-consumer application. The definitions are the same as in the first scenario; the only difference is in the main body of the application code.

Figure 2.11 Including libraries and files for scenario 2

Figure 2.11 shows all the preprocessor statements of the code segment. The first is the file "ll2.c", which contains the definitions of all the linked-list functions. The second is "pthread.h", which contains the declarations of all the threading-related APIs. The last two are the standard library headers for common functions. In the last lines of the figure, a function named create is declared.

Figure 2.12 Declaration of mutex and structures for scenario 2

To obtain synchronization in the application, a mutex is used. Here "lock" is defined as the pthread mutex object. It is essential to initialize a mutex before using it; here the initialization is handled by assigning the macro "PTHREAD_MUTEX_INITIALIZER". A structure pointer *myList holds the starting address of the list, and a structure pointer *p points at the current position for accessing values. These declarations are shown in Figure 2.12.

Figure 2.13 Function to create new list for scenario 2

Figure 2.13 shows the definition of the create function. When called, it creates a new list, which is pointed to by the myList pointer. list_crate() is the function that creates the new list and returns its address in the form of a list_head structure. At the
second line, the pointer p is created to hold the current position of the element in the list; initially the current position is set to the first position by calling list_position_creat(). Figure 2.14 shows the main function for the multiple-producer, single-consumer application. In the figure, two functions are declared, each taking a void pointer argument and returning a void pointer. The function named "ser" is the function called by the producer threads; there are three producer threads, which call the same function three times. On the other side, the consumer thread calls the function "cli". In main, one void pointer named "exit" is defined to obtain the return value from the thread functions.

Figure 2.14 Main function for application of scenario 2

Here, five thread objects are defined, named "t_ser", "t_ser1", "t_ser2", "t_ser3" and "t_cli". On successful creation of each producer thread, its ID is stored in the corresponding thread object, and the ID of the consumer thread is stored in "t_cli". Before creating the threads, the create() function is called here to generate the list and assign the current location to the pointer p. In the single-producer, single-consumer case this function was called inside the producer thread function, because both threads run only once there; in this case the producer function executes three times, so a new list must not be created each time. Once the list is created, all the threads insert their values and advance the location pointer. To create the threads, the "pthread_create" API is used with the appropriate arguments. In this application four threads are created: three producer threads and one consumer thread. The child threads die automatically if the main thread exits; to avoid this situation, the main thread has to wait until the consumer thread exits successfully. This mechanism is provided by the "pthread_join" API.
The consumer cannot get any value before a producer stores it in the list; indeed, the consumer has to wait until all the producers have stored their values. On the other side, no other producer thread can
  • 22. MSRSAS - Postgraduate Engineering and Management Programme - PEMP insert value if one producer thread is in critical region. To achieve such synchronization, mutex named “lock” is used. Function named “pthread_mutex_lock” is used to take the mutex and enter in to the critical region. After these producers thread will take a value from the user in the local variable “val”. The user entered value is stored in the list and the position of the pointer p is updated by every producer thread. The storing mechanism is provided by the function “list_inserLast” with the argument of the list object (myList) and value to be inserted from last. After successful insertion of the value by all the producer thread any of the thread (consumer) can get that value. So to end up with the critical region, to release the obtained mutex “pthread_mutex_unlock” function is used. The body of the producer threads is shown by Figure 2.15. Figure 2. 15 Body of producer thread for scenario 2 Figure 2. 16 Body of consumer thread for scenario 2 Multi core Architecture and Programming 22
  • 23. Figure 2.16 shows the body of the consumer thread. At the start of the thread function it tries to take the mutex. Once the mutex is unlocked by the producer threads, the consumer thread gets access to the shared list. The values are displayed by passing the list object to the function named list_display. The consumer thread then has to remove the values: the function list_removeLast is called to remove a single value from the list, and since three values are present in this scenario, the read-and-delete procedure is repeated three times. The return value of this function is the location of the previous element. After removing all the data, the mutex taken by the consumer thread is released.
2.4 Test cases and testing results for scenario 2
2.4.1 Test cases
In this section test cases are designed for the three-producer, one-consumer system. The table below describes the test cases to be performed; they validate the functionality of the system with corner-case inputs.
Table 2.2 Test cases for the multiple-producer, single-consumer application
TCN | Test case | Test data | Expected result | Output obtained
TC_1 | All producer threads should insert a value into the list | Int | The consumer reads the values inserted by the producer threads | Yes
TC_2 | The consumer thread should read the values in the appropriate priority | Any | A priority is assigned to each producer thread; the consumer reads the values in that priority order | Yes
TC_3 | The main thread should wait until all the child threads exit | Any | The main thread stays alive until all the threads have executed completely | Yes
TC_4 | Two producer threads should not insert values at the same time | Any | A proper synchronization mechanism is maintained by the producer threads while inserting values | Yes
  • 24. TC_5 | After reading the data, the consumer thread should delete it one by one | Any | After reading the data entered by the producers, the consumer thread deletes it properly | Yes
2.4.2 Testing results
Figure 2.17 shows the testing results of the developed producer-consumer application. Here the consumer thread waits until all the producer threads have left the critical region; the first priority is assigned to the first producer thread. The results of TC_1, TC_2 and TC_4 are shown in the figure below. Figure 2.17 Producer thread waiting in the critical region. Figure 2.18 shows the results of test cases TC_3 and TC_4: only after all the producer threads leave the critical region can the consumer enter it to read the values from the list, and the consumer thread reads the values in the given priority order. Figure 2.18 Consumer thread active after all the producer threads finish the critical region. NOTE: In this document all results are documented for a single iteration of the application, to provide a clear picture.
  • 25. 2.5 Conclusion
Multi-core hardware is clearly increasing software complexity by driving the need for multi-threaded applications. Given the rising rate of multi-core hardware adoption in both enterprise and consumer devices, the challenge of creating multi-threaded applications is here to stay for software developers, and in the coming years multi-threaded application development will most likely become the dominant paradigm in software. As this shift continues, many development organizations will transition to multi-threaded application development on the fly. In view of this, a producer-consumer application was successfully created using pthread APIs. All the threads share the same linked list, and synchronization is provided by a mutex. The test cases were developed by critically analyzing the application code and the assignment requirements, and all of them passed successfully.
  • 26. CHAPTER 3
Development of a Consumer-Producer Application with an Extended Priority Concept
3.1 Introduction
All modern operating systems divide CPU cycles, in the form of time quanta, among the various processes and threads (or Linux tasks) in accordance with their policies and priorities. Thread scheduling is one of the most important and fundamental services offered by an operating system kernel. Some of the metrics an operating system scheduler seeks to optimize are fairness, throughput, turnaround time, response time and efficiency. Symmetric multiprocessor operating systems commonly assume that all cores are identical and offer the same performance.
3.2 Sequence diagram
Figure 3.1 shows the sequence diagram in which the message queues are prioritized. In the figure, the y-axis represents time and the x-axis represents the resources. The producer threads are shown on the left side of the image: the vertical thin line is the main thread and the thick overlapped lines are the producer threads. Each producer thread maintains one queue to store data. On the right side of the image the single consumer thread is shown. Before spawning the producer and consumer threads, the main thread locks four semaphores; after locking them, it creates four producers and one consumer. At the start, the consumer thread tries to acquire the semaphores in the proper priority order, while each producer thread accesses its own message queue and inserts data. At the end, each producer thread unlocks its semaphore so that the consumer thread can access it. In the figure the ascending priority order of the producer threads/queues is thread4, thread3, thread2, thread1; when thread 1 finishes it releases semaphore 1. The consumer thread continuously checks the sizes of all the lists associated with the queues.
Because the priority assigned to thread 3 is higher, the consumer thread looks at the size of the third queue first. If thread three has no data in its queue, the consumer thread looks at the lower-priority queues. As a result of this mechanism, if by that time only thread one has entered an element in its queue and released its semaphore, the consumer checks queue three for an available element and fails; since no other higher-priority queue has data either, rather than waiting for the higher-priority thread, the consumer thread reads and deletes the data in the lower-priority queue. As soon as a higher-priority producer thread enters a value in its queue, the consumer thread immediately reads and deletes it.
  • 27. When the consumer and a producer thread try to acquire a resource at the same time, the consumer thread is given priority to access the resource. Figure 3.1 Sequence diagram for the prioritized consumer thread (callouts in the figure: before spawning the consumer and producers, the main task locks the four semaphores; each producer stores data in its own queue and unlocks its semaphore; the consumer thread has to wait until the highest-priority producer releases its semaphore, after which it acquires it).
Development of the designed application
There are two approaches to developing threaded programs on Linux: one uses the pthread APIs and the other the OpenMP APIs. For this scenario the pthread APIs are chosen to develop the four producer threads and the single consumer thread. Figure 3.2 Including library files for scenario 3
  • 28. In this scenario the pthread APIs are used. The definitions of the pthread APIs are brought in by including pthread.h. To provide the appropriate synchronization, semaphores are used; the definitions of the semaphore APIs and the declaration of the semaphore-type object come from semaphore.h. Figure 3.2 shows these files being included in the application. Figure 3.3 Declaration of constructive functions for scenario 3. Figure 3.3 shows the declaration of the constructive functions. In this scenario four threads create four different lists, so one such function is declared per thread. Figure 3.4 Declaration of list pointers and location pointers. A pointer to structure, *myList, is created to hold the starting address of a list. Since there are four different queues, four pointers to the structure list_head are created to hold their base addresses, and likewise four ll_node pointers are created to hold the current location in each of the four lists. The declarations of all these objects are shown in Figure 3.4. Figure 3.5 Definition of constructive functions. Figure 3.5 shows the definition of the constructive functions: calling such a function creates a new list, which is pointed to by the corresponding myList-series pointer. list_create() is the
  • 29. function that creates a new list and returns its address in the form of a list_head structure. On the second line the pointers p, q, r and s are created to hold the current position of the element in each list; initially the current position is set to the first position by calling list_position_create(). Figure 3.6 Declaration of thread functions and synchronization objects. Here the four producer threads call four different functions; their declarations are shown in Figure 3.6. As part of the synchronization mechanism, four semaphores and a mutex are used. The reason for using semaphores is that a semaphore can be taken by one thread and released by another, which is not possible with a mutex. The declarations of these objects are also shown in Figure 3.6. Figure 3.7 Main function for the application of scenario 3. Figure 3.7 shows the code for the main function. The semaphores are initialized at the start of main using the sem_init function, which takes three arguments. The first argument is the address of the sem_t (semaphore) object. The second parameter
  • 30. indicates that the semaphore is shared between the threads of the process, and the third parameter gives the initial value of the semaphore. Here the initial value is 1, so the semaphore is a binary semaphore. After all the synchronization objects are initialized, the threads are created with the semaphores locked: four producer threads and one consumer thread are created after main locks the four semaphores, and the main thread then waits for the consumer thread to finish execution. Figure 3.8 First producer thread. Figure 3.8 shows the first producer thread. The first thread enters a value into the list named myList. At the end of the function, thread 1 unlocks the semaphore named l_th, which was taken by the main function before creating the thread. Meanwhile the consumer thread is waiting to take the highest-priority thread's semaphore; here the highest-priority thread is thread 3 and the semaphore associated with it is l_th3. A mutex is used to prevent multiple threads from seeking data at the same time. Figure 3.9 Second producer thread. Figure 3.9 shows the second producer thread. The second thread enters a value into the list named myList1. At the end of the function, thread 2 unlocks the semaphore named l_th1,
  • 31. which was taken by the main function before creating the thread. Meanwhile the consumer thread is still waiting for the highest-priority thread's semaphore, l_th3, associated with thread 3, and the mutex prevents multiple threads from seeking data at the same time. Figure 3.10 Third producer thread. Figure 3.10 shows the third producer thread. The third thread enters a value into the list named myList2. At the end of the function, thread 3 unlocks the semaphore named l_th2, which was taken by the main function before creating the thread; at the same time the consumer thread is waiting to lock the semaphore l_th3, which is still locked by the main function. Figure 3.11 Fourth producer thread. Figure 3.11 shows the fourth producer thread. The fourth thread enters a value into the list named myList3. At the end of the function, thread 4 unlocks the semaphore named l_th3, which was taken by the main function before creating the thread. This is the highest-priority thread, for
  • 32. which the consumer thread is looking. The moment thread four releases the semaphore, the consumer thread becomes active and reads the data from the highest-priority queue down to the lowest-priority one. Figure 3.12 Consumer thread with the highest-priority queue. Figure 3.12 shows the top half of the consumer thread. As per the requirement, when the producer and consumer threads contend, the consumer should get the highest priority to access the queue. To obtain this, one instance of the structure sched_param is created and two APIs are used: pthread_setschedparam() and pthread_setschedprio(). The first API changes the scheduling policy of the current thread; for the consumer thread the policy is set to SCHED_FIFO. FIFO is the scheduling policy in which the thread that reaches the ready state first gets the chance to execute, and no thread of equal priority can preempt the current execution; in our case, due to FIFO scheduling, no such thread can preempt the consumer thread. On the other side, the requirement is that when the consumer and a producer arrive at the same time, the consumer should get the highest priority. To fulfil this, the priority of the consumer thread is set high while the producer threads run at normal priority. pthread_setschedprio() is used to assign the priority, with the thread ID and the priority value as arguments; for SCHED_FIFO on Linux the valid priorities typically range from 1 (lowest) to 99 (highest), as reported by sched_get_priority_min() and sched_get_priority_max(). After its priority is set, the consumer thread continuously monitors the size variable in every list
  • 33. structure of the producer threads. If a producer stores some data in its list, the consumer reads it and removes it. Thread 3 has the highest priority, so the consumer thread first checks the size of the queue associated with thread 3; if the size is not zero, it means some data is available in that queue, and it is deleted as the first priority. Figure 3.13 Continuation of the consumer thread for the second-priority queue. If the highest-priority thread has no data in its queue, it is not worth the consumer thread's while to wait until that thread stores data, because the consumer has to serve four producer threads. So if the consumer finds no data in the highest-priority queue (myList3), it jumps to check for data in the second-priority queue. The second priority is assigned to thread 2, whose queue is myList2; this mechanism can be seen in Figure 3.13, where the consumer thread checks myList2 and, if some data is available there, prints and deletes it. Figure 3.14 Continuation of the consumer thread for the third-priority queue. Likewise, if the first two priority queues (myList3 and myList2) are empty, the consumer does not wait for either of those threads, but jumps to check the third
  • 34. priority queue. The third priority is assigned to thread 1, whose queue is myList1; this mechanism can be seen in Figure 3.14, where the consumer thread checks myList1 and, if some data is available, prints and deletes it. Figure 3.15 Continuation of the consumer thread for the last-priority queue. Finally, if the first three priority queues (myList3, myList2 and myList1) are all empty, the consumer does not wait for any of those threads either, but checks the last-priority queue. The last priority is assigned to thread 0, whose queue is myList; this mechanism can be seen in Figure 3.15, where the consumer thread checks myList and, if some data is available, prints and deletes it.
3.3 Test cases and testing results for scenario 3
3.3.1 Test cases
In this section test cases are designed for the four-producer, one-consumer system. The table below describes the test cases to be performed; they validate the functionality of the system with corner-case inputs.
Table 3.1 Test cases for the higher-priority consumer thread
TCN | Test case | Test data | Expected result | Output obtained
TC_1 | The consumer should acquire the higher priority and run first | NA | At the start of the program the consumer runs and shows that the lists are empty | Yes
  • 35. TC_2 | If the consumer and a producer try to access a resource together, the consumer should get access first | Any | At the time of access to the shared resources, the consumer gets the higher priority | Yes
TC_3 | The consumer should not wait for the higher-priority thread to enter a value | Any | If the higher-priority thread does not enter a value, the consumer checks the other, lower-priority queues | Yes
TC_4 | If two values are entered by any one producer thread, the consumer should respond to both | Any | There are cases when the consumer is busy printing some values and, at the same time, a thread enters two values in its queue; in such a condition the consumer should read and delete both values | Yes
3.3.2 Documentation of the results
Figure 3.16 shows the results of the test cases developed in the section above. It can be seen from the image that the consumer thread responds to every producer thread's queue that holds values; the attached callouts give a better understanding of the results (the consumer thread executes first, as it has the highest priority; thread 3 has the higher priority but no value in its queue while thread 1 does, so rather than waiting, the consumer serves the lower-priority thread; when thread 3's value arrives it is read without the consumer having waited for it). Figure 3.16 Results of test cases
  • 36. 3.5 Conclusion
The consumer-producer application with the extended priority concept has been explained with a sequence diagram showing the relation between the four static-priority producer threads, the one consumer thread, the four queues and their synchronization mechanism, together with the parallelized program using pthreads and the test cases used to verify the functionality.
  • 37. CHAPTER 4
4.1 Module Learning Outcomes
This module helped build expertise in parallel programming for multi-core architectures. Multi-core processors were studied along with their performance quantification and usability techniques, single-core and multi-core optimization techniques, and the development of multi-threaded parallel programs. Virtualization and partitioning techniques were explained in detail along with their specific challenges and solutions, and parallel programming of multi-core processors was illustrated with appropriate case studies using OpenMP and pthreads. After this module I am able to analyze multi-core architectures, the optimization process for single- and multi-core processors, and parallel programming for multi-core processors proficiently, using the OpenMP library and the GCC compiler. Applying parallel programming concepts to develop applications on multi-core processors through the lab programs proved an efficient way of learning pthreads, OpenMP and the various synchronization techniques for eliminating deadlock situations.
4.2 Conclusion
In Chapter 1, by critically comparing the throughput and latency of the available multi-core arbitration schemes, Round Robin was found to be better than the other arbitrations, since it is a fair-queuing packet scheduler with good fairness and delay properties and low complexity; even so, Round Robin has enough negative aspects that replacements may be hoped for in the future. In Chapter 2, multi-core hardware was seen to be increasing software complexity by driving the need for multi-threaded applications; given the rising rate of multi-core hardware adoption in both enterprise and consumer devices, the challenge of creating multi-threaded applications is here to stay for software developers. In view of this, a producer-consumer application was successfully created using pthread APIs.
All the threads share the same linked list, and synchronization is provided by a mutex. The test cases were developed by critically analyzing the application code and the assignment requirements, and all of them passed successfully. In Chapter 3, a consumer-producer application with an extended priority concept was explained with a sequence diagram showing the relation between the four static-priority producer threads, the one consumer thread, the four queues and their synchronization mechanism, together with the parallelized program using pthreads and the test cases used to verify the functionality.
  • 38. References
[1] http://www.coverity.com/library/pdf/coverity_multi-threaded_whitepaper.pdf
[2] www.irit.fr/publis/TRACES/12619_etfa2011.pd
[3] www.cs.fsu.edu/research/reports/TR-100401.pd
[4] paper.ijcsns.org/07_book/200809/20080936.pdf
[5] www.sti.uniurb.it/bogliolo/e-publ/KLUWERjdaes03.pdf
  • 39. Appendix-1
  • 40. Appendix-2