ASSIGNMENT

Module Code: ESD 532
Module Name: Multi core Architecture and Programming
Course: M.Sc. [Engg.] in Real Time Embedded Systems
Department: Computer Engineering
Name of the Student: Bhargav Shah
Reg. No: CHB0910001
Batch: Full-Time 2011
Module Leader: Padma Priya Dharishini P.

M.S.Ramaiah School of Advanced Studies
Postgraduate Engineering and Management Programmes (PEMP)
#470-P Peenya Industrial Area, 4th Phase, Peenya, Bengaluru-560 058
Tel: 080 4906 5555, website: www.msrsas.org

POSTGRADUATE ENGINEERING AND MANAGEMENT PROGRAMME – (PEMP)

MSRSAS - Postgraduate Engineering and Management Programme - PEMP


Declaration Sheet

Student Name: Bhargav Shah
Reg. No: CHB0910001
Course: RTES
Batch: Full-Time 2011
Module Code: ESD 532
Module Title: Multi Core Architecture and Programming
Module Date: 06-02-2012 to 03-03-2012
Module Leader: Padma Priya Dharishini P.

Extension requests:
Extensions can only be granted by the Head of the Department in consultation with the module leader.
Extensions granted by any other person will not be accepted and hence the assignment will incur a penalty.
Extensions MUST be requested by using the ‘Extension Request Form’, which is available with the ARO.
A copy of the extension approval must be attached to the assignment submitted.

Penalty for late submission
Unless you have submitted proof of mitigating circumstances or have been granted an extension, the
penalties for a late submission of an assignment shall be as follows:
• Up to one week late:
Penalty of 5 marks
• One-Two weeks late:
Penalty of 10 marks
• More than Two weeks late:
Fail - 0% recorded (F)
All late assignments must be submitted to the Academic Records Office (ARO). It is your responsibility to
ensure that the receipt of a late assignment is recorded in the ARO. If an extension was agreed, the
authorization should be submitted to ARO during the submission of the assignment.
To ensure assignment reports are written concisely, the length should be restricted to a limit
indicated in the assignment problem statement. Assignment reports greater than this length may
incur a penalty of one grade (5 marks). Each delegate is required to retain a copy of the assignment
report.

Declaration
The assignment submitted herewith is a result of my own investigations and I have conformed to the
guidelines against plagiarism as laid out in the PEMP Student Handbook. All sections of the text and
results which have been obtained from other sources are fully referenced. I understand that cheating and
plagiarism constitute a breach of University regulations and will be dealt with accordingly.

Signature of the student

Date

Submission date stamp
(by ARO)

Signature of the Module Leader and date


Signature of Head of the Department and date


M. S. Ramaiah School of Advanced Studies
Postgraduate Engineering and Management Programme - Coventry University (UK)

Assessment Sheet

Department: Computer Engineering
Course: RTES
Module Code: ESD 532
Module Leader: Padma Priya Dharishini P.
Module Completion Date: 03-03-2012
Student Name: Bhargav Shah
ID Number: CHB0910001
Batch: Full-Time 2011
Module Title: Multi core Architecture and Programming
Attendance Details: Theory / Laboratory / Fine Paid
Remarks (if any, for shortage of attendance):

Written Examination - Marks Sheet (Assessor to Fill)
Q. No 1-6, parts a, b, c, d, Total, Remarks
Marks Scored for 100 / Marks Scored out of 50 / Result: PASS / FAIL

Assignment - Marks Sheet (Assessor to Fill)
Parts A, B, C, sub-parts a, b, c, d, Total, Remarks
Marks Scored for 100 / Marks Scored out of 50 / Result: PASS / FAIL

PMAR form completed for student feedback (Assessor has to mark): Yes / No

Components                                      Assessor      Reviewer
Written Examination (Max 50)                    Pass / Fail
Assignment (Max 50)                             Pass / Fail
Total Marks (Max 100) (Before Late Penalty)     Grade
Total Marks (Max 100) (After Late Penalty)      Grade
Overall Result

IMPORTANT
1. The assignment and examination marks have to be rounded off to the nearest integer and entered in the respective fields.
2. A minimum of 40% is required for a pass in both assignment and written test individually.
3. A student cannot fail on application of late penalty (i.e. on application of late penalty if the marks are below 40, cap at 40 marks).

Signature of Reviewer with date

Signature of Module Leader with date


Abstract
Multi-core processors may provide higher performance than current embedded processors to
support future embedded system functionalities. According to the Industrial Advisor Board,
embedded systems will benefit from multi-core processors, as these systems comprise
mixed applications, i.e. applications with and without hard real-time constraints,
that can be executed on the same processor.
Moreover, the Industrial Advisor Board also stated that memory operations represent one of
the main bottlenecks that current embedded applications must face, being even more important than
the performance of the core, which can suffer a degradation of 10-20% without really affecting overall
performance. We exploit this fact by studying the effect of running several threads per core,
that is, we make the core multithreaded. We also study the effect of caches, which are a well
known technique in high performance computing to reduce the memory bottleneck.
Chapter 1 discusses arbitration schemes for memory access in multicore systems: the
types of arbitration schemes that exist so far, which is the best of them, the
challenging factors for these arbitration schemes in the present situation, and finally a short note on
the factors that support the proposed arbitration scheme.
Chapter 2 discusses a multi-threaded concept of consumer and producer threads: how they
share a common queue, how to prioritize the threads if a common queue is shared,
and some test cases to test the scenarios.
Chapter 3 discusses a different situation having 4 producers with different queues and a
single consumer, and it discusses the changing priority levels of the consumer so that in
the conflicting condition with the consumer thread the producer will get the high priority to
execute.


Contents
Declaration Sheet .................................................................................................................................ii
Abstract .............................................................................................................................................. iv
List of Figures ...................................................................................................................................vii
Symbols .............................................................................................................................................vii
Nomenclature...................................................................................................................................viii
CHAPTER 1 ....................................................................................................................................... 9
Arbitration schemes of memory access in multi core ..................................................................... 9
1.1 Introduction ...........................................................................................................................9
1.2 Types of arbitration schemes .................................................................................................9
1.3 Challenges in arbitration schemes .......................................................................................10
1.4 Impact of the arbitration schemes on throughput and latency .................................................11
1.5 Proposal of better arbitration scheme with justification ..........................................................11
1.6 Conclusion ...............................................................................................................................12
CHAPTER 2 ..................................................................................................................................... 13
Development of Consumer Producer Application ........................................................................ 13
2.1 Introduction ..............................................................................................................................13
2.2 Sequence diagram ....................................................................................................................13
2.3 Development of parallelized program using Pthread/openMP ................................................14
2.4 Test cases and Testing results for scenario 1 ...........................................................................17
2.4.1 Test cases ........................................................................................................................................ 17
2.4.2 Testing results................................................................................................................................. 18
2.5 Sequence diagram.............................................................................................................................. 19

2.6 Development of paralleled program using pthread/openMP ...................................................20
2.7 Test cases and Testing results for scenario 2 ...........................................................23
2.7.1 Test cases ........................................................................................................................ 23
2.7.2 Testing results................................................................................................................. 24

2.8 Conclusion ...............................................................................................................25
CHAPTER 3 ..................................................................................................................................... 26
Development of Consumer Producer Application with extended priority concept ................... 26
3.1 Introduction ..............................................................................................................................26
3.2 Sequence diagram................................................................................................26
3.3 Development of designed application ......................................................................27
3.4 Test cases and testing results for scenario 3 ........................................................34
3.4.1 Test cases........................................................................................................................ 34
3.4.2 Documentation of the results ........................................................................................ 35

3.5 Conclusion ...............................................................................................................................36
CHAPTER 4 ..................................................................................................................................... 37
4.1 Module Learning Outcomes.....................................................................................................37
4.2 Conclusion ...............................................................................................................................37
References ......................................................................................................................................... 38
Appendix-1 ........................................................................................................................................ 39
Appendix-2 ........................................................................................................................................ 40


List of Tables
Table 2. 1 Test cases for single producer single consumer ................................................................17
Table 2. 2 Test cases for single producer single consumer ................................................................23

Table 3. 1 Test cases for higher priority consumer thread .................................................................34


List of Figures
Figure 2. 1 Sequence diagram for one producer and one consumer ..................................................14
Figure 2. 2 Including Libraries and files for scenario 1 .....................................................................14
Figure 2. 3 Declaration of mutex and structures for scenario 1 ......................................................14
Figure 2. 4 Function to create new list for scenario 1 ........................................................................15
Figure 2. 5 Main function for Application of scenario 1 ...................................................................15
Figure 2. 6 Body of producer thread for scenario 1 ...........................................................................16
Figure 2. 7 Body of consumer thread for scenario 1 ..........................................................................17
Figure 2. 8 Producer thread is waiting for value in critical region ....................................................18
Figure 2. 9 Consumer thread is printing the value inserted by producer thread ...............................18
Figure 2. 10 Sequence diagram of three producer one consumer ......................................................19
Figure 2. 11 Including Libraries and files for scenario 2 ...................................................................20
Figure 2. 12 Declaration of mutex and structures for scenario 2 ....................................................20
Figure 2. 13 Function to create new list for scenario 2 ......................................................................20
Figure 2. 14 Main function for application of scenario 2 ..................................................................21
Figure 2. 15 Body of producer thread for scenario 2 .........................................................................22
Figure 2. 16 Body of consumer thread for scenario 2 ........................................................................22
Figure 2. 17 Producer thread is waiting in critical region ..................................................................24
Figure 2. 18 Consumer thread is active after all the producer threads finish the critical region .......24
Figure3. 1 Sequence diagram for prioritized consumer thread ..........................................................27
Figure3. 2 Including library files for scenario 3 ................................................................................27
Figure3. 3 Declaration of constructive functions for scenario 3 ........................................................28
Figure3. 4 Declaration of list pointers and location pointers ...........................................................28
Figure3. 5 Definition of constructive functions .................................................................................28
Figure3. 6 Declaration of thread function and synchronization objects ............................................29
Figure3. 7 Main function for application of scenario 3 ....................................................29
Figure3. 8 First producer thread .........................................................................................................30
Figure3. 9 Second producer thread ....................................................................................................30
Figure3. 10 Third producer thread .....................................................................................................31
Figure3. 11 Fourth producer thread ...................................................................................................31
Figure3. 12 Consumer thread with highest priority queue.................................................................32
Figure3. 13 Continuation of consumer thread for second priority queue ..........................................33
Figure3. 14 Continuation of consumer thread for third priority queue .............................................33
Figure3. 15 Continuation of consumer thread for last priority queue ................................................34
Figure3. 16 Results of test cases ........................................................................................................35


Nomenclature

WRR    Weighted Round Robin
CMP    Chip Multiprocessors
SDRAM  Synchronous Dynamic Random Access Memory
DRR    Deficit Round Robin
SRR    Stratified Round Robin
PD     Priority Division
PBS    Priority Based Budget Scheduler
TDMA   Time Division Multiple Access
CCSP   Credit Controlled Static Priority

CHAPTER 1
Arbitration schemes of memory access in multi core
1.1 Introduction
The constraints of embedded systems in terms of power consumption, thermal dissipation, cost-efficiency
and performance can be met by using multi core processors (CMP or chip
multiprocessors). On typical medium size CMPs, the cores share a bus to the highest levels of the
memory hierarchy. In multi-core architectures, resources are often shared to reduce cost and
exchange information. An off-chip memory is one of the most common shared resources. SDRAM
is a popular off-chip memory currently used in cost sensitive and performance demanding
applications due to its low price, high data rate and large storage. An asynchronous refresh
operation and a dependence on the previous access make SDRAM access latency vary by an order
of magnitude. The main contribution of this report is to critically compare throughput and latency for
the available arbitration schemes of multi core. At the end, a justification for a better arbitration scheme is
derived from the analysis.

1.2 Types of arbitration schemes[1]
There have been many approaches to provide fairness, high throughput and worst case latency
bounds in the arbiter especially in the networks domain.
Weighted Round Robin (WRR) is a work conserving arbiter where cores are allocated a
number of slots within a round robin cycle depending on their bandwidth requirements. If a core
does not use its slot, the slot is immediately assigned to the next active core in the round robin cycle to
increase throughput. Cores producing bursty traffic benefit at the cost of cores which produce
uniform traffic.
Deficit Round Robin (DRR) assigns different slot sizes to each master according to its
bandwidth requirements and schedules them in a Round Robin (RR) fashion. The difference between
DRR and RR is that if a master cannot use its slot, or part of its slot, in the current cycle, the
remaining slot (deficit) is added to the next cycle. In the next cycle, the master can transfer up to
an amount of data equal to the sum of its slot size and the deficit. Thus, DRR tries to avoid the
unfairness caused to uniform traffic generators in the WRR.
Stratified Round Robin (SRR) groups masters with similar bandwidth requirements into one
class. After grouping masters into classes, two-step arbitration is applied: inter-class and
intra-class. The inter-class scheduler schedules each class Fk once in 2^k clock cycles. Hence, the
smaller the k, the more often the class is scheduled. The intra-class scheduler uses the WRR mechanism to
select the next master within the class. Due to a more uniform distribution of bandwidth, SRR
reduces the worst case latencies compared to the WRR. However, to achieve low worst case latency
for a class Fk, k must be minimized, which leads to over allocation.
Priority Division (PD) combines TDMA and static priorities to achieve guarantees and high
resource utilization. Instead of fixing TDMA slots statically, PD fixes priorities of each master
within the slot statically such that each master has at least one slot where it has the highest priority.
Thus, masters have guarantees equal to TDMA and unused slots are arbitrated based on static
priority to increase the resource utilization. This approach provides benefit over RR or WRR only if
the response time of the shared resource is fixed. In the case of variable response time (e.g.
SDRAM), this approach produces high worst case latencies.
In the Priority Based Budget Scheduler (PBS), masters are assigned fixed budgets of access in a
unit time (the replenishment period). Moreover, masters are also assigned fixed priorities to resolve
conflicts. The budget relates to a master's bandwidth requirements while the priority relates to its
latency requirements. Thus, the coupling between latency and bandwidth is removed. The shared
resource is granted to the active master with the highest priority which still has a budget left. At the
beginning of a replenishment period, each master gets its original budget back.
Akesson et al. introduced the Credit Controlled Static Priority (CCSP) arbiter. CCSP also uses
priorities and budgets within the replenishment period, but instead of using frame based
replenishment periods, masters are replenished incrementally for fine-grained bandwidth assignment.

1.3 Challenges in arbitration schemes
The traditional shared bus arbitration schemes such as TDMA and round robin exhibit
several defects such as bus starvation and low system performance. In strict priority scheduling the
higher priority packets can get most of the bandwidth, so the lower priority packets have to
wait longer for resource allocation. This causes starvation of the lower priority packets.
A drawback of WRR and LARD regarding power consumption is that both of them always keep their
servers turned on even though some of them do not serve any requests; therefore, they cannot
conserve any power. Weighted Round-Robin and Deficit Round-Robin are extensions that
guarantee each requestor a minimum service, proportional to an allocated rate, in a common
periodically repeating frame of fixed size. This type of frame-based rate regulation is similar to the
Deferrable Server, and suffers from an inherent coupling between allocation granularity and
latency, where allocation granularity is inversely proportional to the frame size. A larger frame size
results in finer allocation granularity, reducing over allocation, but at the cost of increased latencies
for all requestors. Another common example of frame-based scheduling disciplines is time-division
multiplexing, which suffers from the additional disadvantage that it requires a schedule to be stored for
each configuration, which is very costly if the frame size or the number of use cases is large[2].
The above arbitration algorithms cannot handle strict real-time requirements, so a two-level
arbitration algorithm, called RB_Lottery bus arbitration, has been developed, which solves
the impartiality, starvation and real-time problems that exist in the Lottery method
and reduces the average latency for bus requests[5]. In hardware verification, the proposed
arbiter achieves a higher operating frequency than the Lottery arbiter, although it pays more
in chip area and power consumption than the Lottery arbiter; it also has a lower
average latency of bus requests than the Lottery arbitration.

1.4 Impact of the arbitration schemes on throughput and latency[4]
In each approach to providing fairness, high throughput and worst case latency bounds,
optimization of one factor degrades the others. In Weighted Round Robin, to provide low worst
case latency to any core, it has to be assigned more slots in the round robin cycle, which leads to
over allocation. Deficit Round Robin (DRR) has very high latencies in the worst case. For example,
one master stays idle for a long time and gains a high deficit. Afterwards, it continuously requests
the shared resource. Since it has gained a high deficit, it will occupy the shared resource for a long
time, incurring very high latencies for the other masters.
Due to the presence of priorities, PBS is fair to high priority masters and unfair to low priority
masters. When all masters are executing HRTs (as outlined in the introduction), PBS results in large
WCETs for low priority masters. Credit Controlled Static Priority (CCSP) likewise produces large
worst case execution time bounds for lower priority masters due to the presence of priorities.

1.5 Proposal of better arbitration scheme with justification
Stratified Round Robin is better compared to the other arbitration schemes, since it is a fair-queuing
packet scheduler with good fairness and delay properties and low complexity. It is
unique among all other schedulers of comparable complexity in that it provides a single packet
delay bound that is independent of the number of flows. Importantly, it also enables a simple
hardware implementation, and thus fills a current gap between scheduling algorithms that have
provably good performance and those that are feasible and practical to implement in high-speed
routers. Interactive applications such as video and audio conferencing require the total delay
experienced by a packet in the network to be bounded on an end-to-end basis. The packet scheduler
decides the order in which packets are sent on the output link, and therefore determines the queuing
delay experienced by a packet at each intermediate router in the network. With
line rates increasing to 40 Gbps, it is critical that all packet processing tasks performed by routers,
including output scheduling, be able to operate in nanosecond time frames.

1.6 Conclusion
By critically comparing throughput and latency for the available arbitration schemes of multi core,
Stratified Round Robin emerges as the better choice, since it is a fair-queuing packet
scheduler with good fairness and delay properties and low complexity; although it still has
negative aspects, it is hoped that improved replacements will be developed in future.


CHAPTER 2
Development of Consumer Producer Application
2.1 Introduction
Today, the world of software development is presented with a new challenge. To fully
leverage this new class of multi-core hardware, software developers must change the way they
create applications. By turning their focus to multi-threaded applications, developers will be able to
take full advantage of multi-core devices and deliver software that meets the demands of the world.
But this paradigm of multi-threaded software development adds a new wrinkle of complexity for
those who care the utmost about software quality. Concurrency defects such as race conditions and
deadlocks are software defect types that are unique to multi-threaded applications. Complex and
hard-to-find, these defects can quickly derail a software project. To avoid catastrophic failures in
multithreaded applications, software development organizations must understand how to identify
and eliminate these deadly problems early in the application development lifecycle.
Here, as part of this work, a multi-threaded producer-consumer application is created using
the given linked list program. Two scenarios have been accommodated in this part of the document. In the
first case the producer inserts one value into the doubly linked list and at the other end the
consumer reads that value and deletes it. In the second case three producer threads
try to insert values into the linked list, and at the end one consumer thread tries to read and delete them.
A proper synchronization mechanism is developed.

2.2 Sequence diagram
A sequence diagram is a kind of interaction diagram that shows how processes operate with
one another and in what order. It is a construct of a Message Sequence Chart. A sequence diagram
shows object interactions arranged in time sequence. It depicts the objects and classes involved in
the scenario and the sequence of messages exchanged between the objects needed to carry out the
functionality of the scenario. Sequence diagrams typically are associated with use case realizations
in the Logical View of the system under development.
Figure 2.1 shows the sequence diagram for the one producer and one consumer. In the figure,
the y-axis represents time and the x-axis represents the resources. At top left one producer
thread and at top right one consumer thread is shown. At the start, the producer has to write the
data into the linked list. But the linked list is shared between the producer and consumer threads. To provide
synchronization between the producer and consumer threads a mutex is used. So, the producer locks the
mutex and writes data to the list. If at the same time the consumer tries to read the data, it tries
to acquire the mutex, which is held by the producer, and it fails. In this case the consumer thread has to
wait until the producer releases the mutex. This phenomenon is shown in Figure 2.1. In this application the
consumer cannot read the data until the producer produces it and stores it in the linked list. This
synchronization mechanism is achieved by using a mutex.

Figure 2. 1 Sequence diagram for one producer and one consumer

2.3 Development of parallelized program using Pthread/openMP
There are two approaches to developing threaded programs on Linux: one uses the
pthread APIs and the other uses the openMP APIs. For this scenario, the pthread APIs
were chosen to develop the single producer and single consumer threads.

Figure 2. 2 Including Libraries and files for scenario 1

Figure 2.2 shows all the preprocessor statements of the code segment. The first is the
"ll2.c" file, which contains the definitions of all the functions related to linked list operations. The
second is "pthread.h", which contains the declarations of all the threading related APIs. The last two
are the standard library files for common functions. In the last lines of the image, a function named
create is declared.

Figure 2. 3 Declaration of mutex and structures for scenario 1


To obtain synchronization in the application, a mutex is used. Here “lock” is defined as the
pthread mutex object. It is essential to initialize the mutex before using it; here initialization is
handled by assigning the macro “PTHREAD_MUTEX_INITIALIZER”.
A pointer to structure, *myList, is created to hold the starting address of the list, and a pointer
*p is created to point to the current position for accessing values. These
declarations are shown in Figure 2.3.
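The declarations above can be sketched as follows. Since the original code is only visible as a screenshot, the list types are assumed reconstructions of what "ll2.c" provides:

```c
#include <pthread.h>
#include <stddef.h>

/* Assumed reconstructions of the linked-list types from "ll2.c";
   the real definitions appear only in the report's screenshots. */
struct ll_node  { int val; struct ll_node *next; };
struct list_head { struct ll_node *first; int size; };

/* Statically initialised mutex guarding the shared list. */
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

struct list_head *myList = NULL; /* base address of the shared list */
struct ll_node   *p      = NULL; /* current position within the list */
```

The static initializer avoids a separate call to pthread_mutex_init() before the threads start.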

Figure 2. 4 Function to create new list for scenario 1

Figure 2.4 shows the definition of the creat function. Calling this function
creates a new list, which is pointed to by the myList pointer. list_create() is the function that
creates the new list and returns its address in the form of a list_head structure. On the
second line, the pointer p is created to hold the current position of the element in the list; at the
start, the current position is set to the first position by calling list_position_create().
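A minimal sketch of the creat() flow described above; list_create() and list_position_create() are assumed stand-ins for the helpers in "ll2.c", reconstructed here so the sketch compiles:

```c
#include <stdlib.h>

/* Assumed reconstructions of the list types and helpers from "ll2.c". */
struct ll_node  { int val; struct ll_node *next; };
struct list_head { struct ll_node *first; int size; };

struct list_head *myList; /* shared list */
struct ll_node   *p;      /* current position */

static struct list_head *list_create(void)
{
    struct list_head *h = malloc(sizeof *h);
    h->first = NULL;      /* freshly created list is empty */
    h->size  = 0;
    return h;
}

static struct ll_node *list_position_create(struct list_head *h)
{
    return h->first;      /* initially the first position */
}

void creat(void)
{
    myList = list_create();                /* allocate a fresh list */
    p = list_position_create(myList);      /* point p at its start */
}
```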

Figure 2. 5 Main function for Application of scenario 1

Figure 2.5 shows the main function for the single-producer, single-consumer application.
In the figure, two functions are declared with a void pointer argument and a void pointer return type.
The function named “ser” is called by the producer thread, while the
consumer thread calls the function “cli”. In main, one void pointer named “exit” is defined to
obtain the return value from the thread functions. Two thread objects are defined, named “t_ser”
and “t_cli”. On successful creation of the producer thread, its ID is stored in


“t_ser”, and the ID of the consumer thread is stored in “t_cli”. To create the threads, the “pthread_create”
API is used with the appropriate arguments. In this application two threads are created: a producer
thread and a consumer thread. The child threads die automatically if the main thread exits. To avoid
this situation, the main thread has to wait until the consumer thread
exits successfully. This mechanism is provided by the “pthread_join” API.
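The create/join structure described above can be sketched as follows; the thread bodies are stubs and the wrapper name run_main is a stand-in, so the sketch stays self-contained:

```c
#include <pthread.h>

/* Sketch of the main() described above; "ser" and "cli" stubs stand in
   for the producer and consumer bodies shown in the figures. */
static void *ser(void *arg) { (void)arg; return NULL; } /* producer stub */
static void *cli(void *arg) { (void)arg; return NULL; } /* consumer stub */

int run_main(void)
{
    pthread_t t_ser, t_cli;  /* IDs of producer and consumer threads */
    void *exit_val;          /* receives each thread's return value */

    if (pthread_create(&t_ser, NULL, ser, NULL) != 0) return -1;
    if (pthread_create(&t_cli, NULL, cli, NULL) != 0) return -1;

    /* main must stay alive until both children have exited, otherwise
       process termination would kill them; pthread_join provides this. */
    pthread_join(t_ser, &exit_val);
    pthread_join(t_cli, &exit_val);
    return 0;
}
```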

Figure 2. 6 Body of producer thread for scenario 1

Figure 2.6 shows the body of the producer thread. At the start of the producer thread,
creat() is called. This creates one new list and assigns the pointer p to the first location.
The consumer cannot get any value before the producer stores it in the list. To avoid such a race condition,
the mutex named “lock” is used. The function “pthread_mutex_lock” is used to take the mutex and
enter the critical region. After this, the producer thread takes a value from the user into the
variable “val”. The entered value is stored in the list and the position of the pointer p is
updated to the new current location. The storing mechanism is provided by the function
“list_insertLast”, with the list object (myList) and the value to be inserted as arguments. After
successful insertion of the value into the list, any thread can read it. So, to end
the critical region and release the mutex, the “pthread_mutex_unlock” function is used. While the producer
is in the critical region, if the consumer thread tries to take the mutex or access the critical section, it
has to wait until the producer releases it. So, after the producer unlocks the mutex, the
consumer thread can acquire the resource.
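The producer body can be sketched as follows, under the assumptions named earlier: the list helpers are reconstructions of "ll2.c", and the value arrives as an argument instead of via scanf so the sketch is self-contained:

```c
#include <pthread.h>
#include <stdlib.h>

/* Assumed reconstructions of the list types and insert helper. */
struct ll_node  { int val; struct ll_node *next; };
struct list_head { struct ll_node *first; int size; };

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static struct list_head shared = { NULL, 0 };

static void list_insertLast(struct list_head *h, int val)
{
    struct ll_node *n = malloc(sizeof *n);
    n->val = val;
    n->next = NULL;
    struct ll_node **pp = &h->first;
    while (*pp)                    /* walk to the tail of the list */
        pp = &(*pp)->next;
    *pp = n;                       /* append the new node */
    h->size++;
}

void *ser(void *arg)
{
    int val = *(int *)arg;         /* stands in for reading from the user */
    pthread_mutex_lock(&lock);     /* enter the critical region */
    list_insertLast(&shared, val); /* store the produced value */
    pthread_mutex_unlock(&lock);   /* leave: the consumer may now run */
    return NULL;
}

int shared_size(void) { return shared.size; }  /* helpers for inspection */
int shared_last(void)
{
    struct ll_node *n = shared.first;
    while (n && n->next) n = n->next;
    return n ? n->val : -1;
}
```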
Figure 2.7 shows the body of the consumer thread. At the start of the thread function, it
tries to take the mutex. After the producer thread unlocks the mutex, the consumer thread gets
access to the shared list. The value is displayed by passing the list object to the function
“list_display”. The consumer thread then has to remove the value. To do this, the function
“list_removeLast” is called with the list object. The return value of this function is the location of the
previous data. After removing the data, the mutex taken by the consumer thread is released.
This whole mechanism is shown by the code in Figure 2.7.

Figure 2. 7 Body of consumer thread for scenario 1
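The consumer body described above can be sketched as follows; the list helpers are again assumed reconstructions, and one value is pre-loaded so the sketch has data to consume:

```c
#include <pthread.h>
#include <stdio.h>

/* Assumed reconstructions of the list types and helpers. */
struct ll_node  { int val; struct ll_node *next; };
struct list_head { struct ll_node *first; int size; };

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static struct ll_node   preloaded = { 42, NULL };
static struct list_head shared = { &preloaded, 1 };

static void list_display(struct list_head *h)
{
    for (struct ll_node *n = h->first; n; n = n->next)
        printf("%d ", n->val);
    printf("\n");
}

static void list_removeLast(struct list_head *h)
{
    struct ll_node **pp = &h->first;
    if (!*pp) return;            /* nothing to remove */
    while ((*pp)->next)          /* find the pointer to the tail node */
        pp = &(*pp)->next;
    *pp = NULL;                  /* unlink the tail */
    h->size--;
}

void *cli(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);   /* blocks while the producer holds it */
    list_display(&shared);       /* read the produced value(s) */
    list_removeLast(&shared);    /* then delete what was just read */
    pthread_mutex_unlock(&lock);
    return NULL;
}

int shared_size(void) { return shared.size; }  /* for inspection */
```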

2.4 Test cases and testing results for scenario 1
2.4.1 Test cases
In this section, test cases are designed for the producer-consumer system. The table below
describes the test cases to be performed. They are meant to validate the functionality of the system
with corner cases of input.
Table 2. 1 Test cases for single producer single consumer

TCN  | Test case                                                        | Test data | Expected result                                                                               | Output obtained
TC_1 | Producer thread will insert a value                              | Int       | Consumer should read the value inserted by the producer                                       | Yes
TC_2 | Consumer should acquire the resource only after the producer thread unlocks it | Any | Proper synchronization should be maintained by the producer and consumer threads    | Yes
TC_3 | Main thread should wait until all child threads exit             | Any       | The main thread has to stay alive until all threads execute completely                        | Yes
TC_4 | No deadlock should occur                                         | Any       | All functions of the program should execute; resource locking must not create a deadlock      | Yes
TC_5 | After reading the data, the consumer thread should delete it     | Any       | After reading the data entered by the producer, the consumer thread has to delete it properly | Yes

2.4.2 Testing results
Figure 2.8 shows the testing results of TC_1, TC_2 and TC_4. Here the server (producer) thread is
waiting for a value from the user. The server thread holds the critical region until it stores the
value in the shared list; by this time the client (consumer) thread is waiting to acquire the resource.

Figure 2. 8 Producer thread is waiting for value in critical region

Figure 2.9 shows the results of TC_3 and TC_4. By the time the producer thread leaves the
critical region, the consumer thread enters it to read the value entered by the
producer. After reading the value, the consumer thread deletes it, as shown in the figure.

Figure 2. 9 Consumer thread printing the value inserted by the producer thread


2.5 Sequence diagram
Figure 2.10 shows the sequence diagram for the three producers and one consumer. In the
figure, the y-axis represents time and the x-axis represents the resources. The left side of the figure
shows the three producer threads and the right side shows the one consumer thread.

Figure 2. 10 Sequence diagram of three producer one consumer

At the start, every producer has to write data into the linked list. But the linked list is shared
between the producer and consumer threads. To provide synchronization between the producer and
consumer threads, a mutex is used. So, each producer locks the mutex and writes data to the list. If
the consumer tries to read the data at the same time, it attempts to acquire the mutex held by a
producer and fails. In this case the consumer thread has to wait until the producer releases the
mutex. This phenomenon is shown in Figure 2.10. In this application the consumer cannot read the data
until the producers have produced it and stored it in the linked list. This synchronization mechanism is
achieved using the mutex.


2.6 Development of parallelized program using pthread/OpenMP
There are two approaches to developing threaded programs in Linux: one uses the
pthread APIs and the other uses the OpenMP APIs. For this part of the scenario, the pthread APIs
are chosen to develop the three-producer, single-consumer application. The definitions for
both scenarios are the same; the only difference is in the main body of the application code.

Figure 2. 11 Including Libraries and files for scenario 2

Figure 2.11 shows all the preprocessor statements of the code segment. The first is the
file “ll2.c”, which contains the definitions of all the linked-list functions. The
second is “pthread.h”, which declares all the threading-related APIs. The last two
are the standard library headers for common functions. In the last lines of the image, a function named
create is declared.

Figure 2. 12 Declaration of mutex and structures for scenario 2

To obtain synchronization in the application, a mutex is used. Here “lock” is defined as the
pthread mutex object. It is essential to initialize the mutex before using it; here initialization is
handled by assigning the macro “PTHREAD_MUTEX_INITIALIZER”.
A pointer to structure, *myList, is created to hold the starting address of the list, and a pointer
*p is created to point to the current position for accessing values. These
declarations are shown in Figure 2.12.

Figure 2. 13 Function to create new list for scenario 2

Figure 2.13 shows the definition of the creat function. Calling this function
creates a new list, which is pointed to by the myList pointer. list_create() is the function that
creates the new list and returns its address in the form of a list_head structure. On the
second line, the pointer p is created to hold the current position of the element in the list; at the
start, the current position is set to the first position by calling list_position_create().
Figure 2.14 shows the main function for the multiple-producer, single-consumer application.
In the figure, two functions are declared with a void pointer argument and a void pointer return type. The
function named “ser” is called by the producer threads; there are three
producer threads, which call the same function three times. The
consumer thread calls the function “cli”. In main, one void pointer named “exit” is defined to obtain the
return value from the thread functions.

Figure 2. 14 Main function for application of scenario 2

Here, five thread objects are defined, named “t_ser”, “t_ser1”, “t_ser2”, “t_ser3” and “t_cli”.
On successful creation of each producer thread, its ID is stored in the corresponding thread
object, and the ID of the consumer thread is stored in “t_cli”. Before creating the threads, the
creat() function is called to generate the list and assign the current location to the pointer p. In the case of
the single producer and single consumer, this function is called in the producer thread function because
both threads run only once. In this case the producer function executes three times, so
there is no need to create a new list each time. Once the list is created, all the threads have to insert a value
and advance the location pointer. To create the threads, the “pthread_create” API is used with the
appropriate arguments. In this application four threads are created: three producer threads
and one consumer thread. The child threads die automatically if the main thread exits. To avoid
this situation, the main thread has to wait until the consumer thread exits successfully. This mechanism
is provided by the “pthread_join” API.
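The thread setup described above can be sketched as follows; the thread bodies are stubs and run_scenario2 is a stand-in name for main, so the sketch stays self-contained:

```c
#include <pthread.h>

/* Sketch of the scenario-2 main: the list would be created once, then
   three producers and one consumer are spawned and joined. */
static void *ser(void *arg) { (void)arg; return NULL; } /* shared producer body */
static void *cli(void *arg) { (void)arg; return NULL; } /* consumer body */

int run_scenario2(void)
{
    pthread_t t_ser1, t_ser2, t_ser3, t_cli;

    /* creat() would run here, once, before any thread starts. */

    if (pthread_create(&t_ser1, NULL, ser, NULL) != 0) return -1;
    if (pthread_create(&t_ser2, NULL, ser, NULL) != 0) return -1;
    if (pthread_create(&t_ser3, NULL, ser, NULL) != 0) return -1;
    if (pthread_create(&t_cli,  NULL, cli, NULL) != 0) return -1;

    pthread_join(t_ser1, NULL);
    pthread_join(t_ser2, NULL);
    pthread_join(t_ser3, NULL);
    pthread_join(t_cli,  NULL); /* keep main alive until all threads exit */
    return 0;
}
```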
The consumer cannot get any value before the producers store it in the list; indeed, the consumer has to
wait until all the producers have stored their values. On the other side, no producer thread can
insert a value while another producer thread is in the critical region. To achieve such synchronization, the mutex
named “lock” is used. The function “pthread_mutex_lock” is used to take the mutex and enter
the critical region. After this, each producer thread takes a value from the user into the local
variable “val”. The entered value is stored in the list and the position of the pointer p is
updated by every producer thread. The storing mechanism is provided by the function
“list_insertLast”, with the list object (myList) and the value to be inserted at the end as arguments.
After successful insertion of the values by all the producer threads, the consumer thread can
read them. So, to end the critical region and release the mutex, the
“pthread_mutex_unlock” function is used. The body of the producer threads is shown in Figure
2.15.

Figure 2. 15 Body of producer thread for scenario 2

Figure 2. 16 Body of consumer thread for scenario 2


Figure 2.16 shows the body of the consumer thread. At the start of the thread function, it
tries to take the mutex. After the producer threads unlock the mutex, the consumer thread gets
access to the shared list. The values are displayed by passing the list object to the function
“list_display”. The consumer thread then has to remove the values. To do this, the function
“list_removeLast” is called to remove a single value from the list. In this scenario there are three
values in the list, so the reading and deletion procedure is repeated three times. The return value
of this function is the location of the previous data. After removing all the data, the mutex taken by
the consumer thread is released.
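The repeated read-and-delete step can be sketched as follows; a plain array stands in for the linked list (an assumption, since the real code uses the "ll2.c" helpers) to keep the sketch short:

```c
#include <stdio.h>

/* Sketch of the scenario-2 consumer loop: three values were produced,
   so the display/remove step repeats three times. */
int list[3] = { 10, 20, 30 };
int list_size = 3;

void consume_all(void)
{
    for (int i = 0; i < 3; i++) {                  /* one pass per value */
        printf("read %d\n", list[list_size - 1]);  /* list_display step */
        list_size--;            /* stands in for list_removeLast(myList) */
    }
}
```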

2.7 Test cases and testing results for scenario 2
2.7.1 Test cases
In this section, test cases are designed for the three-producer, one-consumer system. The table
below describes the test cases to be performed. They are meant to validate the functionality of the
system with corner cases of input.
Table 2. 2 Test cases for three producers and single consumer

TCN  | Test case                                                             | Test data | Expected result                                                                                     | Output obtained
TC_1 | All producer threads should insert a value into the list              | Int       | Consumer should read the values inserted by the producer threads                                    | Yes
TC_2 | Consumer thread should read the values in the appropriate priority    | Any       | Priority is assigned to all the producer threads; the consumer should read in the proper priority order | Yes
TC_3 | Main thread should wait until all child threads exit                  | Any       | The main thread has to stay alive until all threads execute completely                              | Yes
TC_4 | Two producer threads should not insert a value at the same time       | Any       | A proper synchronization mechanism should be maintained by the producer threads when inserting values | Yes
TC_5 | After reading the data, the consumer thread should delete it one by one | Any     | After reading the data entered by the producers, the consumer thread has to delete it properly      | Yes

2.7.2 Testing results
Figure 2.17 shows the testing results of the developed producer-consumer application. Here
the consumer thread waits until all the producer threads leave the critical region; the first
priority is assigned to the first producer thread. Results of TC_1, TC_2 and TC_4 are shown in the
figure below.

Figure 2. 17 Producer thread is waiting in critical region

Figure 2.18 shows the results of test cases TC_3 and TC_4. Only after all producer threads
leave the critical region can the consumer enter it to read the values from the list. As per the given
priority, the consumer thread reads the values.

Figure 2. 18 Consumer thread is active after all the producer threads finish the critical region

NOTE: In this document all results are documented for a single iteration of the application, to
provide a clear understanding.


2.8 Conclusion
Multi-core hardware is clearly increasing software complexity by driving the need for multi-threaded
applications. Given the rising rate of multi-core hardware adoption in both enterprise
and consumer devices, the challenge of creating multi-threaded applications is here to stay for
software developers. In the coming years, multi-threaded application development will most likely
become the dominant paradigm in software. As this shift continues, many development
organizations will transition to multi-threaded application development on the fly.
In view of this, a producer-consumer application was successfully created using the pthread APIs.
Both kinds of thread share the same linked list, and synchronization is provided by a mutex. The test cases
were developed by critically analyzing the application code and the assignment requirements. All the test
cases passed successfully.


CHAPTER 3
Development of Consumer-Producer Application with Extended
Priority Concept
3.1 Introduction
All modern operating systems divide CPU cycles, in the form of time quanta, among various
processes and threads (or Linux tasks) in accordance with their policies and priorities. Thread
scheduling is one of the most important and fundamental services offered by an operating system
kernel. Some of the metrics an operating system scheduler seeks to optimize are fairness,
throughput, turnaround time, response time and efficiency. Multiprocessor operating systems
assume that all cores are identical and offer the same performance.

3.2 Sequence diagram
Figure 3.1 shows the sequence diagram in which the message queues are prioritized. In the figure, the
y-axis represents time and the x-axis represents the resources. The producer threads are shown on the left
side of the image: the vertical thin line shows the main thread and the thick overlapped
lines show the producer threads. Each producer thread maintains one queue to store data. On the
right side of the image, one consumer thread is shown. Before spawning the producer and consumer
threads, the main thread locks four semaphores. After locking the semaphores, main creates four
producers and one consumer. At the start, the consumer thread tries to acquire the semaphores in
proper priority order. Each producer thread accesses its own message queue and inserts data. At
the end, each producer thread unlocks its semaphore so the consumer thread can gain access to that
particular semaphore.
In the figure, the ascending priority order for the producer threads/queues is thread 4, thread 3, thread 2,
thread 1. At its end, thread 1 releases semaphore 1. The consumer thread continuously
checks the sizes of all the lists associated with the queues. Since the priority assigned to
thread 3 is higher, the consumer thread looks at the size of the third queue first. If thread 3
does not have any data in its queue, the consumer thread looks at the lower-priority queues.
As a result of this mechanism, by this time only thread 1 has entered an element in its queue and
released its semaphore. At the next moment the consumer thread looks for an available
element in queue 3 but fails. Since no higher-priority thread has data in its queue, rather than
waiting for a higher-priority thread, the consumer thread reads and deletes the data in
the lower-priority queue. As soon as a higher-priority producer thread enters a value in its
queue, the consumer thread immediately reads and deletes it.


When the consumer and a producer thread try to acquire a resource at the same
time, the consumer thread is given priority to access the resource.

Figure3. 1 Sequence diagram for prioritized consumer thread

3.3 Development of designed application
There are two approaches to developing threaded programs in Linux: one uses the
pthread APIs and the other uses the OpenMP APIs. For this part of the scenario, the pthread APIs
are chosen to develop the four-producer, single-consumer application.

Figure3. 2 Including library files for scenario 3


In this scenario the pthread APIs are used. The definitions of the pthread APIs are included via
pthread.h. To provide the appropriate synchronization, semaphores are used. The definitions of the
semaphore APIs and the declaration of the semaphore-type object are included via semaphore.h. Figure 3.2
shows these files being included in the application.

Figure3. 3 Declaration of constructive functions for scenario 3

Figure 3.3 shows the declaration of the constructive functions. In this scenario, four threads
will create four different lists. To fulfill this requirement, one function per thread is declared.

Figure3. 4 Declaration of list pointers and location pointers

A pointer to structure, *myList, is created to hold the starting address of the list. Here we
have four different queues. To hold these queues (i.e., their base addresses), four different pointers to the
structure list_head are created. Likewise, to hold the current location in each of the four lists, four
ll_node pointers are created. The declarations of all these objects are shown in Figure 3.4.

Figure3. 5 Definition of constructive functions

Figure 3.5 shows the definitions of the constructive functions. Calling these functions
creates the new lists, pointed to by the myList series of pointers. list_create() is the
function that creates a new list and returns its address in the form of a list_head
structure. On the second line, the pointers p, q, r and s are created to hold the current position of
the element in the lists. At the start, the current position is set to the first position by calling
list_position_create().

Figure3. 6 Declaration of thread function and synchronization objects

Here, the four producer threads call four different functions. The declarations of these
functions are shown in Figure 3.6. As part of the synchronization mechanism, four semaphores
and a mutex are used. The reason for using semaphores is that a semaphore can be taken by one thread and
released by another, which is not possible with a mutex. The declarations of these
objects are shown in Figure 3.6.

Figure3. 7 Main function for application of scenario 3

Figure 3.7 shows the code for the main function. The semaphores are initialized at the start of
the main function. To initialize a semaphore, the function sem_init is used with three arguments.
The first argument is the address of the sem_t (semaphore) object. The second parameter
indicates that the semaphore is shared between all the threads of the process. The third parameter is the
initial value of the semaphore; here the initial value is 1, so each semaphore is a binary
semaphore. After initializing all the synchronization tools, the threads are created after locking the
semaphores. So, in the end, four producer threads and one consumer thread are created with
four locked semaphores, and the main thread waits for the client thread to finish execution.
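The semaphore setup described above can be sketched as follows. The names l_th..l_th3 follow the text; everything else is an assumption, since the original code is only visible as a screenshot:

```c
#include <semaphore.h>

sem_t l_th, l_th1, l_th2, l_th3;  /* one semaphore per producer queue */

int setup_semaphores(void)
{
    /* sem_init(sem, pshared, value): pshared = 0 shares the semaphore
       between the threads of this process; value = 1 makes it binary. */
    if (sem_init(&l_th,  0, 1) != 0) return -1;
    if (sem_init(&l_th1, 0, 1) != 0) return -1;
    if (sem_init(&l_th2, 0, 1) != 0) return -1;
    if (sem_init(&l_th3, 0, 1) != 0) return -1;

    /* main locks all four before creating the threads, so the consumer
       must block until each producer posts its own semaphore. */
    sem_wait(&l_th);  sem_wait(&l_th1);
    sem_wait(&l_th2); sem_wait(&l_th3);
    return 0;
}

int sem_value(sem_t *s)   /* helper so the state can be inspected */
{
    int v = 0;
    sem_getvalue(s, &v);
    return v;
}
```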

Figure3. 8 First producer thread

Figure 3.8 shows the first producer thread. The first thread enters a value into the list
named myList. At the end of the function, thread 1 unlocks the semaphore named l_th, which was
taken by the main function before creating the thread. At the same time, the consumer thread is
waiting to take the highest-priority semaphore, l_th3, which is associated with the highest-priority
queue (myList3). The mutex is used to prevent multiple threads from seeking data at the same time.
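The producer pattern described above can be sketched as follows, with a counter standing in for the list insertion (an assumption, since only a screenshot of the original code is available):

```c
#include <pthread.h>
#include <semaphore.h>

/* Sketch of producer thread 1: insert into its own queue under the
   mutex, then post the semaphore that main locked before spawning. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static sem_t l_th;           /* semaphore associated with this producer */
static int queue_size;       /* stands in for myList->size */

void producer1_setup(void)   /* what main does before pthread_create */
{
    sem_init(&l_th, 0, 1);   /* binary semaphore */
    sem_wait(&l_th);         /* main locks it up front */
}

void *producer1(void *arg)
{
    (void)arg;               /* the real thread reads a value from the user */

    pthread_mutex_lock(&lock);   /* only one thread touches the queue */
    queue_size++;                /* stands in for list_insertLast(...) */
    pthread_mutex_unlock(&lock);

    sem_post(&l_th);         /* signal the consumer: this queue has data */
    return NULL;
}

int producer1_sem_value(void)    /* helpers so state can be inspected */
{
    int v = 0;
    sem_getvalue(&l_th, &v);
    return v;
}
int producer1_queue_size(void) { return queue_size; }
```

Because a semaphore, unlike a mutex, may be posted by a thread other than the one that waited on it, this hand-off from main to producer to consumer is legal.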

Figure3. 9 Second producer thread

Figure 3.9 shows the second producer thread. The second thread enters a value into
the list named myList1. At the end of the function, thread 2 unlocks the semaphore named l_th1,
which was taken by the main function before creating the thread. At the same time, the consumer
thread is waiting to take the highest-priority semaphore, l_th3, associated with the highest-priority
queue (myList3). The mutex is used to prevent multiple threads from seeking data at the same time.

Figure3. 10 Third producer thread

Figure 3.10 shows the third producer thread. The third thread enters a value into the
list named myList2. At the end of the function, thread 3 unlocks the semaphore named l_th2, which
was taken by the main function before creating the thread. At the same time, the consumer thread is
waiting to lock semaphore l_th3, which is still locked by the main function. The mutex is used to
prevent multiple threads from seeking data at the same time.

Figure3. 11 Fourth producer thread

Figure 3.11 shows the fourth producer thread. The fourth thread enters a value into the
list named myList3. At the end of the function, thread 4 unlocks the semaphore named l_th3, which
was taken by the main function before creating the thread. This is the highest-priority thread, for
which the consumer thread is waiting. The moment thread 4 releases the semaphore, the
consumer thread becomes active. The consumer thread then reads the data from the highest-priority
queue down to the lowest-priority queue.

Figure3. 12 Consumer thread with highest priority queue

Figure 3.12 shows the top half of the consumer thread. As per the requirement, when the
producer and consumer threads contend at the same time, the consumer should get the higher priority to
access the queues. To obtain this, one instance of the structure sched_param is created. Two APIs are
used, named pthread_setschedparam() and pthread_setschedprio(). The first API is used to change the
scheduling policy of the current thread; for the consumer thread the scheduling policy is set to
SCHED_FIFO. Basically, FIFO is the scheduling policy in which the thread that enters the ready state
first gets the chance to execute, and no equal-priority thread can preempt the current execution. In our
case, due to FIFO scheduling and the higher priority, no producer thread can preempt the consumer thread.
On the other side, the requirement is that when the consumer and a producer arrive at the same
time, the consumer should get the higher priority in that situation. To fulfill this requirement, the
priority of the client (consumer) thread is set high, while the server (producer) threads run with normal
priority. To assign a priority to a thread, pthread_setschedprio() is used with the thread ID and the priority
value as arguments. In SCHED_FIFO scheduling, 1 is the lowest priority and 99 is the highest. After setting
the priority, the consumer thread continuously monitors the size variable in every list


structure of the producer threads. If a producer stores some data in its list, the consumer reads and
removes it.
Here thread 3's queue has the highest priority, so the consumer thread checks the size of the queue
associated with thread 3 first. If the size is not equal to zero, it means that some data is available in
the queue, so it has to be deleted as the first priority.
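The priority-raising step described above can be sketched as follows. On Linux, SCHED_FIFO at priority 99 normally needs root or CAP_SYS_NICE, so the call may legitimately fail with EPERM; the sketch reports the error instead of assuming success:

```c
#include <errno.h>
#include <pthread.h>
#include <sched.h>

/* Switch the calling (consumer) thread to FIFO scheduling at high
   priority; pthread_setschedprio() could later adjust the priority. */
int raise_consumer_priority(void)
{
    struct sched_param sp;
    sp.sched_priority = 99;   /* SCHED_FIFO range on Linux is 1..99 */

    return pthread_setschedparam(pthread_self(), SCHED_FIFO, &sp);
}
```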

Figure3. 13 Continuation of consumer thread for second priority queue

If the highest-priority queue has no data, it is not worth the consumer
thread waiting until the highest-priority thread stores data, because the consumer has to serve
four producer threads. To achieve this, if the consumer thread does not find data in the
highest-priority queue (myList3), it jumps to check for data in the second-priority queue. Here
the second priority is assigned to thread 2, and the queue associated with it is myList2. This
mechanism can be seen in Figure 3.13: the consumer thread checks myList2, and if some
data is available there, the consumer prints and deletes it.

Figure3. 14 Continuation of consumer thread for third priority queue

If the first two priority queues have no data, it is not worth the
consumer thread waiting until either of those threads stores data, because the consumer has to
serve four producer threads. To achieve this, if the consumer thread does not find data in
the first two priority queues (myList3 and myList2), it jumps to check for data in the third-priority
queue. Here the third priority is assigned to thread 1, and the queue associated with it
is myList1. This mechanism can be seen in Figure 3.14: the consumer thread checks
myList1, and if some data is available there, the consumer prints and deletes it.

Figure3. 15 Continuation of consumer thread for last priority queue

If the first three priority queues have no data, it is not worth the
consumer thread waiting until any of those threads stores data, because the consumer has to
serve four producer threads. To achieve this, if the consumer thread does not find data in the
first three priority queues (myList3, myList2 and myList1), it jumps to check for data in the
last-priority queue. Here the last priority is assigned to the remaining thread, and the queue
associated with it is myList. This mechanism can be seen in Figure 3.15: the consumer thread
checks myList, and if some data is available there, the consumer prints and deletes it.
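One pass of the consumer's priority scan described above can be sketched as follows; plain counters stand in for the myList* queue sizes, which is an assumption made so the sketch is self-contained:

```c
/* Check the queues from highest to lowest priority and service the
   first non-empty one. */
int size3, size2, size1, size0;   /* sizes of the four queues */

/* Returns the priority level serviced, or -1 if every queue was empty
   (in which case the real consumer simply scans again). */
int consumer_scan(void)
{
    if (size3 > 0) { size3--; return 3; }  /* highest-priority queue first */
    if (size2 > 0) { size2--; return 2; }
    if (size1 > 0) { size1--; return 1; }
    if (size0 > 0) { size0--; return 0; }  /* lowest-priority queue last */
    return -1;
}
```

This ordering gives the behavior the text describes: a lower-priority queue is serviced only when every higher-priority queue is empty.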

3.4 Test cases and testing results for scenario 3
3.4.1 Test cases
In this section, test cases are designed for the four-producer, one-consumer system. The table
below describes the test cases to be performed. They are meant to validate the functionality of the
system with corner cases of input.
Table 3. 1 Test cases for higher priority consumer thread

TCN  | Test case                                                                          | Test data | Expected result                                                                                                     | Output obtained
TC_1 | Consumer should acquire the higher priority and run first                          | NA        | At the start of the program, the consumer should run and show that the lists are empty                              | Yes
TC_2 | If the consumer and a producer try to access a resource together, the consumer should get access first | Any | When shared resources are accessed, the consumer should get the higher priority                   | Yes
TC_3 | Consumer should not wait for the higher-priority thread to enter a value           | Any       | If the higher-priority thread does not enter a value, the consumer should check the other, lower-priority queues    | Yes
TC_4 | If two values are entered by any producer thread, the consumer should respond to both | Any    | When the consumer is busy printing values and, at the same time, a thread enters two values in its queue, the consumer should read and delete both values | Yes

3.4.2 Documentation of the results
Figure 3.16 shows the results of the test cases developed in the section above. It can be
seen from the image that the consumer thread responds to all the producer queues that
contain values. The attached callouts give a better understanding of the results.

Figure3. 16 Results of test cases (callouts: the consumer thread executes first as it has the highest priority; thread 3's higher-priority queue is empty, so rather than waiting, the consumer services lower-priority thread 1, which has a value in its queue)

3.5 Conclusion
The consumer-producer application with the extended priority concept has been explained with a sequence diagram showing the relation between four static-priority producer threads, one consumer thread, four queues and their synchronization mechanism, together with a parallelized program using pthreads and test cases to verify the functionality.


CHAPTER 4
4.1 Module Learning Outcomes
This module helped to build expertise in parallel programming for multi-core architectures. Multi-core processors were covered along with their performance quantification and usability techniques; single-core and multi-core optimization techniques and the development of multi-threaded parallel programs were explained clearly. Virtualization and partitioning techniques were explained in detail, along with their specific challenges and solutions, and parallel programming of multi-core processors was introduced through appropriate case studies using OpenMP and pthreads.
After this module I am able to analyze multi-core architectures, the optimization process for single-core and multi-core processors, and parallel programming for multi-core processors proficiently, using the OpenMP library and the GCC compiler. Applying parallel programming concepts to develop applications on multi-core processors was well taught through the lab programs, which proved an efficient way of learning pthreads, OpenMP and the various synchronization techniques for eliminating deadlock situations.

4.2 Conclusion
In Chapter 1, by critically comparing the throughput and latency of the available multi-core arbitration schemes, round robin was found to be better than the other schemes, since it is a fair-queuing packet scheduler with good fairness and delay properties and low complexity. Even so, round robin still has significant drawbacks, and replacements may well appear in future.
In Chapter 2, it was seen that multi-core hardware is clearly increasing software complexity by driving the need for multi-threaded applications. Given the rising rate of multi-core hardware adoption in both enterprise and consumer devices, the challenge of creating multi-threaded applications is here to stay for software developers.
In view of this, a producer-consumer application was successfully created using the pthread APIs. Both threads share the same linked list, and synchronization is provided by a mutex. The test cases were developed by critically analyzing the application code and the assignment requirements, and all the test cases passed.
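A minimal sketch of that Chapter 2 design, one producer and one consumer sharing a linked list guarded by a single mutex, might look as follows. This is an illustration under stated assumptions (a fixed item count, polling consumer), not the assignment's listing.

```c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define ITEMS 5

typedef struct node { int value; struct node *next; } node_t;

static node_t *list;                                /* shared linked list */
static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;

static void *producer(void *arg)
{
    for (int i = 0; i < ITEMS; i++) {
        node_t *n = malloc(sizeof *n);
        n->value = i;
        pthread_mutex_lock(&m);                     /* enter critical region */
        n->next = list;
        list = n;
        pthread_mutex_unlock(&m);
    }
    return NULL;
}

static void *consumer(void *arg)
{
    int *consumed = arg;
    while (*consumed < ITEMS) {                     /* poll until all items seen */
        pthread_mutex_lock(&m);
        if (list) {
            node_t *n = list;
            list = n->next;
            printf("consumed %d\n", n->value);
            free(n);
            (*consumed)++;
        }
        pthread_mutex_unlock(&m);
    }
    return NULL;
}

/* Run one producer and one consumer; return how many items were consumed. */
static int run_demo(void)
{
    pthread_t p, c;
    int consumed = 0;
    pthread_create(&p, NULL, producer, NULL);
    pthread_create(&c, NULL, consumer, &consumed);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return consumed;
}
```

The mutex guarantees that the list head is never updated by both threads at once; a condition variable could replace the polling loop, at the cost of slightly more setup.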
In Chapter 3, the consumer-producer application with the extended priority concept was explained with a sequence diagram showing the relation between four static-priority producer threads, one consumer thread, four queues and their synchronization mechanism, together with a parallelized program using pthreads and test cases to test the functionality.


References
[1] http://www.coverity.com/library/pdf/coverity_multi-threaded_whitepaper.pdf
[2] www.irit.fr/publis/TRACES/12619_etfa2011.pd
[3] www.cs.fsu.edu/research/reports/TR-100401.pd
[4] paper.ijcsns.org/07_book/200809/20080936.pdf
[5] www.sti.uniurb.it/bogliolo/e-publ/KLUWERjdaes03.pdf


Appendix-1


Appendix-2


Contenu connexe

Tendances

Computer hardware servicing ncii
Computer hardware servicing nciiComputer hardware servicing ncii
Computer hardware servicing ncii
Nathan Bud
 
CBC Automotive Servicing NC II
CBC Automotive Servicing NC IICBC Automotive Servicing NC II
CBC Automotive Servicing NC II
Christopher Birung
 
Executive Summary and Design Document
Executive Summary and Design DocumentExecutive Summary and Design Document
Executive Summary and Design Document
Theresa Cline
 

Tendances (20)

FYP 2 REPORT AMIRUL ARIFF
FYP 2 REPORT AMIRUL ARIFFFYP 2 REPORT AMIRUL ARIFF
FYP 2 REPORT AMIRUL ARIFF
 
Assignment 5
Assignment 5Assignment 5
Assignment 5
 
K-12 Teacher's Guide on Computer Hardware Servicing
K-12 Teacher's Guide on Computer Hardware ServicingK-12 Teacher's Guide on Computer Hardware Servicing
K-12 Teacher's Guide on Computer Hardware Servicing
 
Parking allocation system
Parking allocation systemParking allocation system
Parking allocation system
 
.Net cbc
.Net cbc.Net cbc
.Net cbc
 
Computer Hardware-servicing-learning-module
Computer Hardware-servicing-learning-moduleComputer Hardware-servicing-learning-module
Computer Hardware-servicing-learning-module
 
Activity #3 pacifico
Activity #3 pacificoActivity #3 pacifico
Activity #3 pacifico
 
Computer hardware servicing ncii
Computer hardware servicing nciiComputer hardware servicing ncii
Computer hardware servicing ncii
 
CBC Automotive Servicing NC II
CBC Automotive Servicing NC IICBC Automotive Servicing NC II
CBC Automotive Servicing NC II
 
Final project report format
Final project report formatFinal project report format
Final project report format
 
Fls presentation
Fls presentationFls presentation
Fls presentation
 
TEACHING METHODOLOGY 1
TEACHING METHODOLOGY 1TEACHING METHODOLOGY 1
TEACHING METHODOLOGY 1
 
Performing mensuration-and-calculations-common
Performing mensuration-and-calculations-commonPerforming mensuration-and-calculations-common
Performing mensuration-and-calculations-common
 
TR AUTOMOTIVE SERVICING NC II
TR AUTOMOTIVE SERVICING NC IITR AUTOMOTIVE SERVICING NC II
TR AUTOMOTIVE SERVICING NC II
 
Executive Summary and Design Document
Executive Summary and Design DocumentExecutive Summary and Design Document
Executive Summary and Design Document
 
Plc 2 12 ed
Plc 2   12 edPlc 2   12 ed
Plc 2 12 ed
 
M.Tech : Advanced DBMS Assignment I
M.Tech : Advanced DBMS Assignment IM.Tech : Advanced DBMS Assignment I
M.Tech : Advanced DBMS Assignment I
 
C chs tg-module1-4_dec
C chs tg-module1-4_decC chs tg-module1-4_dec
C chs tg-module1-4_dec
 
K to 12 computer hardware servicing NCII
K to 12 computer hardware servicing NCIIK to 12 computer hardware servicing NCII
K to 12 computer hardware servicing NCII
 
pc operation
pc operationpc operation
pc operation
 

En vedette

Ls 5 f12orientation
Ls 5 f12orientationLs 5 f12orientation
Ls 5 f12orientation
Jim Walker
 
Assignment 4 - Certification in Dispute Management
Assignment 4 - Certification in Dispute ManagementAssignment 4 - Certification in Dispute Management
Assignment 4 - Certification in Dispute Management
Jyotpreet Kaur
 
Interview met een selectie van krottegemnaren
Interview met een selectie van krottegemnarenInterview met een selectie van krottegemnaren
Interview met een selectie van krottegemnaren
broederschoolkrottegem
 
Ayoade,j.o. introdução à climatologia para os trópicos cópia
Ayoade,j.o. introdução à climatologia para os trópicos   cópiaAyoade,j.o. introdução à climatologia para os trópicos   cópia
Ayoade,j.o. introdução à climatologia para os trópicos cópia
LCGRH UFC
 
Edwards_Assignment 4_Arbitration
Edwards_Assignment 4_ArbitrationEdwards_Assignment 4_Arbitration
Edwards_Assignment 4_Arbitration
Adam Edwards
 

En vedette (20)

Ls 5 f12orientation
Ls 5 f12orientationLs 5 f12orientation
Ls 5 f12orientation
 
Assignment 4 - Certification in Dispute Management
Assignment 4 - Certification in Dispute ManagementAssignment 4 - Certification in Dispute Management
Assignment 4 - Certification in Dispute Management
 
Suit from ixmation against Switch Lighting, October 23, 2014
Suit from ixmation against Switch Lighting, October 23, 2014Suit from ixmation against Switch Lighting, October 23, 2014
Suit from ixmation against Switch Lighting, October 23, 2014
 
Chapter 7
Chapter 7Chapter 7
Chapter 7
 
Presentatie krottegem
Presentatie krottegemPresentatie krottegem
Presentatie krottegem
 
Assignment 1
Assignment 1Assignment 1
Assignment 1
 
Assignment 3
Assignment 3Assignment 3
Assignment 3
 
nerve cells
nerve cellsnerve cells
nerve cells
 
Interview met een selectie van krottegemnaren
Interview met een selectie van krottegemnarenInterview met een selectie van krottegemnaren
Interview met een selectie van krottegemnaren
 
Deliver Value:Lean-Kanban for Portfolio Prioritization
Deliver Value:Lean-Kanban for Portfolio PrioritizationDeliver Value:Lean-Kanban for Portfolio Prioritization
Deliver Value:Lean-Kanban for Portfolio Prioritization
 
Assignment 7
Assignment 7Assignment 7
Assignment 7
 
The Conflict Paradox
The Conflict ParadoxThe Conflict Paradox
The Conflict Paradox
 
Facilitating Meetings -The Forgotten Skill in the Software World
Facilitating Meetings -The Forgotten Skill in the Software WorldFacilitating Meetings -The Forgotten Skill in the Software World
Facilitating Meetings -The Forgotten Skill in the Software World
 
Collaboration Through Gamification
Collaboration Through GamificationCollaboration Through Gamification
Collaboration Through Gamification
 
Economische analyse
Economische analyseEconomische analyse
Economische analyse
 
The Lean Startup Game
The Lean Startup GameThe Lean Startup Game
The Lean Startup Game
 
Nutrent efficiency
Nutrent efficiencyNutrent efficiency
Nutrent efficiency
 
Ayoade,j.o. introdução à climatologia para os trópicos cópia
Ayoade,j.o. introdução à climatologia para os trópicos   cópiaAyoade,j.o. introdução à climatologia para os trópicos   cópia
Ayoade,j.o. introdução à climatologia para os trópicos cópia
 
Grievance
GrievanceGrievance
Grievance
 
Edwards_Assignment 4_Arbitration
Edwards_Assignment 4_ArbitrationEdwards_Assignment 4_Arbitration
Edwards_Assignment 4_Arbitration
 

Similaire à Assignment 9

This document is for Coventry University students for their ow.docx
This document is for Coventry University students for their ow.docxThis document is for Coventry University students for their ow.docx
This document is for Coventry University students for their ow.docx
jwilliam16
 
EG87 MSc Motorsport Engineering programme specification
EG87 MSc Motorsport Engineering programme specificationEG87 MSc Motorsport Engineering programme specification
EG87 MSc Motorsport Engineering programme specification
Dhanaprasanth K S
 
4.74 s.e. computer engineering (1)
4.74 s.e. computer engineering (1)4.74 s.e. computer engineering (1)
4.74 s.e. computer engineering (1)
Aditya66086
 
e3f55595181f7cad006f26db820fb78ec146e00e-1646623528083 (1).pdf
e3f55595181f7cad006f26db820fb78ec146e00e-1646623528083 (1).pdfe3f55595181f7cad006f26db820fb78ec146e00e-1646623528083 (1).pdf
e3f55595181f7cad006f26db820fb78ec146e00e-1646623528083 (1).pdf
SILVIUSyt
 
B2 2006 sizing_benchmarking (1)
B2 2006 sizing_benchmarking (1)B2 2006 sizing_benchmarking (1)
B2 2006 sizing_benchmarking (1)
Steve Feldman
 
B2 2006 sizing_benchmarking
B2 2006 sizing_benchmarkingB2 2006 sizing_benchmarking
B2 2006 sizing_benchmarking
Steve Feldman
 
B. Tech. Course Specification ME
B. Tech. Course Specification MEB. Tech. Course Specification ME
B. Tech. Course Specification ME
smb2015
 
22nd August Final - COA Handout Microprocessor.docx
22nd August Final - COA Handout Microprocessor.docx22nd August Final - COA Handout Microprocessor.docx
22nd August Final - COA Handout Microprocessor.docx
SZahidNabiDar
 

Similaire à Assignment 9 (20)

This document is for Coventry University students for their ow.docx
This document is for Coventry University students for their ow.docxThis document is for Coventry University students for their ow.docx
This document is for Coventry University students for their ow.docx
 
Standard dme sop
Standard dme sopStandard dme sop
Standard dme sop
 
Leave management System
Leave management SystemLeave management System
Leave management System
 
Syllabus for fourth year of engineering
Syllabus for fourth year of engineeringSyllabus for fourth year of engineering
Syllabus for fourth year of engineering
 
PetroSync - ASME B31.3 Process Piping Code Design Requirements
PetroSync - ASME B31.3 Process Piping Code Design RequirementsPetroSync - ASME B31.3 Process Piping Code Design Requirements
PetroSync - ASME B31.3 Process Piping Code Design Requirements
 
EG87 MSc Motorsport Engineering programme specification
EG87 MSc Motorsport Engineering programme specificationEG87 MSc Motorsport Engineering programme specification
EG87 MSc Motorsport Engineering programme specification
 
IRJET- Course outcome Attainment Estimation System
IRJET-  	  Course outcome Attainment Estimation SystemIRJET-  	  Course outcome Attainment Estimation System
IRJET- Course outcome Attainment Estimation System
 
4.74 s.e. computer engineering (1)
4.74 s.e. computer engineering (1)4.74 s.e. computer engineering (1)
4.74 s.e. computer engineering (1)
 
Online course reservation system
Online course reservation systemOnline course reservation system
Online course reservation system
 
Student portal system application -Project Book
Student portal system application -Project BookStudent portal system application -Project Book
Student portal system application -Project Book
 
e3f55595181f7cad006f26db820fb78ec146e00e-1646623528083 (1).pdf
e3f55595181f7cad006f26db820fb78ec146e00e-1646623528083 (1).pdfe3f55595181f7cad006f26db820fb78ec146e00e-1646623528083 (1).pdf
e3f55595181f7cad006f26db820fb78ec146e00e-1646623528083 (1).pdf
 
216328327 nilesh-and-teams-project
216328327 nilesh-and-teams-project216328327 nilesh-and-teams-project
216328327 nilesh-and-teams-project
 
B2 2006 sizing_benchmarking (1)
B2 2006 sizing_benchmarking (1)B2 2006 sizing_benchmarking (1)
B2 2006 sizing_benchmarking (1)
 
B2 2006 sizing_benchmarking
B2 2006 sizing_benchmarkingB2 2006 sizing_benchmarking
B2 2006 sizing_benchmarking
 
ireb_cpre_syllabus_requirements-modeling_advanced_level_en_v2.2.pdf
ireb_cpre_syllabus_requirements-modeling_advanced_level_en_v2.2.pdfireb_cpre_syllabus_requirements-modeling_advanced_level_en_v2.2.pdf
ireb_cpre_syllabus_requirements-modeling_advanced_level_en_v2.2.pdf
 
A Survey on Design of Online Judge System
A Survey on Design of Online Judge SystemA Survey on Design of Online Judge System
A Survey on Design of Online Judge System
 
B. Tech. Course Specification ME
B. Tech. Course Specification MEB. Tech. Course Specification ME
B. Tech. Course Specification ME
 
B. Tech. Course Specification ME
B. Tech. Course Specification MEB. Tech. Course Specification ME
B. Tech. Course Specification ME
 
22nd August Final - COA Handout Microprocessor.docx
22nd August Final - COA Handout Microprocessor.docx22nd August Final - COA Handout Microprocessor.docx
22nd August Final - COA Handout Microprocessor.docx
 
An Adjacent Analysis of the Parallel Programming Model Perspective: A Survey
 An Adjacent Analysis of the Parallel Programming Model Perspective: A Survey An Adjacent Analysis of the Parallel Programming Model Perspective: A Survey
An Adjacent Analysis of the Parallel Programming Model Perspective: A Survey
 

Dernier

Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 

Dernier (20)

The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 

Assignment 9

  • 1. ASSIGNMENT Module Code Module Name Course Department ESD 532 Multi core Architecture and Programming M.Sc. [Engg.] in Real Time Embedded Systems Computer Engineering Name of the Student Bhargav Shah Reg. No CHB0910001 Batch Full-Time 2011. Module Leader Padma Priya Dharishini P. M.S.Ramaiah School of Advanced Studies Postgraduate Engineering and Management Programmes(PEMP) #470-P Peenya Industrial Area, 4th Phase, Peenya, Bengaluru-560 058 Tel; 080 4906 5555, website: www.msrsas.org POSTGRADUATE ENGINEERING AND MANAGEMENT PROGRAMME – (PEMP) MSRSAS - Postgraduate Engineering and Management Programme - PEMP i
  • 2. MSRSAS - Postgraduate Engineering and Management Programme - PEMP Declaration Sheet Student Name Bhargav Shah Reg. No CHB0910001 Course RTES Batch Full-Time 2011 Module Code ESD 532 Module Title Module Date Multi Core Architecture and Programming to 06-02-2012 03-03-2012 Module Leader Padma Priya Dharishini P. Batch Full-Time 2011. Extension requests: Extensions can only be granted by the Head of the Department in consultation with the module leader. Extensions granted by any other person will not be accepted and hence the assignment will incur a penalty. Extensions MUST be requested by using the ‘Extension Request Form’, which is available with the ARO. A copy of the extension approval must be attached to the assignment submitted. Penalty for late submission Unless you have submitted proof of mitigating circumstances or have been granted an extension, the penalties for a late submission of an assignment shall be as follows: • Up to one week late: Penalty of 5 marks • One-Two weeks late: Penalty of 10 marks • More than Two weeks late: Fail - 0% recorded (F) All late assignments: must be submitted to Academic Records Office (ARO). It is your responsibility to ensure that the receipt of a late assignment is recorded in the ARO. If an extension was agreed, the authorization should be submitted to ARO during the submission of assignment. To ensure assignment reports are written concisely, the length should be restricted to a limit indicated in the assignment problem statement. Assignment reports greater than this length may incur a penalty of one grade (5 marks). Each delegate is required to retain a copy of the assignment report. Declaration The assignment submitted herewith is a result of my own investigations and that I have conformed to the guidelines against plagiarism as laid out in the PEMP Student Handbook. All sections of the text and results, which have been obtained from other sources, are fully referenced. 
I understand that cheating and plagiarism constitute a breach of University regulations and will be dealt with accordingly. Signature of the student Date Submission date stamp (by ARO) Signature of the Module Leader and date Multi core Architecture and Programming Signature of Head of the Department and date ii
  • 3. MSRSAS - Postgraduate Engineering and Management Programme - PEMP M. S. Ramaiah School of Advanced Studies Postgraduate Engineering and Management Programme- Coventry University (UK) Assessment Sheet Department Computer Engineering Course RTES Module Code ESD 532 Module Leader Padma Priya Dharishini P. Module Completion Date 03-03-2012 Student Name Bhargav Shah ID Number CHB0910001 Attendance Details Batch Module Title Theory Full-Time 2011 Multi core Architecture and Programming Laboratory Fine Paid Remarks (if any for shortage of attendance) Q. No a Written Examination – Marks – Sheet (Assessor to Fill) C d Total Remarks b 1 2 3 4 5 6 Marks Scored for 100 Part a Marks Scored out of 50 Result PASS Assignment – Marks-Sheet (Assessor to Fill) C d Total b FAIL Remarks A B C Marks Scored for 100 Result Marks Scored out of 50 PASS FAIL PMAR- form completed for student feedback (Assessor has to mark) Yes Overall-Result Components Assessor Reviewer No Written Examination (Max 50) Pass / Fail Assignment (Max 50) Pass / Fail Total Marks (Max 100) (Before Late Penalty) Grade Total Marks (Max 100) (After Late Penalty) Grade IMPORTANT 1. The assignment and examination marks have to be rounded off to the nearest integer and entered in the respective fields 2. A minimum of 40% required for a pass in both assignment and written test individually 3. A student cannot fail on application of late penalty (i.e. on application of late penalty if the marks are below 40, cap at 40 marks) Signature of Reviewer with date Multi core Architecture and Programming Signature of Module Leader with date iii
  • 4. MSRSAS - Postgraduate Engineering and Management Programme - PEMP Abstract Multi-core processors may provide higher performance to current embedded processors to support future embedded systems functionalities. According to the Industrial Advisor Board, embedded systems will benefit from multi-core processors, as these systems are comprised by mixed applications, i.e. applications with hard real-time constrains and without real-time constrains, that can be executed into the same processor. Moreover, the Industrial Advisor Board also stated that memory operations represent one of the main bottlenecks that current embedded applications must face, being even more important than the performance of the core that can suffer a degradation of 10-20% without really affecting overall performance. We take profit of this fact by studying the effect of running several threads per core, that is, we make the core multithreaded. And we also studied the effect of caches, which are a well known technique in high performance computing to reduce the memory bottleneck. Chapter 1 discuss on Arbitration schemes of Memory Access in Multicore Systems , what are the types of arbitration schemes existed up to now which is the best one of them, what are the challenging factors for these arbitration schemes in the present situation and finally short note on the factors that support the proposed arbitration schemes. Chapter 2 discuss about a multi-threaded concept of consumer and producer threads how are going to share a common queue, how to prioritize the threads if we are sharing a common thread and some test cases to test the scenarios. Chapter 3 discuss about a different situation having 4 of producers with different queues and single consumers and it will discuss about the changing priority levels of the consumer so that in the conflicting condition with the consumer thread the producer will get the high priority to execute. Multi core Architecture and Programming iv
  • 5. MSRSAS - Postgraduate Engineering and Management Programme - PEMP Contents Declaration Sheet .................................................................................................................................ii Abstract .............................................................................................................................................. iv List of Figures ...................................................................................................................................vii Symbols .............................................................................................................................................vii Nomenclature...................................................................................................................................viii CHAPTER 1 ....................................................................................................................................... 9 Arbitration schemes of memory access in multi core ..................................................................... 
9 1.1 Introduction ...........................................................................................................................9 1.2 Types of arbitration schemes .................................................................................................9 1.3 Challenges in arbitration schemes .......................................................................................10 1.4 Impact of the arbitration schemes on throughput and latency .................................................11 1.5 Proposal of better arbitration scheme with justification ..........................................................11 1.6 Conclusion ...............................................................................................................................12 CHAPTER 2 ..................................................................................................................................... 13 Development of Consumer Producer Application ........................................................................ 13 2.1 Introduction ..............................................................................................................................13 2.2 Sequence diagram ....................................................................................................................13 2.3 Development of parallelized program using Pthread/openMP ................................................14 2.4 Test cases and Testing results for scenario 1 ...........................................................................17 2.4.1 Test cases ........................................................................................................................................ 17 2.4.2 Testing results................................................................................................................................. 
18 2.5 Sequence diagram.............................................................................................................................. 19 2.6 Development of paralleled program using pthread/openMP ...................................................20 2.4 Test cases and Testing results for scenario 2 ...........................................................................23 2.4.1 Test cases ........................................................................................................................................ 23 2.4.2 Testing results................................................................................................................................. 24 2.5 Conclusion ...............................................................................................................................25 CHAPTER 3 ..................................................................................................................................... 26 Development of Consumer Producer Application with extended priority concept ................... 26 3.1 Introduction ..............................................................................................................................26 3.2 Sequence diagram................................................................................................................26 3.2 Development of designed application ......................................................................................27 3.3 Test cases and testing results for scenario 3 ........................................................................34 3.3.1 Test cases........................................................................................................................................ 34 3.3.2 Documentation of the results ........................................................................................................ 
35 3.5 Conclusion ...............................................................................................................................36 CHAPTER 4 ..................................................................................................................................... 37 4.1 Module Learning Outcomes.....................................................................................................37 4.2 Conclusion ...............................................................................................................................37 References ......................................................................................................................................... 38 Appendix-1 ........................................................................................................................................ 39 Appendix-2 ........................................................................................................................................ 40 Multi core Architecture and Programming v
List of Tables
Table 2.1 Test cases for single producer single consumer ................................................ 17
Table 2.2 Test cases for three producers single consumer ............................................... 23
Table 3.1 Test cases for higher priority consumer thread ................................................. 34
List of Figures
Figure 2.1 Sequence diagram for one producer and one consumer .................................. 14
Figure 2.2 Including libraries and files for scenario 1 ........................................................ 14
Figure 2.3 Declaration of mutex and structures for scenario 1 ......................................... 14
Figure 2.4 Function to create new list for scenario 1 ........................................................ 15
Figure 2.5 Main function for application of scenario 1 ...................................................... 15
Figure 2.6 Body of producer thread for scenario 1 ........................................................... 16
Figure 2.7 Body of consumer thread for scenario 1 .......................................................... 17
Figure 2.8 Producer thread is waiting for value in critical region ...................................... 18
Figure 2.9 Consumer thread printing the value inserted by producer thread ................... 18
Figure 2.10 Sequence diagram of three producers one consumer ................................... 19
Figure 2.11 Including libraries and files for scenario 2 ...................................................... 20
Figure 2.12 Declaration of mutex and structures for scenario 2 ....................................... 20
Figure 2.13 Function to create new list for scenario 2 ...................................................... 20
Figure 2.14 Main function for application of scenario 2 .................................................... 21
Figure 2.15 Body of producer thread for scenario 2 ......................................................... 22
Figure 2.16 Body of consumer thread for scenario 2 ........................................................ 22
Figure 2.17 Producer thread is waiting in critical region ................................................... 24
Figure 2.18 Consumer thread is active after all the producer threads finish the critical region ... 24
Figure 3.1 Sequence diagram for prioritized consumer thread ......................................... 27
Figure 3.2 Including library files for scenario 3 ................................................................. 27
Figure 3.3 Declaration of constructive functions for scenario 3 ........................................ 28
Figure 3.4 Declaration of list pointers and location pointers ............................................. 28
Figure 3.5 Definition of constructive functions ................................................................. 28
Figure 3.6 Declaration of thread function and synchronization objects ............................ 29
Figure 3.7 Main function for application of scenario 3 ...................................................... 29
Figure 3.8 First producer thread ....................................................................................... 30
Figure 3.9 Second producer thread .................................................................................. 30
Figure 3.10 Third producer thread .................................................................................... 31
Figure 3.11 Fourth producer thread .................................................................................. 31
Figure 3.12 Consumer thread with highest priority queue ................................................ 32
Figure 3.13 Continuation of consumer thread for second priority queue .......................... 33
Figure 3.14 Continuation of consumer thread for third priority queue .............................. 33
Figure 3.15 Continuation of consumer thread for last priority queue ................................ 34
Figure 3.16 Results of test cases ..................................................................................... 35
Nomenclature
WRR - Weighted Round Robin
CMP - Chip Multiprocessor
SDRAM - Synchronous Dynamic Random Access Memory
DRR - Deficit Round Robin
SRR - Stratified Round Robin
PD - Priority Division
PBS - Priority Based Budget Scheduler
TDMA - Time Division Multiple Access
CCSP - Credit Controlled Static Priority
CHAPTER 1
Arbitration schemes of memory access in multi core

1.1 Introduction
The constraints of embedded systems in terms of power consumption, thermal dissipation, cost-efficiency and performance can be met by using multi-core processors (CMPs, or chip multiprocessors). On typical medium-size CMPs, the cores share a bus to the highest levels of the memory hierarchy. In multi-core architectures, resources are often shared to reduce cost and to exchange information. An off-chip memory is one of the most common shared resources. SDRAM is a popular off-chip memory in cost-sensitive and performance-demanding applications due to its low price, high data rate and large storage. However, asynchronous refresh operations and the dependence of each access on the previous one make SDRAM access latency vary by an order of magnitude. The main contribution of this report is to critically compare the throughput and latency of the available multi-core arbitration schemes. From this analysis, a justification for a better arbitration scheme is derived.

1.2 Types of arbitration schemes [1]
There have been many approaches, especially in the networking domain, to providing fairness, high throughput and worst-case latency bounds in an arbiter. Weighted Round Robin (WRR) is a work-conserving arbiter in which cores are allocated a number of slots within a round-robin cycle according to their bandwidth requirements. If a core does not use its slot, the next active core in the round-robin cycle is served immediately to increase throughput. Cores producing bursty traffic therefore benefit at the cost of cores that produce uniform traffic. Deficit Round Robin (DRR) assigns a different slot size to each master according to its bandwidth requirements and schedules the masters in round-robin (RR) fashion. The difference between DRR and RR is that if a master cannot use its slot, or part of its slot, in the current cycle, the remaining slot (the deficit) is carried over into the next cycle.
In the next cycle, the master can transfer up to an amount of data equal to the sum of its slot size and the deficit. Thus DRR tries to avoid the unfairness that WRR causes to uniform traffic generators. Stratified Round Robin (SRR) groups masters with similar bandwidth requirements into one class. After grouping the masters into classes, a two-step arbitration is applied: inter-class and intra-class. The inter-class scheduler schedules each class Fk once every 2^k clock cycles; hence, the smaller the k, the more often the class is scheduled. The intra-class scheduler uses the WRR mechanism to
select the next master within the class. Due to its more uniform distribution of bandwidth, SRR reduces worst-case latencies compared to WRR. However, to achieve a low worst-case latency for a class Fk, k must be minimized, which leads to over-allocation. Priority Division (PD) combines TDMA and static priorities to achieve guarantees and high resource utilization. Instead of fixing TDMA slots statically, PD statically fixes the priority of each master within a slot, such that each master has at least one slot in which it has the highest priority. Masters thus have guarantees equal to TDMA, and unused slots are arbitrated by static priority to increase resource utilization. This approach provides a benefit over RR or WRR only if the response time of the shared resource is fixed; with a variable response time (e.g. SDRAM), it produces high worst-case latencies. In the Priority Based Budget Scheduler (PBS), masters are assigned fixed budgets of accesses per unit time (the replenishment period). Masters are also assigned fixed priorities to resolve conflicts. The budget relates to a master's bandwidth requirements, while the priority relates to its latency requirements; the coupling between latency and bandwidth is thereby removed. The shared resource is granted to the active master with the highest priority that still has budget left. At the beginning of each replenishment period, every master gets its original budget back. Akesson et al. introduce the Credit Controlled Static Priority (CCSP) arbiter. CCSP also uses priorities and budgets within the replenishment period, but instead of frame-based replenishment periods, masters are replenished incrementally for fine-grained bandwidth assignment.
1.3 Challenges in arbitration schemes
Traditional shared-bus arbitration schemes such as TDMA and round robin exhibit several defects, including bus starvation and low system performance. In strict priority scheduling, higher-priority packets can take most of the bandwidth, so lower-priority packets must wait longer for resource allocation; this causes starvation of the lower-priority packets. A drawback of WRR and LARD with respect to power consumption is that both always keep their servers turned on even when some of them serve no requests; therefore, they cannot conserve any power. Weighted Round Robin and Deficit Round Robin are extensions that guarantee each requestor a minimum service, proportional to an allocated rate, within a common, periodically repeating frame of fixed size. This type of frame-based rate regulation is similar to the Deferrable Server, and it suffers from an inherent coupling between allocation granularity and latency, where allocation granularity is inversely proportional to the frame size. A larger frame size results in finer allocation granularity, reducing over-allocation, but at the cost of increased latencies for all requestors. Another common example of a frame-based scheduling discipline is time-division
multiplexing, which suffers from the additional disadvantage that it requires a schedule to be stored for each configuration; this is very costly if the frame size or the number of use cases is large [2]. The above arbitration algorithms cannot handle strict real-time requirements, so a two-level arbitration algorithm called RB_Lottery bus arbitration has been developed. It solves the impartiality, starvation and real-time problems that exist in the Lottery method, and it reduces the average latency of bus requests [5]. In hardware verification, the proposed arbiter achieves a higher operating frequency than the Lottery arbiter; it also gives more consideration to chip area and power consumption than the Lottery arbiter, and it has a lower average latency of bus requests than Lottery arbitration.

1.4 Impact of the arbitration schemes on throughput and latency [4]
In each approach to providing fairness, high throughput and worst-case latency bounds, optimizing one factor degrades the others. In Weighted Round Robin, to give any core a low worst-case latency it must be assigned more slots in the round-robin cycle, which leads to over-allocation. Deficit Round Robin (DRR) has very high latencies in the worst case: for example, one master may stay idle for a long time and build up a high deficit; if it then continuously requests the shared resource, it will occupy it for a long time, incurring very high latencies for the other masters. Due to the presence of priorities, PBS is fair to high-priority masters and unfair to low-priority masters. When all masters are executing HRTs (as outlined in the introduction), PBS results in large WCETs for the low-priority masters.
Credit Controlled Static Priority (CCSP), also due to the presence of priorities, produces large worst-case execution time bounds for the lower-priority masters.

1.5 Proposal of better arbitration scheme with justification
Stratified Round Robin is better than the other arbitration schemes, since it is a fair-queuing packet scheduler with good fairness and delay properties and low complexity. It is unique among schedulers of comparable complexity in that it provides a single packet delay bound that is independent of the number of flows. Importantly, it also enables a simple hardware implementation, and thus fills a current gap between scheduling algorithms that have provably good performance and those that are feasible and practical to implement in high-speed routers. Interactive applications such as video and audio conferencing require the total delay experienced by a packet in the network to be bounded on an end-to-end basis. The packet scheduler decides the order in which packets are sent on the output link, and therefore determines the queuing delay experienced by a packet at each intermediate router in the network. Low complexity is critical because, with
line rates increasing to 40 Gbps, all packet-processing tasks performed by routers, including output scheduling, must be able to operate in nanosecond time frames.

1.6 Conclusion
By critically comparing the throughput and latency of the available multi-core arbitration schemes, Stratified Round Robin emerges as better than the other arbitration schemes, since it is a fair-queuing packet scheduler with good fairness and delay properties and low complexity. Even so, round-robin schemes retain several negative aspects, and improved replacements can be expected in the future.
CHAPTER 2
Development of Consumer Producer Application

2.1 Introduction
Today, the world of software development is presented with a new challenge. To fully leverage the new class of multi-core hardware, software developers must change the way they create applications. By turning their focus to multi-threaded applications, developers can take full advantage of multi-core devices and deliver software that meets today's demands. But this paradigm of multi-threaded software development adds a new wrinkle of complexity for those who care most about software quality. Concurrency defects such as race conditions and deadlocks are defect types unique to multi-threaded applications. Complex and hard to find, these defects can quickly derail a software project. To avoid catastrophic failures in multi-threaded applications, software development organizations must understand how to identify and eliminate these deadly problems early in the application development lifecycle. As part of this work, a multi-threaded producer-consumer application is created using the given linked-list program. Two scenarios are accommodated in this part of the document. In the first, a producer inserts one value into the doubly linked list, and at the other end the consumer reads that value and deletes it. In the second, three producer threads insert values into the linked list, and at the end one consumer thread reads and deletes them. A proper synchronization mechanism is developed.

2.2 Sequence diagram
A sequence diagram is a kind of interaction diagram that shows how processes operate with one another and in what order. It is a construct of a Message Sequence Chart. A sequence diagram shows object interactions arranged in time sequence.
It depicts the objects and classes involved in the scenario and the sequence of messages exchanged between the objects needed to carry out the functionality of the scenario. Sequence diagrams are typically associated with use-case realizations in the Logical View of the system under development. Figure 2.1 shows the sequence diagram for one producer and one consumer. In the figure, the y-axis represents time and the x-axis represents the resources: the producer thread is shown at the top left and the consumer thread at the top right. At the start, the producer has to write data into the linked list, but the linked list is shared between the producer and consumer threads. To provide synchronization between the producer and consumer threads, a mutex is used. The producer locks the mutex and writes data to the list. If the consumer tries to read the data at the same time, it tries to acquire the mutex held by the producer and fails. In this case the consumer thread has to
wait until the producer releases the mutex. This phenomenon is shown in Figure 2.1, whose annotations read: trying to obtain mutex but fails; critical region; consumer has to wait until the resource is freed by the producer. In this application the consumer cannot read the data until the producer has produced it and stored it in the linked list; this synchronization is achieved using the mutex.

Figure 2.1 Sequence diagram for one producer and one consumer

2.3 Development of parallelized program using Pthread/openMP
There are two approaches to developing threaded programs on Linux: using the pthread APIs, or using the openMP APIs. For this scenario, the pthread APIs were chosen to develop the single-producer, single-consumer application.

Figure 2.2 Including libraries and files for scenario 1

Figure 2.2 shows all the preprocessor statements of the code segment. The first is the file "ll2.c", which contains the definitions of all the functions related to the linked-list operations. The second is "pthread.h", which contains the declarations of all the threading-related APIs. The last two are the standard library headers for common functions. In the last lines of the figure, a function named create is declared.

Figure 2.3 Declaration of mutex and structures for scenario 1
To obtain synchronization in the application, a mutex is used. Here "lock" is defined as the pthread mutex object. It is essential to initialize a mutex before using it; here the initialization is handled by assigning the macro "PTHREAD_MUTEX_INITIALIZER". A structure pointer *myList is created to hold the starting address of the list, and a structure pointer *p is created to point at the current position for accessing values. All these declarations are shown in Figure 2.3.

Figure 2.4 Function to create new list for scenario 1

Figure 2.4 shows the definition of the create function. When called, it creates a new list, which is pointed to by the myList pointer. list_crate() is the function that creates the new list and returns its address in the form of a list_head structure. On the second line, the pointer p is created to hold the current position of the element in the list; initially the current position is set to the first position by calling list_position_creat().

Figure 2.5 Main function for Application of scenario 1

Figure 2.5 shows the main function of the single-producer, single-consumer application. In the figure, two functions are declared, each taking a void pointer argument and returning a void pointer. The function named "ser" is called by the producer thread; on the other side, the consumer thread calls the function "cli". In main, one void pointer named "exit" is defined to obtain the return value from the thread functions. Two thread objects are defined, named "t_ser" and "t_cli". On successful creation of the producer thread, the ID of the thread is stored in
"t_ser", and the ID of the consumer thread is stored in "t_cli". To create the threads, the "pthread_create" API is used with the appropriate arguments. In this application two threads are created: the producer thread and the consumer thread. The child threads die automatically if the main thread exits. To avoid this situation, the main thread has to wait until the consumer thread exits successfully; this mechanism is provided by the "pthread_join" API.

Figure 2.6 Body of producer thread for scenario 1

Figure 2.6 shows the body of the producer thread. At the start of the producer thread, create() is called; this creates one new list and sets the pointer p to the first location. The consumer cannot get any value before the producer stores it in the list. To avoid such a race condition, the mutex named "lock" is used: the function "pthread_mutex_lock" is called to take the mutex and enter the critical region. The producer thread then reads a value from the user into the variable "val". The entered value is stored in the list and the position pointer p is updated with the new current location; the storing is done by the function "list_inserLast", which takes the list object (myList) and the value to be inserted. After successful insertion of the value, any thread may read it, so to end the critical region the obtained mutex is released with "pthread_mutex_unlock". While the producer is in the critical region, if the consumer thread tries to take the mutex or to access the critical section it has to wait until the producer releases the mutex; only after the producer unlocks it does the consumer thread acquire the resource. Figure 2.7 shows the body of the consumer thread. At the start of the thread function it tries to take the mutex.
After the producer thread unlocks the mutex, the consumer thread gets access to the shared list. The value is displayed by passing the list object to the function named "list_display". The consumer thread then has to remove the value; to do this, the function
"list_removeLast" is called with the list object; it returns the location of the previous data. After removing the data, the mutex held by the consumer thread is released. This whole sequence is shown by the code in Figure 2.7.

Figure 2.7 Body of consumer thread for scenario 1

2.4 Test cases and Testing results for scenario 1
2.4.1 Test cases
In this section, test cases are designed for the producer-consumer system. The table below describes the test cases to be performed; they validate the functionality of the system against corner-case inputs.

Table 2.1 Test cases for single producer single consumer

TCN  | Test case                                                                 | Test data | Expected output                                                                           | Result obtained
TC_1 | Producer thread inserts a value                                           | Int       | Consumer should read the value inserted by the producer                                   | Yes
TC_2 | Consumer acquires the resource only after the producer thread unlocks it  | Any       | Proper synchronization should be maintained by the producer and consumer threads          | Yes
TC_3 | Main thread waits until all child threads exit                            | Any       | The main thread stays alive until all threads have executed completely                    | Yes
TC_4 | No deadlock of any kind should occur                                      | Any       | All functions of the program should execute; resource locking should not create a deadlock | Yes
TC_5 | After reading the data, the consumer thread deletes it                    | Any       | After reading the data entered by the producer, the consumer thread deletes it properly   | Yes

2.4.2 Testing results
Figure 2.8 shows the testing results for TC_1, TC_2 and TC_4. Here the producer (server) thread is waiting for a value from the user; it holds the critical region until it has stored the value in the shared list. During this time the consumer (client) thread is waiting to acquire the resource.

Figure 2.8 Producer thread is waiting for value in critical region

Figure 2.9 shows the results of TC_3 and TC_5. As soon as the producer thread leaves the critical region, the consumer thread enters it and reads the value entered by the producer; after reading the value, the consumer thread deletes it, as the figure shows.

Figure 2.9 Consumer thread printing the value inserted by producer thread
2.5 Sequence diagram
Figure 2.10 shows the sequence diagram for three producers and one consumer. In the figure, the y-axis represents time and the x-axis represents the resources: the three producer threads are shown on the left and the consumer thread on the right, with the consumer in the wait state while the resource is held by the producer threads. The diagram's annotations read: producer thread 1, 2 and 3 are each in the critical region in turn; the consumer tries to obtain the mutex but fails.

Figure 2.10 Sequence diagram of three producers one consumer

At the start, every producer has to write data into the linked list, but the linked list is shared between the producer threads and the consumer thread. To provide synchronization between the producers and the consumer, a mutex is used: each producer locks the mutex and writes its data to the list. If the consumer tries to read the data at the same time, it tries to acquire the mutex held by a producer and fails; in this case the consumer thread has to wait until the producer releases the mutex. This is shown in Figure 2.10. In this application the consumer cannot read the data until the producers have produced it and stored it in the linked list; this synchronization is achieved using the mutex.
2.6 Development of parallelized program using pthread/openMP
There are two approaches to developing threaded programs on Linux: using the pthread APIs, or using the openMP APIs. For this scenario, the pthread APIs were chosen to develop the three-producer, single-consumer application. The definitions are the same as in the first scenario; the only difference is in the main body of the application code.

Figure 2.11 Including libraries and files for scenario 2

Figure 2.11 shows all the preprocessor statements of the code segment. The first is the file "ll2.c", which contains the definitions of all the linked-list functions. The second is "pthread.h", which contains the declarations of all the threading-related APIs. The last two are the standard library headers for common functions. In the last lines of the figure, a function named create is declared.

Figure 2.12 Declaration of mutex and structures for scenario 2

To obtain synchronization in the application, a mutex is used. Here "lock" is defined as the pthread mutex object. It is essential to initialize a mutex before using it; here the initialization is handled by assigning the macro "PTHREAD_MUTEX_INITIALIZER". A structure pointer *myList holds the starting address of the list, and a structure pointer *p points at the current position for accessing values. These declarations are shown in Figure 2.12.

Figure 2.13 Function to create new list for scenario 2

Figure 2.13 shows the definition of the create function. When called, it creates a new list, which is pointed to by the myList pointer. list_crate() is the function that creates the new list and returns its address in the form of a list_head structure. At the
second line, the pointer p is created to hold the current position of the element in the list; initially the current position is set to the first position by calling list_position_creat(). Figure 2.14 shows the main function for the multiple-producer, single-consumer application. In the figure, two functions are declared, each taking a void pointer argument and returning a void pointer. The function named "ser" is the function called by the producer threads; there are three producer threads, which call the same function three times. On the other side, the consumer thread calls the function "cli". In main, one void pointer named "exit" is defined to obtain the return value from the thread functions.

Figure 2.14 Main function for application of scenario 2

Here, five thread objects are defined, named "t_ser", "t_ser1", "t_ser2", "t_ser3" and "t_cli". On successful creation of each producer thread, its ID is stored in the corresponding thread object, and the ID of the consumer thread is stored in "t_cli". Before creating the threads, the create() function is called here to generate the list and assign the current location to the pointer p. In the single-producer, single-consumer case this function was called inside the producer thread function, because both threads run only once there; in this case the producer function executes three times, so a new list must not be created each time. Once the list is created, all the threads insert their values and advance the location pointer. To create the threads, the "pthread_create" API is used with the appropriate arguments. In this application four threads are created: three producer threads and one consumer thread. The child threads die automatically if the main thread exits; to avoid this situation, the main thread has to wait until the consumer thread exits successfully. This mechanism is provided by the "pthread_join" API.
The consumer cannot get any value before a producer stores it in the list; indeed, the consumer has to wait until all the producers have stored their values. On the other side, no other producer thread can
  • 22. MSRSAS - Postgraduate Engineering and Management Programme - PEMP insert value if one producer thread is in critical region. To achieve such synchronization, mutex named “lock” is used. Function named “pthread_mutex_lock” is used to take the mutex and enter in to the critical region. After these producers thread will take a value from the user in the local variable “val”. The user entered value is stored in the list and the position of the pointer p is updated by every producer thread. The storing mechanism is provided by the function “list_inserLast” with the argument of the list object (myList) and value to be inserted from last. After successful insertion of the value by all the producer thread any of the thread (consumer) can get that value. So to end up with the critical region, to release the obtained mutex “pthread_mutex_unlock” function is used. The body of the producer threads is shown by Figure 2.15. Figure 2. 15 Body of producer thread for scenario 2 Figure 2. 16 Body of consumer thread for scenario 2 Multi core Architecture and Programming 22
  • 23. Figure 2.16 shows the body of the consumer thread. At the start of the thread function it tries to take the mutex. Once the mutex is unlocked by the producer threads, the consumer thread gets access to the shared list. The values are displayed by passing the list object to the function named list_display. The consumer thread then has to remove the values: the function list_removeLast is called to remove a single value from the list, and since three values are present in this scenario, the read-and-delete procedure is repeated three times. The return value of this function is the location of the previous element. After removing all the data, the mutex taken by the consumer thread is released.
2.4 Test cases and testing results for scenario 2
2.4.1 Test cases
In this section test cases are designed for the three-producer, one-consumer system. The table below describes the test cases to be performed; they validate the functionality of the system with corner-case inputs.
Table 2.2 Test cases for the multiple-producer, single-consumer application
TCN | Test case | Test data | Expected result | Output obtained
TC_1 | All producer threads should insert a value into the list | Int | The consumer reads the values inserted by the producer threads | Yes
TC_2 | The consumer thread should read the values in the appropriate priority | Any | A priority is assigned to each producer thread; the consumer reads the values in that priority order | Yes
TC_3 | The main thread should wait until all the child threads exit | Any | The main thread stays alive until all the threads have executed completely | Yes
TC_4 | Two producer threads should not insert values at the same time | Any | A proper synchronization mechanism is maintained by the producer threads while inserting values | Yes
  • 24. TC_5 | After reading the data, the consumer thread should delete it one by one | Any | After reading the data entered by the producers, the consumer thread deletes it properly | Yes
2.4.2 Testing results
Figure 2.17 shows the testing results of the developed producer-consumer application. Here the consumer thread waits until all the producer threads have left the critical region; the first priority is assigned to the first producer thread. The results of TC_1, TC_2 and TC_4 are shown in the figure below. Figure 2.17 Producer thread waiting in the critical region. Figure 2.18 shows the results of test cases TC_3 and TC_4: only after all the producer threads leave the critical region can the consumer enter it to read the values from the list, and the consumer thread reads the values in the given priority order. Figure 2.18 Consumer thread active after all the producer threads finish the critical region. NOTE: In this document all results are documented for a single iteration of the application, to provide a clear picture.
  • 25. 2.5 Conclusion
Multi-core hardware is clearly increasing software complexity by driving the need for multi-threaded applications. Given the rising rate of multi-core hardware adoption in both enterprise and consumer devices, the challenge of creating multi-threaded applications is here to stay for software developers, and in the coming years multi-threaded application development will most likely become the dominant paradigm in software. As this shift continues, many development organizations will transition to multi-threaded application development on the fly. In view of this, a producer-consumer application was successfully created using pthread APIs. All the threads share the same linked list, and synchronization is provided by a mutex. The test cases were developed by critically analyzing the application code and the assignment requirements, and all of them passed successfully.
  • 26. CHAPTER 3
Development of a Consumer-Producer Application with an Extended Priority Concept
3.1 Introduction
All modern operating systems divide CPU cycles, in the form of time quanta, among the various processes and threads (or Linux tasks) in accordance with their policies and priorities. Thread scheduling is one of the most important and fundamental services offered by an operating system kernel. Some of the metrics an operating system scheduler seeks to optimize are fairness, throughput, turnaround time, response time and efficiency. Symmetric multiprocessor operating systems commonly assume that all cores are identical and offer the same performance.
3.2 Sequence diagram
Figure 3.1 shows the sequence diagram in which the message queues are prioritized. In the figure, the y-axis represents time and the x-axis represents the resources. The producer threads are shown on the left side of the image: the vertical thin line is the main thread and the thick overlapped lines are the producer threads. Each producer thread maintains one queue to store data. On the right side of the image the single consumer thread is shown. Before spawning the producer and consumer threads, the main thread locks four semaphores; after locking them, it creates four producers and one consumer. At the start, the consumer thread tries to acquire the semaphores in the proper priority order, while each producer thread accesses its own message queue and inserts data. At the end, each producer thread unlocks its semaphore so that the consumer thread can access it. In the figure the ascending priority order of the producer threads/queues is thread4, thread3, thread2, thread1; when thread 1 finishes it releases semaphore 1. The consumer thread continuously checks the sizes of all the lists associated with the queues.
Because the priority assigned to thread 3 is higher, the consumer thread looks at the size of the third queue first. If thread three has no data in its queue, the consumer thread looks at the lower-priority queues. As a result of this mechanism, if by that time only thread one has entered an element in its queue and released its semaphore, the consumer checks queue three for an available element and fails; since no other higher-priority queue has data either, rather than waiting for the higher-priority thread, the consumer thread reads and deletes the data in the lower-priority queue. As soon as a higher-priority producer thread enters a value in its queue, the consumer thread immediately reads and deletes it.
  • 27. When the consumer and a producer thread try to acquire a resource at the same time, the consumer thread is given priority to access the resource. Figure 3.1 Sequence diagram for the prioritized consumer thread (callouts in the figure: before spawning the consumer and producers, the main task locks the four semaphores; each producer stores data in its own queue and unlocks its semaphore; the consumer thread has to wait until the highest-priority producer releases its semaphore, after which it acquires it).
Development of the designed application
There are two approaches to developing threaded programs on Linux: one uses the pthread APIs and the other the OpenMP APIs. For this scenario the pthread APIs are chosen to develop the four producer threads and the single consumer thread. Figure 3.2 Including library files for scenario 3
  • 28. In this scenario the pthread APIs are used. The definitions of the pthread APIs are brought in by including pthread.h. To provide the appropriate synchronization, semaphores are used; the definitions of the semaphore APIs and the declaration of the semaphore-type object come from semaphore.h. Figure 3.2 shows these files being included in the application. Figure 3.3 Declaration of constructive functions for scenario 3. Figure 3.3 shows the declaration of the constructive functions. In this scenario four threads create four different lists, so one such function is declared per thread. Figure 3.4 Declaration of list pointers and location pointers. A pointer to structure, *myList, is created to hold the starting address of a list. Since there are four different queues, four pointers to the structure list_head are created to hold their base addresses, and likewise four ll_node pointers are created to hold the current location in each of the four lists. The declarations of all these objects are shown in Figure 3.4. Figure 3.5 Definition of constructive functions. Figure 3.5 shows the definition of the constructive functions: calling such a function creates a new list, which is pointed to by the corresponding myList-series pointer. list_create() is the
  • 29. function that creates a new list and returns its address in the form of a list_head structure. On the second line the pointers p, q, r and s are created to hold the current position of the element in each list; initially the current position is set to the first position by calling list_position_create(). Figure 3.6 Declaration of thread functions and synchronization objects. Here the four producer threads call four different functions; their declarations are shown in Figure 3.6. As part of the synchronization mechanism, four semaphores and a mutex are used. The reason for using semaphores is that a semaphore can be taken by one thread and released by another, which is not possible with a mutex. The declarations of these objects are also shown in Figure 3.6. Figure 3.7 Main function for the application of scenario 3. Figure 3.7 shows the code for the main function. The semaphores are initialized at the start of main using the sem_init function, which takes three arguments. The first argument is the address of the sem_t (semaphore) object. The second parameter
  • 30. indicates that the semaphore is shared between the threads of the process, and the third parameter gives the initial value of the semaphore. Here the initial value is 1, so the semaphore is a binary semaphore. After all the synchronization objects are initialized, the threads are created with the semaphores locked: four producer threads and one consumer thread are created after main locks the four semaphores, and the main thread then waits for the consumer thread to finish execution. Figure 3.8 First producer thread. Figure 3.8 shows the first producer thread. The first thread enters a value into the list named myList. At the end of the function, thread 1 unlocks the semaphore named l_th, which was taken by the main function before creating the thread. Meanwhile the consumer thread is waiting to take the highest-priority thread's semaphore; here the highest-priority thread is thread 3 and the semaphore associated with it is l_th3. A mutex is used to prevent multiple threads from seeking data at the same time. Figure 3.9 Second producer thread. Figure 3.9 shows the second producer thread. The second thread enters a value into the list named myList1. At the end of the function, thread 2 unlocks the semaphore named l_th1,
  • 31. which was taken by the main function before creating the thread. Meanwhile the consumer thread is still waiting for the highest-priority thread's semaphore, l_th3, associated with thread 3, and the mutex prevents multiple threads from seeking data at the same time. Figure 3.10 Third producer thread. Figure 3.10 shows the third producer thread. The third thread enters a value into the list named myList2. At the end of the function, thread 3 unlocks the semaphore named l_th2, which was taken by the main function before creating the thread; at the same time the consumer thread is waiting to lock the semaphore l_th3, which is still locked by the main function. Figure 3.11 Fourth producer thread. Figure 3.11 shows the fourth producer thread. The fourth thread enters a value into the list named myList3. At the end of the function, thread 4 unlocks the semaphore named l_th3, which was taken by the main function before creating the thread. This is the highest-priority thread, for
  • 32. which the consumer thread is looking. The moment thread four releases the semaphore, the consumer thread becomes active and reads the data from the highest-priority queue down to the lowest-priority one. Figure 3.12 Consumer thread with the highest-priority queue. Figure 3.12 shows the top half of the consumer thread. As per the requirement, when the producer and consumer threads contend, the consumer should get the highest priority to access the queue. To obtain this, one instance of the structure sched_param is created and two APIs are used: pthread_setschedparam() and pthread_setschedprio(). The first API changes the scheduling policy of the current thread; for the consumer thread the policy is set to SCHED_FIFO. FIFO is the scheduling policy in which the thread that reaches the ready state first gets the chance to execute, and no thread of equal priority can preempt the current execution; in our case, due to FIFO scheduling, no such thread can preempt the consumer thread. On the other side, the requirement is that when the consumer and a producer arrive at the same time, the consumer should get the highest priority. To fulfil this, the priority of the consumer thread is set high while the producer threads run at normal priority. pthread_setschedprio() is used to assign the priority, with the thread ID and the priority value as arguments; for SCHED_FIFO on Linux the valid priorities typically range from 1 (lowest) to 99 (highest), as reported by sched_get_priority_min() and sched_get_priority_max(). After its priority is set, the consumer thread continuously monitors the size variable in every list
  • 33. structure of the producer threads. If a producer stores some data in its list, the consumer reads it and removes it. Thread 3 has the highest priority, so the consumer thread first checks the size of the queue associated with thread 3; if the size is not zero, it means some data is available in that queue, and it is deleted as the first priority. Figure 3.13 Continuation of the consumer thread for the second-priority queue. If the highest-priority thread has no data in its queue, it is not worth the consumer thread's while to wait until that thread stores data, because the consumer has to serve four producer threads. So if the consumer finds no data in the highest-priority queue (myList3), it jumps to check for data in the second-priority queue. The second priority is assigned to thread 2, whose queue is myList2; this mechanism can be seen in Figure 3.13, where the consumer thread checks myList2 and, if some data is available there, prints and deletes it. Figure 3.14 Continuation of the consumer thread for the third-priority queue. Likewise, if the first two priority queues (myList3 and myList2) are empty, the consumer does not wait for either of those threads, but jumps to check the third
  • 34. priority queue. The third priority is assigned to thread 1, whose queue is myList1; this mechanism can be seen in Figure 3.14, where the consumer thread checks myList1 and, if some data is available, prints and deletes it. Figure 3.15 Continuation of the consumer thread for the last-priority queue. Finally, if the first three priority queues (myList3, myList2 and myList1) are all empty, the consumer does not wait for any of those threads either, but checks the last-priority queue. The last priority is assigned to thread 0, whose queue is myList; this mechanism can be seen in Figure 3.15, where the consumer thread checks myList and, if some data is available, prints and deletes it.
3.3 Test cases and testing results for scenario 3
3.3.1 Test cases
In this section test cases are designed for the four-producer, one-consumer system. The table below describes the test cases to be performed; they validate the functionality of the system with corner-case inputs.
Table 3.1 Test cases for the higher-priority consumer thread
TCN | Test case | Test data | Expected result | Output obtained
TC_1 | The consumer should acquire the higher priority and run first | NA | At the start of the program the consumer runs and shows that the lists are empty | Yes
  • 35. TC_2 | If the consumer and a producer try to access a resource together, the consumer should get access first | Any | At the time of access to the shared resources, the consumer gets the higher priority | Yes
TC_3 | The consumer should not wait for the higher-priority thread to enter a value | Any | If the higher-priority thread does not enter a value, the consumer checks the other, lower-priority queues | Yes
TC_4 | If two values are entered by any one producer thread, the consumer should respond to both | Any | There are cases when the consumer is busy printing some values and, at the same time, a thread enters two values in its queue; in such a condition the consumer should read and delete both values | Yes
3.3.2 Documentation of the results
Figure 3.16 shows the results of the test cases developed in the section above. It can be seen from the image that the consumer thread responds to every producer thread's queue that holds values; the attached callouts give a better understanding of the results (the consumer thread executes first, as it has the highest priority; thread 3 has the higher priority but no value in its queue while thread 1 does, so rather than waiting, the consumer serves the lower-priority thread; when thread 3's value arrives it is read without the consumer having waited for it). Figure 3.16 Results of test cases
  • 36. 3.5 Conclusion
The consumer-producer application with the extended priority concept has been explained with a sequence diagram showing the relation between the four static-priority producer threads, the one consumer thread, the four queues and their synchronization mechanism, together with the parallelized program using pthreads and the test cases used to verify the functionality.
  • 37. CHAPTER 4
4.1 Module Learning Outcomes
This module helped build expertise in parallel programming for multi-core architectures. Multi-core processors were studied along with their performance quantification and usability techniques, single-core and multi-core optimization techniques, and the development of multi-threaded parallel programs. Virtualization and partitioning techniques were explained in detail along with their specific challenges and solutions, and parallel programming of multi-core processors was illustrated with appropriate case studies using OpenMP and pthreads. After this module I am able to analyze multi-core architectures, the optimization process for single- and multi-core processors, and parallel programming for multi-core processors proficiently, using the OpenMP library and the GCC compiler. Applying parallel programming concepts to develop applications on multi-core processors through the lab programs proved an efficient way of learning pthreads, OpenMP and the various synchronization techniques for eliminating deadlock situations.
4.2 Conclusion
In Chapter 1, by critically comparing the throughput and latency of the available multi-core arbitration schemes, Round Robin was found to be better than the other arbitrations, since it is a fair-queuing packet scheduler with good fairness and delay properties and low complexity; even so, Round Robin has enough negative aspects that replacements may be hoped for in the future. In Chapter 2, multi-core hardware was seen to be increasing software complexity by driving the need for multi-threaded applications; given the rising rate of multi-core hardware adoption in both enterprise and consumer devices, the challenge of creating multi-threaded applications is here to stay for software developers. In view of this, a producer-consumer application was successfully created using pthread APIs.
All the threads share the same linked list, and synchronization is provided by a mutex. The test cases were developed by critically analyzing the application code and the assignment requirements, and all of them passed successfully. In Chapter 3, a consumer-producer application with an extended priority concept was explained with a sequence diagram showing the relation between the four static-priority producer threads, the one consumer thread, the four queues and their synchronization mechanism, together with the parallelized program using pthreads and the test cases used to verify the functionality.
  • 38. References
[1] http://www.coverity.com/library/pdf/coverity_multi-threaded_whitepaper.pdf
[2] www.irit.fr/publis/TRACES/12619_etfa2011.pd
[3] www.cs.fsu.edu/research/reports/TR-100401.pd
[4] paper.ijcsns.org/07_book/200809/20080936.pdf
[5] www.sti.uniurb.it/bogliolo/e-publ/KLUWERjdaes03.pdf
  • 39. Appendix-1
  • 40. Appendix-2