Bglrsession4

Prof NB Venkateswarlu
B.Tech(SVU), M.Tech(IIT-K), Ph.D(BITS, Pilani), PDF(U of Leeds,UK)
ISTE Visiting Fellow 2010-11
AITAM, Tekkali

 Parallelism: degree to which a multiprocessor application
achieves parallel execution
 Concurrency: Maximum parallelism an application can achieve
with unlimited processors
 System Concurrency: kernel recognizes multiple threads of
control in a program
 User Concurrency: User space threads (coroutines) to provide a
natural programming model for concurrent applications.
Concurrency not supported by system.

 Process: encompasses
◦ Unit of dispatching: a process is an execution path
through one or more programs, realized as a set of threads
(computational entities)
 execution may be interleaved with other processes
◦ Unit of resource ownership: process is allocated a
virtual address space to hold the process image
 Thread: Dynamic object representing an
execution path and computational state.
◦ threads have their own computational state: PC, stack,
user registers and private data
◦ Remaining resources are shared amongst threads in a
process

A word processor with three threads

 Threads separate the notion of execution from
the Process abstraction
 Effectiveness of parallel computing depends on
the performance of the primitives used to
express and control parallelism
 Useful for expressing the intrinsic concurrency
of a program regardless of resulting
performance
 Three types:
◦ User threads, N:1 model
◦ kernel threads, 1:1 model
◦ Light Weight Processes (LWP), M:N model

 User level threads - user libraries implementation
◦ Benefits: no kernel modifications, flexible and low cost
◦ Drawbacks: thread may block entire process, no parallelism
 Kernel level threads - kernel directly supports multiple threads of control in
a process.
◦ Benefits: scheduling/synchronization coordination, less overhead than
process, suitable for parallel application
◦ Drawbacks: more expensive than user-level threads, generality leads to
greater overhead
 Light-Weight Processes (LWP), also known as virtual processors. Kernel
supported user threads
◦ Each LWP is bound to a kernel thread, but a kernel thread need not be bound to an LWP
◦ LWP is scheduled by kernel
◦ User threads scheduled by library onto LWPs
◦ Multiple LWPs per process

 Primary states:
◦ Running, Ready and Blocked.
 Operations to change state:
◦ Spawn: new thread provided register context and stack
pointer.
◦ Block: event wait, save user registers, PC and stack
pointer
◦ Unblock: moved to ready state
◦ Finish: deallocate register context and stacks.

 User threads := Many-to-One
 kernel threads := One-to-One
 Mixed user and kernel := Many-to-Many
[Figure: the three mapping models, Many-to-One, One-to-One, and Many-to-Many]

 fork and exec
◦ should fork duplicate one, some or all threads
 Cancellation – issues with freeing resources and inconsistent state
◦ asynchronous cancellation – target is immediately canceled
◦ deferred cancellation – target checks periodically. check at cancellation points
 Signals: generation, posting and delivery
◦ per thread signal masks but process shared disposition
 Signal delivery:
◦ to which thread should a signal be delivered
◦ specifically designated thread (signal thread)
◦ synchronous signals should go to thread causing the signal
◦ what about asynchronous signals? Solaris: deliver to a special thread which forwards
them to the first user-created thread that has not blocked the signal.
 Bounding the number of threads created in a dynamic environment
◦ use thread pools
 All threads share the same address space:
◦ use of thread specific data
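As a rough illustration of thread-specific data, the sketch below gives each thread a private counter through the Pthreads key interface (pthread_key_create, pthread_setspecific, pthread_getspecific). The names counter_key, my_counter, worker, and tsd_demo are hypothetical and chosen only for this example; it is one possible arrangement, not the only way to use the interface.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

static pthread_key_t counter_key;                    /* hypothetical key name */
static pthread_once_t key_once = PTHREAD_ONCE_INIT;

/* Create the key exactly once; free() is the per-thread destructor. */
static void make_key(void) {
    int rc = pthread_key_create(&counter_key, free);
    assert(rc == 0);
    (void)rc;
}

/* Return the calling thread's private counter, allocating it on first use. */
static int *my_counter(void) {
    pthread_once(&key_once, make_key);
    int *p = pthread_getspecific(counter_key);
    if (p == NULL) {
        p = calloc(1, sizeof *p);
        assert(p != NULL);
        pthread_setspecific(counter_key, p);
    }
    return p;
}

/* Each worker increments only its own counter, so no locking is needed. */
static void *worker(void *arg) {
    int n = *(int *)arg;
    for (int i = 0; i < n; i++)
        (*my_counter())++;
    return (void *)(intptr_t)*my_counter();
}

/* Run two workers with different iteration counts and report both totals. */
void tsd_demo(int n1, int n2, long *out1, long *out2) {
    pthread_t t1, t2;
    void *r1 = NULL, *r2 = NULL;
    int rc;
    rc = pthread_create(&t1, NULL, worker, &n1); assert(rc == 0);
    rc = pthread_create(&t2, NULL, worker, &n2); assert(rc == 0);
    rc = pthread_join(t1, &r1); assert(rc == 0);
    rc = pthread_join(t2, &r2); assert(rc == 0);
    (void)rc;
    *out1 = (long)(intptr_t)r1;
    *out2 = (long)(intptr_t)r2;
}
```

Calling tsd_demo(3, 5, &a, &b) from main should leave each thread's own total in a and b, because neither thread ever sees the other's counter.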

 Thread operations in user space:
◦ create, destroy, synch, context switch
 kernel threads implement a virtual processor
 Coarse grain in kernel - preemptive scheduling
 Communication between kernel and threads library
◦ shared data structures.
◦ Software interrupts (user upcalls or signals). Example, for
scheduling decisions and preemption warnings.
◦ Kernel scheduler interface - allows dissimilar thread packages to
coordinate.

 An activation:
◦ serves as execution context for running thread
◦ notifies thread of kernel events (upcall)
◦ space for kernel to save processor context of current user thread
when stopped by kernel
◦ Library schedules user threads on activations.
◦ upcall performed when one of the following occurs:
 user thread performs blocking system call
 a blocked thread belonging to the process becomes runnable; its library is
then notified, allowing it to either schedule a new thread or
resume the preempted thread.
 kernel is responsible for processor allocation => preemption by
kernel.
 Thread package responsible for scheduling threads on available
processors (activations)

 a POSIX standard (IEEE 1003.1c) API for thread
creation and synchronization.
 API specifies behavior of the thread library,
implementation is up to the developers of the
library.
 Common in UNIX operating systems.

 Supports:
◦ user threads (uthreads) via libthread and libpthread
◦ LWPs, acts as a virtual CPU for user threads
◦ kernel threads (kthread), every LWP is associated with
one kthread, however a kthread may not have an LWP
 interrupts as threads
 Fundamental scheduling/dispatching object
 all kthreads share same virtual address space
(the kernel's) - cheap context switch
 System threads - example STREAMS, callout
 kthread_t, /usr/include/sys/thread.h
◦ scheduling info, pointers for scheduler or sleep
queues, pointer to klwp_t and proc_t

 Bound to 1 kthread
 LWP specific fields from proc are kept in
klwp_t (/usr/include/sys/klwp.h)
◦ user-level registers, system call params, resource
usage, pointer to kthread_t and proc_t
◦ klwp_t can be swapped
 LWP non-swappable info kept in kthread_t
 All LWPs in a process share:
◦ signal handlers
 Each may have its own
◦ signal mask
◦ alternate stack for signal handling
 No global name space for LWPs

 Implemented in user libraries
 library provides synchronization and scheduling
facilities
 threads may be bound to LWPs
 unbound threads compete for available LWPs
 Manage thread specific info
◦ thread id, saved register state, user stack, signal mask,
priority*, thread local storage
 Solaris provides two libraries: libthread and
libpthread.
 Try man thread or man pthreads

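[Figure: Solaris process structures. proc_t points to its list of kthread_t via p_tlist; each kthread_t (non-swappable) points back via t_procp, to the next thread via t_forw, and to its klwp_t via t_lwp; each klwp_t points back via lwp_thread and lwp_procp.]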

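[Figure: Solaris thread architecture across the user, kernel, and hardware layers: LWPs (L) of Process 1 and Process 2, kernel threads including an interrupt kthread (Int kthr), and processors (P)]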
[Figure: state diagram with states Stopped, Runnable, Active, and Sleeping; transitions Stop, Continue, Dispatch, Preempt, Sleep, and Wakeup]

[Figure: state diagram with states Runnable, Running, Blocked, and Stopped; transitions Dispatch, Timeslice or Preempt, Blocking System Call, Wakeup, Stop, and Continue]

 One system wide clock kthread
 pool of 9 partially initialized kthreads per CPU
for interrupts
 interrupt thread can block
 interrupted thread is pinned to the CPU

 Divided into Traps (synchronous) and
interrupts (asynchronous)
 each thread has its own signal mask, global
set of signal handlers
 Each LWP can specify alternate stack
 fork replicates all LWPs
 fork1 only the invoking LWP/thread

 Two abstractions:
◦ Task - static object, address space and system resources called port
rights.
◦ Thread - fundamental execution unit and runs in context of a task.
 Zero or more threads per task,
 kernel schedulable
 kernel stack
 computational state
 Processor sets - available processors divided into non-intersecting sets.
◦ permits dedicating processor sets to one or more tasks
 C-Threads
◦ Coroutine-based - multiplex user threads onto a single-threaded task
◦ Thread-based - one-to-one mapping from c-threads to Mach threads.
Default.
◦ Task-based - One Mach Task per c-thread.

 Address problem of excessive kernel stack memory
requirements
 process model versus interrupt model
◦ one per process kernel stack versus a per thread kernel stack
 Thread is first responsible for saving any required state (the
thread structure allows up to 28 bytes)
 indicate a function to be invoked when unblocked (the
continuation function)
 Advantage: stack can be transferred between threads
eliminating copy overhead.

 Design driven by need to support a variety of
OS environments
 NT process implemented as an object
 executable process contains >= 1 thread
 process and thread objects have built in
synchronization capabilities
 Support for kernel (system) threads
 Threads are scheduled by the kernel and thus
are similar to UNIX threads bound to an LWP
(kernel thread)
 fibers are threads which are not scheduled by
the kernel and thus are similar to unbound
user threads.

 Process = process context + code, data, and
stack
Process context:
  Program context: data registers, condition codes,
  stack pointer (SP), program counter (PC)
  Kernel context: VM structures, descriptor table, brk pointer
Code, data, and stack (address space, top to bottom):
  stack (SP points to its top)
  shared libraries
  run-time heap (brk marks its end)
  read/write data
  read-only code/data (PC points into the code)
  address 0

 Process = thread + code, data, and kernel
context
Thread (main thread):
  Thread context: data registers, condition codes,
  stack pointer (SP), program counter (PC)
  stack (SP points to its top)
Code and data (address space, top to bottom):
  shared libraries
  run-time heap (brk marks its end)
  read/write data
  read-only code/data (PC points into the code)
  address 0
Kernel context: VM structures, descriptor table, brk pointer

 Multiple threads can be associated with a
process
◦ Each thread has its own logical control flow
(sequence of PC values)
◦ Each thread shares the same code, data, and kernel
context
◦ Each thread has its own thread id (TID)
Thread 1 (main thread): stack 1; thread 1 context (data registers, condition codes, SP1, PC1)
Thread 2 (peer thread): stack 2; thread 2 context (data registers, condition codes, SP2, PC2)
Shared code and data (from address 0 upward): read-only code/data, read/write data, run-time heap, shared libraries
Kernel context: VM structures, descriptor table, brk pointer

[Figure: Single-Threaded Process Model (process control block, user address space, user stack, kernel stack) versus Multithreaded Process Model (one process control block and user address space shared by all threads; each thread has its own thread control block, user stack, and kernel stack)]

Address space view in multithreaded system

 Threads associated with a process form a
pool of peers.
◦ Unlike processes which form a tree hierarchy
[Figure: left, a process hierarchy (P0, P1, three sh shells, foo, bar); right, the threads associated with process foo (T1 to T5) as peers around the shared code, data, and kernel context]

 Two threads run concurrently (are concurrent)
if their logical flows overlap in time.
 Otherwise, they are sequential.
 Examples:
◦ Concurrent: A & B, A & C
◦ Sequential: B & C
[Figure: timelines of Thread A, Thread B, and Thread C along a time axis]

 How threads and processes are similar
◦ Each has its own logical control flow.
◦ Each can run concurrently.
◦ Each is context switched.
 How threads and processes are different
◦ Threads share code and data, processes (typically)
do not.
◦ Threads are somewhat less expensive than
processes.
 Process control (creating and reaping) is twice as
expensive as thread control.
 Linux/Pentium III numbers:
 ~20K cycles to create and reap a process.
 ~10K cycles to create and reap a thread.

 Manager/worker
◦ Manager thread handles I/O and assigns work to worker
threads
◦ Worker threads may be created dynamically, or allocated
from a thread-pool
 Peer
◦ Like manager worker, but manager participates in the
work
 Pipeline
◦ Each thread handles a different stage of an assembly line
◦ Threads hand work off to each other in a producer-
consumer relationship

 Pros
◦ Overlap I/O with computation!
◦ Cheaper context switches
◦ Better mapping to shared memory multiprocessors
 Cons
◦ Potential thread interactions
◦ Complexity of debugging
◦ Complexity of multi-threaded programming
◦ Backwards compatibility with existing code

 Pthreads: Standard interface for ~60 functions
that manipulate threads from C programs.
◦ Creating and reaping threads.
 pthread_create
 pthread_join
◦ Determining your thread ID
 pthread_self
◦ Terminating threads
 pthread_cancel
 pthread_exit
 exit [terminates all threads], return [terminates current
thread]
◦ Synchronizing access to shared variables
 pthread_mutex_init
 pthread_mutex_[un]lock
 pthread_cond_init
 pthread_cond_[timed]wait

int pthread_create(pthread_t * thread, const pthread_attr_t
* attr, void * (*start_routine)(void *), void *arg);
Function call: pthread_create
Arguments:
•thread - returns the thread id. (unsigned long int defined in bits/pthreadtypes.h)
•attr - Set to NULL if default thread attributes are used. (else set attributes in
a pthread_attr_t object, defined in bits/pthreadtypes.h) Attributes include:
•detached state (joinable? Default: PTHREAD_CREATE_JOINABLE. Other option:
PTHREAD_CREATE_DETACHED)
•scheduling policy (real-time? SCHED_FIFO, SCHED_RR, SCHED_OTHER)
•scheduling parameter
•inheritsched attribute (PTHREAD_EXPLICIT_SCHED: use the scheduling attributes in attr.
PTHREAD_INHERIT_SCHED: inherit from the creating thread; this is the default on Linux.)
•scope (Kernel threads: PTHREAD_SCOPE_SYSTEM User threads:
PTHREAD_SCOPE_PROCESS Pick one or the other not both.)
•guard size
•stack address (See unistd.h and bits/posix_opt.h
_POSIX_THREAD_ATTR_STACKADDR)
•stack size (minimum PTHREAD_STACK_MIN, defined in limits.h)
•start_routine - pointer to the function to be threaded. Function has a single
argument: pointer to void.
•arg - pointer to argument of function. To pass multiple arguments, send a
pointer to a structure.

 pthread_exit (status)
◦ Terminates the thread and returns “status” to any joining
thread
 pthread_join (threadid,status)
◦ Blocks the calling thread until thread specified by
“threadid” terminates
◦ Return status from pthread_exit is passed in “status”
◦ One way of synchronizing between threads
 pthread_yield ()
◦ Thread gives up the CPU and enters the run queue (pthread_yield is non-standard; the POSIX call is sched_yield())
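The way a status travels from pthread_exit to pthread_join can be sketched as follows. The helper names square and square_in_thread are hypothetical; the result is placed on the heap here only because a pointer into the exiting thread's stack would be invalid after it terminates.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Thread routine: squares its argument and hands the result to any joiner. */
static void *square(void *arg) {
    int n = *(int *)arg;
    int *result = malloc(sizeof *result);   /* must outlive this thread's stack */
    if (result != NULL)
        *result = n * n;
    pthread_exit(result);                   /* same effect as: return result; */
}

/* Hypothetical helper: run square(n) in a new thread and collect its status. */
int square_in_thread(int n) {
    pthread_t tid;
    void *status = NULL;
    int rc = pthread_create(&tid, NULL, square, &n);
    assert(rc == 0);
    rc = pthread_join(tid, &status);        /* blocks until the thread terminates */
    assert(rc == 0 && status != NULL);
    (void)rc;
    int value = *(int *)status;
    free(status);                           /* the joiner owns the status memory */
    return value;
}
```

Passing &n is safe in this sketch only because square_in_thread joins the thread before it returns, so n is still alive whenever the thread reads it.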

/*
* hello.c - Pthreads "hello, world" program
*/
#include "csapp.h"
void *thread(void *vargp);
int main() {
    pthread_t tid;
    /* Arguments: thread ID, thread attributes (usually NULL),
       thread routine, thread arguments (void *p) */
    Pthread_create(&tid, NULL, thread, NULL);
    /* Second argument receives the return value (void **p) */
    Pthread_join(tid, NULL);
    exit(0);
}
/* thread routine */
void *thread(void *vargp) {
    printf("Hello, world!\n");
    return NULL;
}

Execution of hello.c:
main thread:
1. call Pthread_create(); Pthread_create() returns
2. call Pthread_join(); main thread waits for peer thread to terminate
3. Pthread_join() returns
4. exit() terminates main thread and any peer threads
peer thread:
1. printf()
2. return NULL; (peer thread terminates)

Once created, threads are peers, and may create other threads. There is no
implied hierarchy or dependency between threads.

Thread Attributes:
•By default, a thread is created with certain attributes. Some of these
attributes can be changed by the programmer via the thread attribute object.
•pthread_attr_init and pthread_attr_destroy are used to initialize/destroy the thread attribute object.
•Other routines are then used to query/set specific attributes in the thread attribute object.
•Some of these attributes will be discussed later.
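A minimal sketch of the attribute-object life cycle described above: initialize, set attributes, create the thread, destroy the object. The function name create_with_stack and the routine noop are hypothetical; only the detach state and stack size are set, as two representative attributes.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

static void *noop(void *arg) { return arg; }

/* Creates and joins a thread with the requested stack size.
   Stores the stack size recorded in the attribute object in *actual.
   Returns 0 on success or the error number of the first failing call. */
int create_with_stack(size_t stack_bytes, size_t *actual) {
    pthread_attr_t attr;
    pthread_t tid;
    assert(actual != NULL);

    int rc = pthread_attr_init(&attr);           /* start from the defaults */
    if (rc != 0)
        return rc;
    rc = pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);
    if (rc == 0)
        rc = pthread_attr_setstacksize(&attr, stack_bytes);
    if (rc == 0)
        rc = pthread_attr_getstacksize(&attr, actual);
    if (rc == 0)
        rc = pthread_create(&tid, &attr, noop, NULL);
    pthread_attr_destroy(&attr);                 /* not needed after creation */
    if (rc == 0)
        rc = pthread_join(tid, NULL);
    return rc;
}
```

A request below PTHREAD_STACK_MIN is expected to be rejected by pthread_attr_setstacksize, in which case no thread is created and the error number is returned.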

Terminating Threads:
•There are several ways in which a Pthread may be terminated:
•The thread returns from its starting routine
(the main routine for the initial thread).
•The thread makes a call to the pthread_exit subroutine (covered below).
•The thread is canceled by another thread via the pthread_cancel routine (not covered here).
•The entire process is terminated due to a call to either the exec or exit subroutines.
•pthread_exit is used to explicitly exit a thread. Typically, the pthread_exit() routine is called after a
thread has completed its work and is no longer required to exist.
•If main() finishes before the threads it has created, and exits with pthread_exit(),
the other threads will continue to execute. Otherwise, they will be automatically
terminated when main() finishes.
•The programmer may optionally specify a termination status, which is stored
as a void pointer for any thread that may join the calling thread.
•Cleanup: the pthread_exit() routine does not close files; any files opened inside
the thread will remain open after the thread is terminated.
•Discussion: In subroutines that execute to completion normally,
you can often dispense with calling pthread_exit() - unless, of course, you want to
pass a return code back. However, in main(), there is a definite problem if main()
completes before the threads it spawned. If you don't call pthread_exit() explicitly,
when main() completes, the process (and all threads) will be terminated.
By calling pthread_exit() in main(), the process and all of its threads will be kept
alive even though all of the code in main() has been executed.

Another simple thread creation
program
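#include<stdio.h>
#include<pthread.h>
void * HI(void *x)
{
    printf("Hello\n");
    return NULL;
}
int main()
{
    pthread_t tid;
    pthread_create(&tid, NULL, HI, NULL);
    /* without the join, main may return (ending the process)
       before the new thread gets to print */
    pthread_join(tid, NULL);
    return 0;
}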

A Simple Example which displays
returned values of create function.
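#include<stdio.h>
#include<pthread.h>
void * HI(void *x)
{
    printf("Hello\n");
    return NULL;
}
int main(){
    pthread_t tid1, tid2;
    int ret1, ret2;
    /* pthread_create returns 0 on success, an error number otherwise */
    ret1=pthread_create(&tid1, NULL, HI, NULL);
    ret2=pthread_create(&tid2, NULL, HI, NULL);
    printf("Return Values=%d %d\n", ret1, ret2);
    pthread_join(tid1, NULL);
    pthread_join(tid2, NULL);
    return 0;
}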

Program which prints thread
and process IDs
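#include<stdio.h>
#include<unistd.h>
#include<pthread.h>
void * HI(void *x){
    printf("Hello\n");
    /* pthread_t is opaque; the cast to unsigned long works on Linux */
    printf("Process ID and Thread IDs are:%d %lu\n", (int)getpid(),
        (unsigned long)pthread_self());
    sleep(10);
    return NULL;
}
int main(){
    pthread_t tid1, tid2;
    pthread_create(&tid1, NULL, HI, NULL);
    pthread_create(&tid2, NULL, HI, NULL);
    pthread_join(tid1, NULL);
    pthread_join(tid2, NULL);
    return 0;
}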

Passing a string to thread
function.
#include<stdio.h>
#include<pthread.h>
void * HI(void *x){
    char *a=(char *) x;
    printf("Hello %s\n", a);
    return NULL;
}
int main()
{
    char *msg="Dr.AIT";
    pthread_t tid;
    pthread_create(&tid, NULL, HI, (void*)msg);
    pthread_join(tid, NULL);
    return 0;
}

Passing an integer to a thread
#include<stdio.h>
#include<stdlib.h>
#include<pthread.h>
void * HI(void *x){
    int n= * ((int*) x);
    int i;
    for(i=0;i<n;i++)
        printf("Hello\n");
    return NULL;
}
int main( int argc, char *argv[]){
    if(argc<2) return 1; /* expects the repeat count as argv[1] */
    int nlp=atoi(argv[1]);
    pthread_t tid;
    pthread_create(&tid, NULL, HI, (void*) &nlp);
    pthread_join(tid, NULL); /* nlp must stay alive until the thread has read it */
    return 0;
}

Multiple Threads
#include<stdio.h>
#include<stdlib.h>
#include<unistd.h>
#include<pthread.h>
void * HI(void *x){
    int n= * ((int*) x);
    int i;
    for(i=0;i<n;i++){
        printf("Hello %lu\n", (unsigned long)pthread_self());
        sleep(1);
    }
    return NULL;
}
int main( int argc, char *argv[]){
    int nlp, nlp1;
    if(argc<3) return 1; /* expects two repeat counts */
    nlp=atoi(argv[1]); nlp1=atoi(argv[2]);
    pthread_t tid, tid1;
    pthread_create(&tid, NULL, HI, (void*) &nlp);
    pthread_create(&tid1, NULL, HI, (void*) &nlp1);
    pthread_join(tid, NULL);
    pthread_join(tid1, NULL);
    /* Explain what happens if we remove join functions */
    return 0;
}

Multiple threads with racing
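#include<stdio.h>
#include<pthread.h>
int count=0;
void * HI(void *x){
    while(1){
        /* unsynchronized update of the shared count: a data race */
        printf("%d\n", count++);
    }
}
int main(){
    pthread_t tid1, tid2;
    pthread_create(&tid1, NULL, HI, NULL);
    pthread_create(&tid2, NULL, HI, NULL);
    pthread_join(tid1, NULL); /* never returns: HI loops forever */
    pthread_join(tid2, NULL);
    return 0;
}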

Use of nanosleep method
#include<stdio.h>
#include<pthread.h>
#include<time.h>
void * HI(void *x){
    struct timespec A={0,1000}; /* 0 s, 1000 ns */
    printf("Hello\n");
    nanosleep(&A, NULL);
    printf("Hello\n");
    return NULL;
}
int main(){
    pthread_t tid;
    pthread_create(&tid, NULL, HI, NULL);
    pthread_join(tid, NULL);
    return 0;
}

 Question: Which variables in a threaded C
program are shared variables?
◦ The answer is not as simple as “global variables are
shared” and “stack variables are private”.
 Requires answers to the following questions:
◦ What is the memory model for threads?
◦ How are variables mapped to memory instances?
◦ How many threads reference each of these
instances?

 Conceptual model:
◦ Each thread runs in the context of a process.
◦ Each thread has its own separate thread context.
 Thread ID, stack, stack pointer, program counter, condition codes, and
general purpose registers.
◦ All threads share the remaining process context.
 Code, data, heap, and shared library segments of the process virtual
address space.
 Open files and installed handlers
 Operationally, this model is not strictly enforced:
◦ While register values are truly separate and protected....
◦ Any thread can read and write the stack of any other thread.
 Mismatch between the conceptual and operation model is a
source of confusion and errors.

#include "csapp.h"
#define N 2
void *thread(void *vargp);
char **ptr; /* global */
int main()
{
    long i;
    pthread_t tid;
    char *msgs[N] = {
        "Hello from foo",
        "Hello from bar"
    };
    ptr = msgs;
    for (i = 0; i < N; i++)
        Pthread_create(&tid, NULL, thread, (void *)i);
    Pthread_exit(NULL);
}
void *thread(void *vargp)
{
    int myid = (int)(long)vargp;
    static int svar = 0;
    printf("[%d]: %s (svar=%d)\n",
        myid, ptr[myid], ++svar);
    return NULL;
}
Peer threads access main thread’s stack
indirectly through global ptr variable

Global var: 1 instance (ptr [data])
Local static var: 1 instance (svar [data])
Local automatic vars in main: 1 instance each (i.m, msgs.m [main thread's stack])
Local automatic var in thread: 2 instances (
myid.p0 [peer thread 0's stack],
myid.p1 [peer thread 1's stack]
)

 Which variables are shared?
Variable   Referenced by   Referenced by    Referenced by
instance   main thread?    peer thread 0?   peer thread 1?
ptr        yes             yes              yes
svar       no              yes              yes
i.m        yes             no               no
msgs.m     yes             yes              yes
myid.p0    no              yes              no
myid.p1    no              no               yes
Answer: A variable x is shared iff multiple threads
reference at least one instance of x. Thus:
 ptr, svar, and msgs are shared.
 i and myid are NOT shared.

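/* badcnt.c - improperly synchronized counter program */
#include "csapp.h"
#define NITERS 100000000
void *count(void *arg);
unsigned int cnt = 0; /* shared */
int main() {
    pthread_t tid1, tid2;
    Pthread_create(&tid1, NULL, count, NULL);
    Pthread_create(&tid2, NULL, count, NULL);
    Pthread_join(tid1, NULL);
    Pthread_join(tid2, NULL);
    if (cnt != (unsigned)NITERS*2)
        printf("BOOM! cnt=%u\n", cnt);
    else
        printf("OK cnt=%u\n", cnt);
    exit(0);
}
void *count(void *arg) {
    int i;
    for (i=0; i<NITERS; i++)
        cnt++; /* unsynchronized read-modify-write */
    return NULL;
}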
linux> ./badcnt
BOOM! cnt=198841183
linux> ./badcnt
BOOM! cnt=198261801
linux> ./badcnt
BOOM! cnt=198269672
cnt should be
equal to 200,000,000.
What went wrong?!

C code for counter loop:
for (i=0; i<NITERS; i++)
    cnt++;
Corresponding asm code (gcc -O0 -fforce-mem):
.L9:                          # Head (Hi)
    movl -4(%ebp),%eax
    cmpl $99999999,%eax
    jle .L12
    jmp .L10
.L12:
    movl cnt,%eax             # Load cnt (Li)
    leal 1(%eax),%edx         # Update cnt (Ui)
    movl %edx,cnt             # Store cnt (Si)
.L11:                         # Tail (Ti)
    movl -4(%ebp),%eax
    leal 1(%eax),%edx
    movl %edx,-4(%ebp)
    jmp .L9
.L10:

 Key idea: In general, any sequentially consistent
interleaving is possible, but some are incorrect!
◦ Ii denotes that thread i executes instruction I
◦ %eaxi is the contents of %eax in thread i’s context
i (thread)  instr_i  %eax1  %eax2  cnt
1           H1       -      -      0
1           L1       0      -      0
1           U1       1      -      0
1           S1       1      -      1
2           H2       -      -      1
2           L2       -      1      1
2           U2       -      2      1
2           S2       -      2      2
2           T2       -      2      2
1           T1       1      -      2
OK

 Incorrect ordering: two threads increment the
counter, but the result is 1 instead of 2.
i (thread)  instr_i  %eax1  %eax2  cnt
1           H1       -      -      0
1           L1       0      -      0
1           U1       1      -      0
2           H2       -      -      0
2           L2       -      0      0
1           S1       1      -      1
1           T1       1      -      1
2           U2       -      1      1
2           S2       -      1      1
2           T2       -      1      1
Oops!

[Figure: lock animation. Threads A, B, C, and D contend for a single lock that is either Free or Set. A thread that calls Lock while the lock is Free acquires it and the lock becomes Set; threads that call Lock while it is Set wait; when the holder calls Unlock the lock becomes Free and one of the waiting threads acquires it.]

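#include<stdio.h>
#include<pthread.h>
int count=0;
pthread_mutex_t mutex1=PTHREAD_MUTEX_INITIALIZER;
void * HI(void *x){
    while(1){
        /* the mutex makes print-and-increment one indivisible step */
        pthread_mutex_lock(&mutex1);
        printf("%d\n", count++);
        pthread_mutex_unlock(&mutex1);
    }
}
int main(){
    pthread_t tid1, tid2;
    pthread_create(&tid1, NULL, HI, NULL);
    pthread_create(&tid2, NULL, HI, NULL);
    pthread_join(tid1, NULL); /* never returns: HI loops forever */
    pthread_join(tid2, NULL);
    return 0;
}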

 How about this ordering?
i (thread)  instr_i  %eax1  %eax2  cnt
1           H1
1           L1
2           H2
2           L2
2           U2
2           S2
1           U1
1           S1
1           T1
2           T2
We can clarify our understanding of concurrent
execution with the help of the progress graph

A progress graph depicts the discrete execution state space of
concurrent threads.
Each axis corresponds to the sequential order of instructions in a
thread.
Each point corresponds to a possible execution state (Inst1, Inst2).
E.g., (L1, S2) denotes the state where thread 1 has completed L1 and
thread 2 has completed S2.
[Figure: progress graph with Thread 1 (H1, L1, U1, S1, T1) on the horizontal axis and Thread 2 (H2, L2, U2, S2, T2) on the vertical axis; the point (L1, S2) is marked]

A trajectory is a sequence of legal state transitions that describes
one possible concurrent execution of the threads.
Example: H1, L1, U1, H2, L2, S1, T1, U2, S2, T2
[Figure: the example trajectory drawn on the progress graph of Thread 1 (H1, L1, U1, S1, T1) versus Thread 2 (H2, L2, U2, S2, T2)]

L, U, and S form a critical section with respect to the shared
variable cnt.
Instructions in critical sections (with respect to some shared
variable) should not be interleaved.
Sets of states where such interleaving occurs form unsafe regions.
[Figure: progress graph of Thread 1 versus Thread 2; the critical sections with respect to cnt (L, U, S on each axis) intersect in the unsafe region]

Def: A trajectory is safe iff it doesn't touch any part of an unsafe
region.
Claim: A trajectory is correct (with respect to cnt) iff it is safe.
[Figure: progress graph of Thread 1 versus Thread 2 showing the unsafe region, one safe trajectory that skirts it, and one unsafe trajectory that passes through it]

 Question: How can we guarantee a safe trajectory?
◦ We must synchronize the threads so that they never enter
an unsafe state.
 Classic solution: Dijkstra's P and V operations on
semaphores.
◦ semaphore: non-negative integer synchronization variable.
 P(s): [ while (s == 0) wait(); s--; ]
 Dutch for "Proberen" (test)
 V(s): [ s++; ]
 Dutch for "Verhogen" (increment)
◦ OS guarantees that operations between brackets [ ] are
executed indivisibly.
 Only one P or V operation at a time can modify s.
 When while loop in P terminates, only that P can decrement s.
 Semaphore invariant: (s >= 0)

 Here is how we would use P and V operations
to synchronize the threads that update cnt.
/* Semaphore s is initially 1 */
/* Thread routine */
void *count(void *arg)
{
int i;
for (i=0; i<NITERS; i++) {
P(s);
cnt++;
V(s);
}
return NULL;
}
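To make the semantics of P and V concrete, here is a minimal sketch that builds a counting semaphore from a Pthreads mutex and condition variable and uses it to protect cnt. This is an illustration under stated assumptions, not how any particular system implements semaphores; the names my_sem_t, my_P, my_V, DEMO_ITERS, and run_counter are hypothetical.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical semaphore built from a mutex and a condition variable. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  nonzero;   /* signaled whenever value is incremented */
    unsigned        value;     /* unsigned, so the invariant s >= 0 always holds */
} my_sem_t;

void my_sem_init(my_sem_t *s, unsigned value) {
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->nonzero, NULL);
    s->value = value;
}

/* P(s): [ while (s == 0) wait(); s--; ] */
void my_P(my_sem_t *s) {
    pthread_mutex_lock(&s->lock);
    while (s->value == 0)
        pthread_cond_wait(&s->nonzero, &s->lock);
    s->value--;
    pthread_mutex_unlock(&s->lock);
}

/* V(s): [ s++; ] and wake one waiter, if any */
void my_V(my_sem_t *s) {
    pthread_mutex_lock(&s->lock);
    s->value++;
    pthread_cond_signal(&s->nonzero);
    pthread_mutex_unlock(&s->lock);
}

#define DEMO_ITERS 100000      /* arbitrary iteration count for the demo */
static my_sem_t sem;
static unsigned cnt;

static void *count(void *arg) {
    (void)arg;
    for (int i = 0; i < DEMO_ITERS; i++) {
        my_P(&sem);
        cnt++;                 /* critical section */
        my_V(&sem);
    }
    return NULL;
}

/* Runs two counting threads and returns the final value of cnt. */
unsigned run_counter(void) {
    pthread_t t1, t2;
    int rc;
    cnt = 0;
    my_sem_init(&sem, 1);      /* s is initially 1 */
    rc = pthread_create(&t1, NULL, count, NULL); assert(rc == 0);
    rc = pthread_create(&t2, NULL, count, NULL); assert(rc == 0);
    rc = pthread_join(t1, NULL); assert(rc == 0);
    rc = pthread_join(t2, NULL); assert(rc == 0);
    (void)rc;
    return cnt;
}
```

The while loop around pthread_cond_wait mirrors the while loop in the definition of P: a woken thread rechecks the value before decrementing it.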

Provide mutually exclusive access to the shared variable by
surrounding the critical section with P and V operations on semaphore
s (initially set to 1).
The semaphore invariant creates a forbidden region that encloses the
unsafe region and is never touched by any trajectory.
[Figure: progress graph with Thread 1 (H1, P(s), L1, U1, S1, V(s), T1) on the horizontal axis and Thread 2 (H2, P(s), L2, U2, S2, V(s), T2) on the vertical axis. Each state is labeled with the value of s, initially s = 1; the states where s would be -1 form the forbidden region, which encloses the unsafe region.]

/* Initialize semaphore sem to value */
/* pshared=0 if thread, pshared=1 if process */
void Sem_init(sem_t *sem, int pshared, unsigned int value) {
if (sem_init(sem, pshared, value) < 0)
unix_error("Sem_init");
}
/* P operation on semaphore sem */
void P(sem_t *sem) {
if (sem_wait(sem))
unix_error("P");
}
/* V operation on semaphore sem */
void V(sem_t *sem) {
if (sem_post(sem))
unix_error("V");
}

/* goodcnt.c - properly synchronized
counter program */
#include "csapp.h"
#define NITERS 10000000
void *count(void *arg);
unsigned int cnt; /* counter */
sem_t sem; /* semaphore */
int main() {
    Sem_init(&sem, 0, 1); /* sem=1 */
    /* create 2 threads and wait */
    ...
    if (cnt != (unsigned)NITERS*2)
        printf("BOOM! cnt=%u\n", cnt);
    else
        printf("OK cnt=%u\n", cnt);
    exit(0);
}
void *count(void *arg)
{
    int i;
    for (i=0; i<NITERS; i++) {
        P(&sem);
        cnt++;
        V(&sem);
    }
    return NULL;
}

 Common synchronization pattern:
◦ Producer waits for slot, inserts item in buffer, and “signals” consumer.
◦ Consumer waits for item, removes it from buffer, and “signals”
producer.
 “signals” in this context has nothing to do with Unix signals
 Examples
◦ Multimedia processing:
 Producer creates MPEG video frames, consumer renders the frames
◦ Event-driven graphical user interfaces
 Producer detects mouse clicks, mouse movements, and keyboard hits and
inserts corresponding events in buffer.
 Consumer retrieves events from buffer and paints the display.
[Figure: producer thread -> shared buffer -> consumer thread]

/* buf1.c - producer-consumer
on 1-element buffer */
#include "csapp.h"
#define NITERS 5
void *producer(void *arg);
void *consumer(void *arg);
struct {
int buf; /* shared var */
sem_t full; /* sems */
sem_t empty;
} shared;
int main() {
pthread_t tid_producer;
pthread_t tid_consumer;
/* initialize the semaphores */
Sem_init(&shared.empty, 0, 1);
Sem_init(&shared.full, 0, 0);
/* create threads and wait */
Pthread_create(&tid_producer, NULL,
producer, NULL);
Pthread_create(&tid_consumer, NULL,
consumer, NULL);
Pthread_join(tid_producer, NULL);
Pthread_join(tid_consumer, NULL);
exit(0);
}

/* producer thread */
void *producer(void *arg) {
    int i, item;
    for (i=0; i<NITERS; i++) {
        /* produce item */
        item = i;
        printf("produced %d\n", item);
        /* write item to buf */
        P(&shared.empty);
        shared.buf = item;
        V(&shared.full);
    }
    return NULL;
}
/* consumer thread */
void *consumer(void *arg) {
    int i, item;
    for (i=0; i<NITERS; i++) {
        /* read item from buf */
        P(&shared.full);
        item = shared.buf;
        V(&shared.empty);
        /* consume item */
        printf("consumed %d\n", item);
    }
    return NULL;
}
Initially: empty = 1, full = 0.

 Functions called from a thread must be
thread-safe.
 We identify four (non-disjoint) classes of
thread-unsafe functions:
◦ Class 1: Failing to protect shared variables.
◦ Class 2: Relying on persistent state across
invocations.
◦ Class 3: Returning a pointer to a static variable.
◦ Class 4: Calling thread-unsafe functions.

 Class 1: Failing to protect shared variables.
◦ Fix: Use P and V semaphore operations.
◦ Issue: Synchronization operations will slow down
code.
◦ Example: goodcnt.c

 Class 2: Relying on persistent state across
multiple function invocations.
◦ Random number generator relies on static state
◦ Fix: Rewrite function so that caller passes in all
necessary state.
static unsigned int next = 1; /* persistent state shared by all callers */
/* rand - return pseudo-random integer on
0..32767 */
int rand(void)
{
    next = next*1103515245 + 12345;
    return (unsigned int)(next/65536) % 32768;
}
/* srand - set seed for rand() */
void srand(unsigned int seed)
{
    next = seed;
}
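The fix named above, having the caller pass in all necessary state, might look like the following sketch. It reuses the same constants as the rand listing; the name my_rand_r is hypothetical and chosen to avoid clashing with the library's rand_r.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reentrant variant of rand(): the generator state lives in
   the caller, so threads with separate state never interfere. */
int my_rand_r(unsigned int *nextp) {
    assert(nextp != NULL);
    *nextp = *nextp * 1103515245u + 12345u;
    return (int)((*nextp / 65536u) % 32768u);
}
```

Each thread keeps its own unsigned int seed (for example a local variable in its thread routine) and passes its address on every call, so two threads that start from the same seed produce the same sequence regardless of scheduling.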

 Class 3: Returning a
pointer to a static variable.
struct hostent *gethostbyname(char *name)
{
    static struct hostent h;
    <contact DNS and fill in h>
    return &h;
}
 Fixes:
◦ 1. Rewrite code so caller passes pointer to struct.
 Issue: Requires changes in caller and callee.
hostp = Malloc(...);
gethostbyname_r(name, hostp);
◦ 2. Lock-and-copy
 Issue: Requires only simple changes in caller (and none in callee)
 However, caller must free memory.
struct hostent *gethostbyname_ts(char *name)
{
    struct hostent *q = Malloc(...);
    struct hostent *p;
    P(&mutex); /* lock */
    p = gethostbyname(name);
    *q = *p; /* copy */
    V(&mutex);
    return q;
}

 Class 4: Calling thread-unsafe functions.
◦ Calling one thread-unsafe function makes an entire
function thread-unsafe.
◦ Fix: Modify the function so it calls only thread-safe
functions

 A function is reentrant iff it accesses NO shared variables when called from multiple threads.
◦ Reentrant functions are a proper subset of the set of thread-safe functions.
◦ NOTE: The fixes to Class 2 and 3 thread-unsafe functions require modifying the function to make it reentrant.
[Figure: Venn diagram. All functions divide into thread-safe functions and thread-unsafe functions; reentrant functions lie inside the thread-safe set.]

 Most functions in the Standard C Library (at the
back of your K&R text) are thread-safe.
◦ Examples: malloc, free, printf, scanf
 Most Unix system calls are thread-safe, with
a few exceptions:
Thread-unsafe function   Class   Reentrant version
asctime                  3       asctime_r
ctime                    3       ctime_r
gethostbyaddr            3       gethostbyaddr_r
gethostbyname            3       gethostbyname_r
inet_ntoa                3       (none)
localtime                3       localtime_r
rand                     2       rand_r

 A race occurs when the correctness of the program depends on one thread reaching point x before another thread reaches point y.

/* a threaded program with a race */
int main() {
    pthread_t tid[N];
    int i;
    for (i = 0; i < N; i++)
        Pthread_create(&tid[i], NULL, thread, &i); /* race: every thread gets the address of the same i */
    for (i = 0; i < N; i++)
        Pthread_join(tid[i], NULL);
    exit(0);
}

/* thread routine */
void *thread(void *vargp) {
    int myid = *((int *)vargp); /* may run after main has already changed i */
    printf("Hello from thread %d\n", myid);
    return NULL;
}
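
One common way to remove this race is to give every thread its own copy of the ID, so that the main loop never modifies a value a thread is about to read. The following is a sketch under assumptions: it uses plain pthread_create instead of the Pthread_create wrapper, and the names run_without_race, ids and seen are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>

#define N 4

static int seen[N];               /* seen[k] is set by the thread given ID k */

static void *thread(void *vargp)
{
    int myid = *(int *)vargp;     /* private slot, never changed after creation */
    assert(myid >= 0 && myid < N);
    seen[myid] = 1;
    printf("Hello from thread %d\n", myid);
    return NULL;
}

/* Returns the number of distinct IDs observed, or -1 on failure.
   A result of N means no ID was lost or duplicated. */
int run_without_race(void)
{
    pthread_t tid[N];
    int ids[N];
    int i, count = 0;

    for (i = 0; i < N; i++)
        seen[i] = 0;
    for (i = 0; i < N; i++) {
        ids[i] = i;               /* one slot per thread */
        if (pthread_create(&tid[i], NULL, thread, &ids[i]) != 0)
            return -1;
    }
    for (i = 0; i < N; i++)
        pthread_join(tid[i], NULL);
    for (i = 0; i < N; i++)
        count += seen[i];
    return count;
}
```

The array ids stays alive until all threads have been joined, which is what makes passing the address of a stack slot safe here.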

[Figure: progress graph for Thread 1 and Thread 2, each executing P and V operations on semaphores s and t (initially s = t = 1), showing the forbidden regions for s and t, the deadlock region, and the deadlock state.]

 Locking introduces the potential for deadlock: waiting for a condition that will never be true.
 Any trajectory that enters the deadlock region will eventually reach the deadlock state, waiting for either s or t to become nonzero.
 Other trajectories luck out and skirt the deadlock region.
 Unfortunate fact: deadlock is often non-deterministic.
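
A standard way to keep every trajectory out of the deadlock region is to make all threads acquire the locks in the same order. The sketch below illustrates that rule under assumptions: Pthreads mutexes named s and t stand in for the binary semaphores, and run_ordered_locking, worker and ITERATIONS are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>

#define ITERATIONS 100000

static pthread_mutex_t s = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t t = PTHREAD_MUTEX_INITIALIZER;
static long shared_count = 0;

/* Both threads run the same routine, so both take s before t. */
static void *worker(void *vargp)
{
    (void)vargp;
    for (int i = 0; i < ITERATIONS; i++) {
        pthread_mutex_lock(&s);     /* always first */
        pthread_mutex_lock(&t);     /* always second */
        shared_count++;
        pthread_mutex_unlock(&t);
        pthread_mutex_unlock(&s);
    }
    return NULL;
}

/* Returns the final counter value, or -1 if a thread could not be created. */
long run_ordered_locking(void)
{
    pthread_t t1, t2;

    shared_count = 0;
    if (pthread_create(&t1, NULL, worker, NULL) != 0)
        return -1;
    if (pthread_create(&t2, NULL, worker, NULL) != 0) {
        pthread_join(t1, NULL);
        return -1;
    }
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    assert(shared_count == 2L * ITERATIONS);
    return shared_count;
}
```

If one thread took t before s instead, the program could still finish on most runs, which is exactly the non-determinism described above.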

 Threads can be implemented in the OS or at
user level
 User level thread implementations
◦ thread scheduler runs as user code
◦ manages thread contexts in user space
◦ OS sees only a traditional process

The thread-switching code is in user space

 Pros and cons of user-level multithreading
◦ Pro: Fast, low overhead
◦ Pro: Scheduling can be controlled by the user process
◦ Con: Error-prone (stack overruns)
◦ Con: No seamless multiprocessor (MP) support
◦ Con: A blocking system call blocks the whole process
◦ Con: Hard to overlap I/O and computation
 Coroutines: cooperative (non-preemptive) scheduling
 POSIX threads on Linux (NPTL) use the 1:1 model, where each thread is one kernel-scheduled entity.
 M:N is a hybrid in which M user-level threads are multiplexed onto N kernel-scheduled threads.
 M:1 thread libraries: GNU Pth, FSU Pthreads, MIT Pthreads, …
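
Cooperative switching in user space can be illustrated with the ucontext functions. This is a minimal sketch, assuming a glibc-based Linux system where getcontext, makecontext and swapcontext are available; task_a, task_b, run_coroutines and the trace buffer are hypothetical names, and a real user-level thread library does considerably more.

```c
#include <assert.h>
#include <string.h>
#include <ucontext.h>

#define STACK_SIZE (64 * 1024)
#define STEPS 3

static ucontext_t main_ctx, ctx_a, ctx_b;
static char stack_a[STACK_SIZE], stack_b[STACK_SIZE];
static char trace[2 * STEPS + 1];   /* records the order in which tasks ran */
static int trace_len;

static void task_a(void)
{
    for (int i = 0; i < STEPS; i++) {
        trace[trace_len++] = 'A';
        swapcontext(&ctx_a, &ctx_b);    /* voluntarily yield to task B */
    }
}   /* returning resumes uc_link, i.e. main_ctx */

static void task_b(void)
{
    for (int i = 0; i < STEPS; i++) {
        trace[trace_len++] = 'B';
        swapcontext(&ctx_b, &ctx_a);    /* voluntarily yield to task A */
    }
}

static void init_task(ucontext_t *ctx, char *stack, void (*fn)(void))
{
    getcontext(ctx);
    ctx->uc_stack.ss_sp = stack;        /* each task gets its own stack */
    ctx->uc_stack.ss_size = STACK_SIZE;
    ctx->uc_link = &main_ctx;           /* where to go when fn returns */
    makecontext(ctx, fn, 0);
}

/* Runs the two tasks and returns the recorded interleaving. */
const char *run_coroutines(void)
{
    trace_len = 0;
    memset(trace, 0, sizeof trace);
    init_task(&ctx_a, stack_a, task_a);
    init_task(&ctx_b, stack_b, task_b);
    swapcontext(&main_ctx, &ctx_a);     /* start task A; the kernel is not involved in scheduling */
    return trace;
}
```

The tasks alternate only because each one yields explicitly. A task that never yields, or that blocks in a system call, stalls every other task in the process, which is the drawback listed above.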

The thread-switching code is in the kernel

Multiplexing user-level threads onto kernel-level threads

 Goal – mimic functionality of kernel threads
◦ gain performance of user space threads
 The idea - kernel upcalls to user-level thread scheduling
code when it handles a blocking system call or page fault
◦ user level thread scheduler can choose to run a different
thread rather than blocking
◦ kernel upcalls when system call or page fault returns
 Kernel assigns virtual processors to each process (which
contains a user level thread scheduler)
◦ lets user level thread scheduler allocate threads to
processors
 Problem: relies on kernel (lower layer) calling procedures in
user space (higher layer)

Assumptions:
◦ Two or more threads (or processes)
◦ Each executes in (pseudo) parallel and can’t
predict exact running speeds
◦ The threads can interact via access to a shared
variable
Example:
◦ One thread writes a variable
◦ The other thread reads from the same variable
Problem:
◦ The order of READs and WRITEs can make a
difference!!!
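
When the reader must see the written value, the order has to be enforced explicitly. The sketch below shows one way to do that with a Pthreads mutex and condition variable; all names (run_ordered_read, writer, reader, has_value) are hypothetical and the value 42 is arbitrary.

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t written = PTHREAD_COND_INITIALIZER;
static int shared_value;
static int has_value;            /* 1 once the writer has stored shared_value */

static void *writer(void *vargp)
{
    (void)vargp;
    pthread_mutex_lock(&lock);
    shared_value = 42;
    has_value = 1;
    pthread_cond_signal(&written);
    pthread_mutex_unlock(&lock);
    return NULL;
}

static void *reader(void *vargp)
{
    int *resultp = vargp;
    pthread_mutex_lock(&lock);
    while (!has_value)           /* loop guards against spurious wakeups */
        pthread_cond_wait(&written, &lock);
    *resultp = shared_value;
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Starts the reader before the writer and returns what the reader saw,
   or -1 if a thread could not be created. */
int run_ordered_read(void)
{
    pthread_t r, w;
    int result = -1;

    shared_value = 0;
    has_value = 0;
    if (pthread_create(&r, NULL, reader, &result) != 0)
        return -1;
    if (pthread_create(&w, NULL, writer, NULL) != 0) {
        pthread_create(&w, NULL, writer, NULL);   /* retry once so the reader can finish */
        pthread_join(r, NULL);
        return -1;
    }
    pthread_join(r, NULL);
    pthread_join(w, NULL);
    return result;
}
```

Regardless of which thread the scheduler runs first, the reader returns only after the write has happened.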

 Don’t: rely on libraries
 Use message passing systems (PVM, MPI)
 Exploit the memory model:
◦ Dekker’s Algorithm requires sequential consistency
(SC)
◦ Other Algorithms exist for weaker memory models
 Use atomic instructions
◦ Test&set
◦ Swap
◦ Load-locked / Store-conditional
◦ Compare & Swap
◦ Fetch&Op (exotic, rarely available)
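
As an illustration of the test&set primitive, here is a minimal spinlock sketch built on the C11 atomic_flag type. It assumes a compiler with <stdatomic.h>; spin_lock, spin_unlock, run_spin_counter and SPIN_ITERATIONS are hypothetical names, and production code would normally use a library mutex rather than a hand-written spinlock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define SPIN_ITERATIONS 50000

static atomic_flag spin = ATOMIC_FLAG_INIT;
static long counter;

static void spin_lock(void)
{
    /* test&set: atomically set the flag and get its previous value;
       keep trying while another thread already holds it */
    while (atomic_flag_test_and_set_explicit(&spin, memory_order_acquire))
        ;
}

static void spin_unlock(void)
{
    atomic_flag_clear_explicit(&spin, memory_order_release);
}

static void *spin_worker(void *vargp)
{
    (void)vargp;
    for (int i = 0; i < SPIN_ITERATIONS; i++) {
        spin_lock();
        counter++;               /* critical section */
        spin_unlock();
    }
    return NULL;
}

/* Returns the final counter value, or -1 if a thread could not be created. */
long run_spin_counter(void)
{
    pthread_t t1, t2;

    counter = 0;
    if (pthread_create(&t1, NULL, spin_worker, NULL) != 0)
        return -1;
    if (pthread_create(&t2, NULL, spin_worker, NULL) != 0) {
        pthread_join(t1, NULL);
        return -1;
    }
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    assert(counter == 2L * SPIN_ITERATIONS);
    return counter;
}
```

The acquire and release orderings are what make the plain increment of counter safe inside the critical section.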

 Threads provide another mechanism for writing
concurrent programs.
 Threads are growing in popularity
◦ Somewhat cheaper than processes.
◦ Easy to share data between threads.
 However, the ease of sharing has a cost:
◦ Easy to introduce subtle synchronization errors.
◦ Tread carefully with threads!
 For more info:
◦ D. Butenhof, “Programming with POSIX Threads”, Addison-Wesley, 1997.

int main(int argc, char **argv)
{
    int listenfd, *connfdp, port, clientlen;
    struct sockaddr_in clientaddr;
    pthread_t tid;

    if (argc != 2) {
        fprintf(stderr, "usage: %s <port>\n", argv[0]);
        exit(0);
    }
    port = atoi(argv[1]);
    listenfd = open_listenfd(port);
    while (1) {
        clientlen = sizeof(clientaddr);
        connfdp = Malloc(sizeof(int)); /* one heap slot per connection, freed by the thread */
        *connfdp = Accept(listenfd, (SA *) &clientaddr, &clientlen);
        Pthread_create(&tid, NULL, thread, connfdp);
    }
}

/* thread routine */
void *thread(void *vargp)
{
    int connfd = *((int *)vargp);
    Pthread_detach(pthread_self());
    Free(vargp);
    echo_r(connfd); /* reentrant version of echo() */
    Close(connfd);
    return NULL;
}

 Must run “detached” to avoid memory leak.
◦ At any point in time, a thread is either joinable or
detached.
◦ Joinable thread can be reaped and killed by other
threads.
 must be reaped (with pthread_join) to free memory
resources.
◦ Detached thread cannot be reaped or killed by other
threads.
 resources are automatically reaped on termination.
◦ Default state is joinable.
 use pthread_detach(pthread_self()) to make detached.
 Must be careful to avoid unintended sharing.
◦ For example, what happens if we pass the address of
connfd to the thread routine?
 Pthread_create(&tid, NULL, thread, (void *)&connfd);
 All functions called by a thread must be thread-
safe
◦ (next lecture)
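
As an alternative to calling pthread_detach inside the thread routine, a thread can be created in the detached state through its attributes. The sketch below assumes plain Pthreads calls rather than the capitalized wrappers; run_detached, detached_worker and the done_* variables are hypothetical names, and the condition variable exists only so the caller can observe the result without joining.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

static pthread_mutex_t done_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t done_cond = PTHREAD_COND_INITIALIZER;
static int done_flag;
static int done_value;

static void *detached_worker(void *vargp)
{
    int value = *(int *)vargp;   /* copy the argument, then release the heap slot */
    free(vargp);
    pthread_mutex_lock(&done_lock);
    done_value = value;
    done_flag = 1;
    pthread_cond_signal(&done_cond);
    pthread_mutex_unlock(&done_lock);
    return NULL;                 /* resources are reclaimed automatically */
}

/* Starts a detached thread with a private copy of value and returns
   the value the thread reported, or -1 on failure. */
int run_detached(int value)
{
    pthread_attr_t attr;
    pthread_t tid;
    int rc, result;
    int *argp = malloc(sizeof *argp);

    if (argp == NULL)
        return -1;
    *argp = value;
    done_flag = 0;
    pthread_attr_init(&attr);
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    rc = pthread_create(&tid, &attr, detached_worker, argp);
    pthread_attr_destroy(&attr);
    if (rc != 0) {
        free(argp);
        return -1;
    }
    pthread_mutex_lock(&done_lock);
    while (!done_flag)           /* a detached thread cannot be joined */
        pthread_cond_wait(&done_cond, &done_lock);
    result = done_value;
    pthread_mutex_unlock(&done_lock);
    return result;
}
```

As in the echo server, the argument lives in its own heap slot, so a later iteration of the caller cannot overwrite it before the thread has read it.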

 + Easy to share data structures between
threads
◦ e.g., logging information, file cache.
 + Threads are more efficient than processes.
 - Unintentional sharing can introduce subtle and hard-to-reproduce errors!
◦ The ease with which data can be shared is both the
greatest strength and the greatest weakness of
threads.
◦ (next lecture)
