

11  The Multithread Library


This chapter describes how to use the multithread library, libthread, in Pascal programs. This multithread library interface is defined in the SunOS 5.x Reference Manual. The SunOS 5.x Guide to Multi-Thread Programming describes concurrent programming concepts and practices suitable to Solaris threads.

The features described in this chapter apply only to the Solaris 2.x environment.

This chapter contains the following sections:

Multithread Environment for Pascal

Introduction to Multithreading

Parallel Matrix-Multiplication Example

Debugging Multithreaded Pascal Programs

Sample dbx Session


Multithread Environment for Pascal

Compiling and binding a multithreaded Pascal program requires the compiler and linker options described below.

Compiling Multithreaded Programs

Compile multithreaded programs with the -mt flag. This flag ensures that the -D_REENTRANT option is passed to cpp and that the -lpc_mt and -lthread options are passed to ld. For example:

> pc -mt prog.p -o prog


Introduction to Multithreading

A thread is a sequence of execution steps performed by a program. A program without multithreading operates with a single thread of control. Control in a single-threaded program is always synchronous, alternating between the program and the operating system. Figure 11-1 schematically depicts the basic elements of multithreading as discussed in this chapter.

Figure  11-1 Thread Interface Architecture
The multithreading capabilities of Solaris threads allow many threads of control to share a single UNIX process and use its address space, as shown in Figure 11-1. A multithreaded UNIX process is not a single thread of control in itself, but it contains one or more threads of control. A single thread can start other threads, and each thread can execute independently and asynchronously. Multithreading can improve application responsiveness, make efficient use of multiprocessor hardware, and improve program structure.

Thread Resources

A thread can be created within an existing process a thousand times faster than a new process can be created. Switching between threads within a process is especially fast because it does not involve switching between address spaces. Each thread has its own resources, including a program counter, a stack, and a set of register values.

Thread Creation Functions

Solaris threads provides thread creation functions such as thr_create(3T), which creates a new thread of control within the current process. See the thr_create(3T) man page for the available creation flags (for example, THR_NEW_LWP, used later in this chapter).

Lightweight Processes

The multithread library uses underlying threads of control called lightweight processes (LWPs) that are supported by the kernel. LWPs work like virtual CPUs that execute code or system calls. LWPs act as bridges between the user or application level and the kernel. Each traditional UNIX process contains one or more LWPs, each running one or more user threads. Programming thread creation involves creating a user context, usually without also creating an LWP.

Process Control

Your program can tell libthread, the multithread support library, how many threads should be able to run at the same time. You should make sure that at least that number of LWPs is available.

The libthread multithread library schedules LWP usage for user threads. When a user thread blocks due to synchronization, the LWP transfers to another runnable thread through co-routine linkage and not by system call. If all LWPs block, the multithread library makes another LWP available.

Each LWP is independently dispatched by the kernel, performs independent system calls, and may incur independent page faults. On multiprocessor systems, LWPs can run in parallel, with at most one LWP per processor at a time. The operating system multiplexes the LWPs onto the available processors, deciding which LWP runs on which processor and when. The kernel schedules CPU resources for the LWPs according to their scheduling classes and priorities; the kernel has no information about the user threads running within each process.

Synchronization

The operating system schedules LWP usage of the processors. Thread scheduling is influenced by the thr_setconcurrency and thr_setprio multithread library functions; see their man pages for more information.

Threads share access to the process address space, and therefore their accesses to shared data must be synchronized. Solaris threads provide a variety of synchronization facilities that use various semantics and support different styles of synchronization interaction:

Mutual exclusion locks let one thread at a time hold the lock. They are typically used to ensure that only one thread at a time executes a section of code (called a critical section) that accesses or modifies some shared data. Use mutex locks to limit access to a resource to one thread at a time. Refer to the mutex(3T) man pages for more details.

The following two Solaris threads routines are the most commonly used routines for mutual exclusion:

mutex_lock(3T) acquires the mutex, blocking the calling thread if another thread holds it.

mutex_unlock(3T) releases the mutex, allowing a blocked thread (if any) to acquire it.

Condition Variables

Use condition variables to make threads wait until a particular condition is true. A condition variable must be used in conjunction with a mutex lock. Refer to the condition(3T) man page for more details.

The following three Solaris threads routines are the most common routines for handling condition variables:

cond_wait(3T) atomically releases the associated mutex and blocks the thread until the condition variable is signaled; the mutex is reacquired before the call returns.

cond_signal(3T) unblocks one thread waiting on the condition variable.

cond_broadcast(3T) unblocks all threads waiting on the condition variable.

Semaphores

Solaris threads provide conventional counting semaphores. A semaphore is a non-negative integer count that can be atomically incremented and decremented by special routines. Semaphores must be initialized before use. Refer to the semaphore(3T) man page for more details.

The following three Solaris thread routines are the most common routines for handling semaphores:

sema_init(3T) initializes the semaphore variable.

sema_wait(3T) blocks the thread until the semaphore becomes greater than zero, then decrements it--the P operation on Dijkstra semaphores.

sema_post(3T) increments the semaphore, potentially unblocking a waiting thread--the V operation on Dijkstra semaphores.

Readers/Writer Lock

A multiple-readers/single-writer lock gives threads simultaneous read-only access to a protected object. Such a lock also gives a single thread write access to the object while excluding all readers. This type of lock is usually used to protect data that is read more often than it is written. Refer to the rwlock(3T) man page for more details.

The following routines are the most commonly used for readers/writer locks: rwlock_init(3T) initializes the lock, rw_rdlock(3T) acquires a read lock, rw_wrlock(3T) acquires the write lock, and rw_unlock(3T) releases either kind of lock.


Parallel Matrix-Multiplication Example

Computationally intensive applications can benefit from using all available processors. Matrix multiplication, for example, is an operation that can be sped up by using multiple processors, as shown in the following program, MatrixMultiply.

When a matrix-multiply operation is called, it acquires a mutex lock to ensure that only one matrix-multiply operation is in progress. The MatrixMultiply program uses mutex locks that are statically initialized to zero. A requesting thread checks whether its worker threads have been created. If its worker threads have not been created, the requesting thread creates them.

In the MatrixMultiply program, once its worker threads are created, the requesting thread sets up a to_do counter for the work and then signals the worker procedure via a conditional variable. Each worker procedure picks off a row and column from the input matrix, then the next worker procedure gets the next item, and so on.

The matrix-multiply operation then releases the mutex lock so computation of the vector product can proceed in parallel, with each processor running one thread at a time.

When the vector product results are ready, the worker procedure reacquires the mutex lock and updates the not_done counter of work not yet completed. At the end of the matrix-multiply operation, the worker procedure that completes the last bit of work then signals the requesting thread. Each iteration computes the result of one entry in the result matrix. In some cases, the amount of computation could be insufficient to justify the overhead of synchronizing multiple worker procedures. In such cases, more work per synchronization should be given to each worker. For example, each worker could compute an entire row of the output matrix before synchronization.

program MatrixMultiply;
#include <thread_p.h>
#include <synch_p.h>
const
	THR = 2;{ level of parallelism }
	DIM = 400;{ array dimension }
type
	arr_elem_t = double;
	
	arr_t = array[0..DIM-1, 0..DIM-1] of arr_elem_t;
	arr_p = ^arr_t;

	{ totality data for parallel work }
	work_data_t = record
		{ synchronization primitives }
			lock: mutex_t;
		start_cond, done_cond: cond_t;
		{ counters of work }
			to_do, not_done: integer;
		{ level of parallelism }
			workers: integer;
		{ shared data }
			m1: arr_p;
			m2: arr_p;
			m3: arr_p;
			row, col: integer;
	end;
var
	only_one_matrix_multiply_is_in_progress: mutex_t;
	work_data: work_data_t;
	m1: arr_t;
	m2: arr_t;
	m3: arr_t;
	i, j: integer;
	x: integer;
	elems_sum: arr_elem_t;
	start, stop: integer;
procedure worker; forward;
procedure matmul(m1, m2, m3: arr_p); { requesting thread }
var
	i: integer;
	cr: integer;
begin
cr := mutex_lock(addr(only_one_matrix_multiply_is_in_progress));
	cr := mutex_lock(addr(work_data.lock));
	{ create worker threads }
	if (work_data.workers = 0) then begin
		for i := 0 to THR - 1 do
			cr := thr_create(nil, 0, addr(worker), nil,
				THR_NEW_LWP, nil);
		work_data.workers := THR;
	end;


	{ initialization data for parallel work }
	work_data.m1 := m1;
	work_data.m2 := m2;
	work_data.m3 := m3;
	work_data.row := 0;
	work_data.col := 0;
	work_data.to_do := DIM*DIM;
	work_data.not_done := DIM*DIM;
	{ signals the worker to start via a condition variable }
	cr := cond_broadcast(addr(work_data.start_cond));
	
	{ waiting for signal from worker that completes
	  the latest bit of work }
	while (work_data.not_done > 0) do
		cr := cond_wait(addr(work_data.done_cond),
			addr(work_data.lock));
	cr := mutex_unlock(addr(work_data.lock));
	cr := mutex_unlock(addr(only_one_matrix_multiply_is_in_progress));
end;
procedure worker;
var
	wm1, wm2, wm3: arr_p;
	row, col: integer;
	i: integer;
	result: arr_elem_t;
	cr: integer;
begin
	while true do begin
		{ critical region 1 }
		cr := mutex_lock(addr(work_data.lock));
		while work_data.to_do = 0 do { wait for signal to start }
			cr := cond_wait(addr(work_data.start_cond),
				addr(work_data.lock));
		work_data.to_do := work_data.to_do - 1;
		wm1 := work_data.m1;
		wm2 := work_data.m2;
		wm3 := work_data.m3;
		row := work_data.row;


		col := work_data.col;
		work_data.col := work_data.col + 1;
		if work_data.col = DIM then begin
			work_data.col := 0;
			work_data.row := work_data.row + 1;
			if work_data.row = DIM then
				work_data.row := 0;
		end;
		cr := mutex_unlock(addr(work_data.lock));
		{ end of critical region 1 }
		{ computing the vector product in parallel }
		result := 0;
		for i := 0 to DIM - 1 do
			result := result + wm1^[row,i] * wm2^[i,col];
		wm3^[row,col] := result;
		{ end of computing the vector product in parallel }
		{ critical region 2 }
		cr := mutex_lock(addr(work_data.lock));
		work_data.not_done := work_data.not_done - 1;
		if work_data.not_done = 0 then { work is complete }
			cr := cond_signal(addr(work_data.done_cond));
		cr := mutex_unlock(addr(work_data.lock));
		{ end of critical region 2 }
	end;
end;
begin
	writeln('Matrix size: ', DIM :1);
	writeln('Number of worker threads: ', THR :1);
	for i := 0 to DIM - 1 do
		for j := 0 to DIM - 1 do begin
			m1[i,j] := random(x);
			m2[i,j] := random(x);
		end;
	start := wallclock;
	matmul(addr(m1), addr(m2), addr(m3));
	stop := wallclock;
	writeln('Matrix multiplication time: ', stop - start :1, ' seconds.');
end.

Improving Time Efficiency With Two Threads

The preceding program can be run with different numbers of threads and with matrices of different sizes. The following examples show the results of testing two different thread/matrix combinations on a SPARCstation 10 with two 50 MHz TMS390Z55 CPUs.

Results of running MatrixMultiply.p with a single thread:

> matr_mult_1
Matrix size: 400
Number of worker threads: 1
Matrix multiplication time: 68 seconds.

Using two threads to run MatrixMultiply.p cuts the time for the matrix multiplication almost in half:

> matr_mult_2
Matrix size: 400
Number of worker threads: 2
Matrix multiplication time: 35 seconds.


Use of Many Threads

The following example Pascal program, many_threads.p, is based on a similar C example in the Threads Primer (A Guide to Multithreaded Programming) by Bill Lewis and Daniel J. Berg. This example shows how to easily create many threads of execution in a Solaris environment.

Because of the lightweight nature of threads, it is possible to create thousands of threads. After its creation, each thread is blocked by waiting on a mutex variable. (This prevents the thread from continuing execution independently.) After the main thread has created all other threads, it waits for user input and then tries to join all the threads.

program many_threads;
#include <thread_p.h>
#include <synch_p.h>
const
	THR_COUNT = 100;	{ the number of threads }
var
	lock: mutex_t;
	cr: integer;
	i: integer;
procedure thr_sub;
var
	thread_id: thread_t;
begin
	{ try to lock the mutex variable - since the main thread
	  locked the mutex before the threads were created, this
	  thread will block until the main thread unlocks the mutex }
	cr := mutex_lock(addr(lock));
	thread_id := thr_self;
	writeln('Thread ', thread_id:1, ' is exiting...');
    {unlock the mutex variable, to allow another thread to proceed}
	cr := mutex_unlock(addr(lock));
end;
begin
	writeln('Creating ', THR_COUNT:1, ' threads...');
	{ lock the mutex variable - this mutex is being used to keep
	all the other threads created from proceeding }
	cr := mutex_lock(addr(lock));
	{ creates all the threads }
	for i := 0 to THR_COUNT - 1 do
		cr := thr_create(nil, 2048, addr(thr_sub), nil, 0, nil);
	writeln(THR_COUNT:1, ' threads have been created and are running!');
	writeln('Press <Return> to join all the threads...');
	{ wait till user presses return, then join all the threads }
	readln;
	writeln('Joining ', THR_COUNT:1, ' threads...');
    {now unlock the mutex variable, to let all the threads proceed}
	cr := mutex_unlock(addr(lock));
	{ join the threads }
	for i := 0 to THR_COUNT - 1 do
		cr := thr_join(0, nil, nil);
end. 


Debugging Multithreaded Pascal Programs

Using the dbx utility, you can debug and execute programs written in Pascal. Both dbx and the SPARCworks Debugger support debugging multithreaded programs. Table 11-1 lists dbx options that support multithreaded programs.

Table  11-1 dbx Options That Support Multithreaded Programs

dbx Option
Explanation

cont [[at "prog_file":line] [sig] [id]]

Continue execution of program "prog_file" at line number line with signal number sig. The id, if present, specifies which thread ID (tid) or LWP ID (lid) to continue. If id is absent, the default is for all tids and lids to continue. (For more information, refer to the dbx command discussion of using continue for loop control.)

lwp

Display the current LWP.

lwp lid

Switch to the LWP identified by lid.

lwps

List all LWPs in the current process.

next... tid

Step the given thread. When a function call is skipped over, all LWPs are implicitly resumed for the duration of that function call. Non-active threads cannot be stepped.

next... lid

Step the given LWP. Will not implicitly resume all LWPs when skipping a function.

step... tid

Step the given thread. When a function call is skipped over, all LWPs are implicitly resumed for the duration of that function call. Non-active threads cannot be stepped.

step... lid

Step the given LWP; will not implicitly resume all LWPs when skipping a function.

thread

Display current thread.

thread tid

Switch to thread tid. In the following variations, omitting the optional tid means the current thread.

thread -info [tid]

Display everything known about the current [or given] thread.

thread -locks [tid]

Display all locks held by the current [or given] thread.

thread -suspend [tid]

Put the current [or given] thread into suspended state.

thread -continue [tid]

Unsuspend the current [or given] thread.

thread -hide [tid]

"Hide" the current [or given] thread; will not show in the threads listing.

thread -unhide [tid]

"Unhide" the current [or given] thread.

thread -unhide all

"Unhide" all threads.

threads

Display a list of all known threads.

threads -all

Display threads normally not printed (zombies).

threads -mode all|filter

Control whether the threads command lists all threads by default or filters them.

threads -mode

Display a list of the current mode of each thread.


Sample dbx Session

The following examples use the program many_threads.p.

1. Compile - To use dbx or the Debugger, compile and link with the -g flag, as shown in the following command line:

> pc many_threads.p -o many_threads -mt -g

2. Start - To start dbx, enter dbx and the name of the executable file, as shown in the following command line and screen output display:

> dbx many_threads
  Reading symbolic information for many_threads
  Reading symbolic information for rtld /usr/lib/ld.so.1
  Reading symbolic information for libthread.so.1
  Reading symbolic information for libc.so.1
  Reading symbolic information for libdl.so.1
  detected a multithreaded program

3. Set breakpoints - To set a breakpoint, enter a stop at "file":N command, where file is the program file name and N is a line number in that file. The following two commands, for example, set two breakpoints in the many_threads.p program:

> stop at "many_threads.p":46
> stop at "many_threads.p":58

4. Run program - To run the executable file, enter the run command as shown in the following command line and screen output display:

> run
  Running: many_threads
 (process id 12452)
 t@1 (l@1) stopped in program at line 46 in file "many_threads.p"
 46 writeln(i+1:1, ' threads have been created and are running!');

5. Print threads - To print a list of all known threads, enter the threads command as shown in the following command line and screen output display:

> threads
      t@1  a l@1  ?()   breakpoint              in program()
      t@2         ?()   sleep on (unknown)      in _swtch()
      t@3  b l@2  ?()   running                 in __sigwait()
      t@4         thr_sub()     runnable        in _setpsr()
      t@5         thr_sub()     runnable        in _setpsr()
...
    t@102         thr_sub()     runnable        in _setpsr()
    t@103         thr_sub()     runnable        in _setpsr()

6. Continue program - To continue program execution after the stop at "many_threads":46 command, enter the cont command as shown in the following command line and screen output display:

> cont
  continuing all LWPs
  Creating 100 threads...
  100 threads have been created and are running!
  Press <Return> to join all the threads...
  Joining 100 threads...
  t@1 (l@1) stopped in program at line 58 in file "many_threads.p"
   58   cr := mutex_unlock(addr(lock));

7. List LWPs - To list all LWPs in the current process, enter the lwps command as shown in the following command line and screen output display:

> lwps
  l@1 breakpoint       in program()
  l@2 running          in __sigwait()
  l@3 running          in _lwp_sema_wait()
  l@4 running          in ___lwp_cond_wait()

8. Continue program - To continue program execution after the stop at "many_threads":58 command, enter the cont command as shown in the following command line and screen output display:

> cont
  continuing all LWPs
  Thread 4 is exiting...
  Thread 5 is exiting...
...
  Thread 102 is exiting...
  Thread 103 is exiting...
  execution completed, exit code is 0

9. Quit - Exit dbx:

> quit

