

11  The Multithread Library


This chapter describes how to use the multithread library, libthread, in Pascal programs. This multithread library interface is defined in the SunOS 5.x Reference Manual. The SunOS 5.x Guide to Multi-Thread Programming describes concurrent programming concepts and practices suitable to Solaris threads.

The features described in this chapter apply only to the Solaris 2.x environment.

This chapter contains the following sections:

Multithread Environment for Pascal

Introduction to Multithreading

Parallel Matrix-Multiplication Example

Debugging Multithreaded Pascal Programs

Sample dbx Session


Multithread Environment for Pascal

Compiling and binding a multithreaded Pascal program requires the compiler and linker options described below.

Compiling Multithreaded Programs

Compile multithreaded programs with the -mt flag. This flag ensures that the -D_REENTRANT option is passed to cpp and that the -lpc_mt and -lthread options are passed to ld. For example:

> pc -mt prog.p -o prog


Introduction to Multithreading

A thread is a sequence of execution steps performed by a program. A program without multithreading operates with a single thread of control. Control in a single-threaded program is always synchronous, alternating between the program and the operating system. Figure 11-1 schematically depicts the basic elements of multithreading as discussed in this chapter.

Figure  11-1 Thread Interface Architecture
The multithreading capabilities of Solaris threads allow many threads of control to share a single UNIX process and use its address space, as shown in Figure 11-1. A multithreaded UNIX process is not a single thread of control in itself, but it contains one or more threads of control. A single thread can start other threads, and each thread can execute independently and asynchronously. Multithreading can improve application responsiveness, make efficient use of multiprocessor hardware, and improve program structure.

Thread Resources

A thread can be created within an existing process a thousand times faster than a new process can be created. Switching between threads within a process is especially fast because it does not involve switching between address spaces. Each thread has its own resources, including a program counter, a stack, and a set of register values.

Thread Creation Functions

Solaris threads provides thread creation functions such as thr_create(3T), which creates a new thread of control within the current process. See the thr_create(3T) man page for the available creation flags (for example, THR_NEW_LWP, used later in this chapter).

Lightweight Processes

The multithread library uses underlying threads of control called lightweight processes (LWPs) that are supported by the kernel. LWPs work like virtual CPUs that execute code or system calls. LWPs act as bridges between the user or application level and the kernel. Each traditional UNIX process contains one or more LWPs, each running one or more user threads. Programming thread creation involves creating a user context, usually without also creating an LWP.

Process Control

Your program can tell libthread, the multithread support library, how many threads should be able to run at the same time. You should make sure that at least that number of LWPs is available.

The libthread multithread library schedules LWP usage for user threads. When a user thread blocks due to synchronization, the LWP transfers to another runnable thread through co-routine linkage and not by system call. If all LWPs block, the multithread library makes another LWP available.

Each LWP is independently dispatched by the kernel, performs independent system calls, and may incur independent page faults. On multiprocessor systems, LWPs can run in parallel, with at most one LWP per processor at a time. The operating system multiplexes the LWPs onto the available processors, deciding which LWP runs on which processor and when. The kernel schedules CPU resources for the LWPs according to their scheduling classes and priorities; the kernel has no information about the user threads running within each process.

Synchronization

The operating system schedules LWP usage of the processors. Thread scheduling is influenced by the thr_setconcurrency and thr_setprio multithread library functions; see their man pages for more information.

Threads share access to the process address space, and therefore their accesses to shared data must be synchronized. Solaris threads provide a variety of synchronization facilities that use various semantics and support different styles of synchronization interaction:

Mutual exclusion locks let one thread at a time hold the lock. They are typically used to ensure that only one thread at a time executes a section of code (called a critical section) that accesses or modifies some shared data. Use mutex locks to limit access to a resource to one thread at a time. Refer to the mutex(3T) man pages for more details.

The following two Solaris threads routines are the most commonly used routines for mutual exclusion:

mutex_lock(3T) acquires the mutex, blocking the calling thread if another thread holds it.

mutex_unlock(3T) releases the mutex, allowing a blocked thread (if any) to acquire it.

Condition Variables

Use condition variables to make threads wait until a particular condition is true. A condition variable must be used in conjunction with a mutex lock. Refer to the condition(3T) man page for more details.

The following three Solaris threads routines are the most common routines for handling condition variables:

cond_wait(3T) atomically releases the associated mutex and blocks the thread until the condition variable is signaled; the mutex is reacquired before the call returns.

cond_signal(3T) unblocks one thread waiting on the condition variable.

cond_broadcast(3T) unblocks all threads waiting on the condition variable.

Semaphores

Solaris threads provide conventional counting semaphores. A semaphore is a non-negative integer count that can be atomically incremented and decremented by special routines. Semaphores must be initialized before use. Refer to the semaphore(3T) man page for more details.

The following three Solaris thread routines are the most common routines for handling semaphores:

sema_init(3T) initializes the semaphore variable.

sema_wait(3T) blocks the thread until the semaphore becomes greater than zero, then decrements it--the P operation on Dijkstra semaphores.

sema_post(3T) increments the semaphore, potentially unblocking a waiting thread--the V operation on Dijkstra semaphores.

Readers/Writer Lock

A multiple-readers/single-writer lock gives threads simultaneous read-only access to a protected object. Such a lock also gives a single thread write access to the object while excluding all readers. This type of lock is usually used to protect data that is read more often than it is written. Refer to the rwlock(3T) man page for more details.

The following routines are the most commonly used for readers/writer locks: rwlock_init(3T) initializes the lock, rw_rdlock(3T) acquires a read lock, rw_wrlock(3T) acquires the write lock, and rw_unlock(3T) releases either kind of lock.


Parallel Matrix-Multiplication Example

Computationally intensive applications can benefit from using all available processors. Matrix multiplication, for example, is an operation that can be sped up by using multiple processors, as shown in the following program, MatrixMultiply.

When a matrix-multiply operation is called, it acquires a mutex lock to ensure that only one matrix-multiply operation is in progress. The MatrixMultiply program uses mutex locks that are statically initialized to zero. A requesting thread checks whether its worker threads have been created. If its worker threads have not been created, the requesting thread creates them.

In the MatrixMultiply program, once its worker threads are created, the requesting thread sets up a to_do counter for the work and then signals the worker procedure via a conditional variable. Each worker procedure picks off a row and column from the input matrix, then the next worker procedure gets the next item, and so on.

The matrix-multiply operation then releases the mutex lock so computation of the vector product can proceed in parallel, with each processor running one thread at a time.

When the vector product results are ready, the worker procedure reacquires the mutex lock and updates the not_done counter of work not yet completed. At the end of the matrix-multiply operation, the worker procedure that completes the last bit of work then signals the requesting thread. Each iteration computes the result of one entry in the result matrix. In some cases, the amount of computation could be insufficient to justify the overhead of synchronizing multiple worker procedures. In such cases, more work per synchronization should be given to each worker. For example, each worker could compute an entire row of the output matrix before synchronization.

program MatrixMultiply;
#include <thread_p.h>
#include <synch_p.h>
const
	THR = 2;{ level of parallelism }
	DIM = 400;{ array dimension }
type
	arr_elem_t = double;
	
	arr_t = array[0..DIM-1, 0..DIM-1] of arr_elem_t;
	arr_p = ^arr_t;

	{ totality data for parallel work }
	work_data_t = record
		{ synchronization primitives }
			lock: mutex_t;
		start_cond, done_cond: cond_t;
		{ counters of work }
			to_do, not_done: integer;
		{ level of parallelism }
			workers: integer;
		{ shared data }
			m1: arr_p;
			m2: arr_p;
			m3: arr_p;
			row, col: integer;
	end;
var
	only_one_matrix_multiply_is_in_progress: mutex_t;
	work_data: work_data_t;
	m1: arr_t;
	m2: arr_t;
	m3: arr_t;
	i, j: integer;
	x: integer;
	elems_sum: arr_elem_t;
	start, stop: integer;
procedure worker; forward;
procedure matmul(m1, m2, m3: arr_p); { requesting thread }
var
	i: integer;
	cr: integer;
begin
cr := mutex_lock(addr(only_one_matrix_multiply_is_in_progress));
	cr := mutex_lock(addr(work_data.lock));
	{ create worker threads }
	if (work_data.workers = 0) then begin
		for i := 0 to THR - 1 do
			cr := thr_create(nil, 0, addr(worker), nil,
				THR_NEW_LWP, nil);
		work_data.workers := THR;
	end;


	{ initialization data for parallel work }
	work_data.m1 := m1;
	work_data.m2 := m2;
	work_data.m3 := m3;
	work_data.row := 0;
	work_data.col := 0;
	work_data.to_do := DIM*DIM;
	work_data.not_done := DIM*DIM;
	{ signals the worker to start via a condition variable }
	cr := cond_broadcast(addr(work_data.start_cond));
	
	{ waiting for signal from worker that completes
	  the latest bit of work }
	while (work_data.not_done > 0) do
		cr := cond_wait(addr(work_data.done_cond),
			addr(work_data.lock));
	cr := mutex_unlock(addr(work_data.lock));
	cr := mutex_unlock(addr(only_one_matrix_multiply_is_in_progress));
end;
procedure worker;
var
	wm1, wm2, wm3: arr_p;
	row, col: integer;
	i: integer;
	result: arr_elem_t;
	cr: integer;
begin
	while true do begin
		{ critical region 1 }
		cr := mutex_lock(addr(work_data.lock));
		while work_data.to_do = 0 do { wait for signal to start }
			cr := cond_wait(addr(work_data.start_cond),
				addr(work_data.lock));
		work_data.to_do := work_data.to_do - 1;
		wm1 := work_data.m1;
		wm2 := work_data.m2;
		wm3 := work_data.m3;
		row := work_data.row;


		col := work_data.col;
		work_data.col := work_data.col + 1;
		if work_data.col = DIM then begin
			work_data.col := 0;
			work_data.row := work_data.row + 1;
			if work_data.row = DIM then
				work_data.row := 0;
		end;
		cr := mutex_unlock(addr(work_data.lock));
		{ end of critical region 1 }
		{ computing the vector product in parallel }
		result := 0;
		for i := 0 to DIM - 1 do
			result := result + wm1^[row,i] * wm2^[i,col];
		wm3^[row,col] := result;
		{ end of computing the vector product in parallel }
		{ critical region 2 }
		cr := mutex_lock(addr(work_data.lock));
		work_data.not_done := work_data.not_done - 1;
		if work_data.not_done = 0 then { work is complete }
			cr := cond_signal(addr(work_data.done_cond));
		cr := mutex_unlock(addr(work_data.lock));
		{ end of critical region 2 }
	end;
end;
begin
	writeln('Matrix size: ', DIM :1);
	writeln('Number of worker threads: ', THR :1);
	for i := 0 to DIM - 1 do
		for j := 0 to DIM - 1 do begin
			m1[i,j] := random(x);
			m2[i,j] := random(x);
		end;
	start := wallclock;
	matmul(addr(m1), addr(m2), addr(m3));
	stop := wallclock;
	writeln('Matrix multiplication time: ', stop - start :1, ' seconds.');
end.

Improving Time Efficiency With Two Threads

The preceding program can be run with different numbers of threads and with matrices of different sizes. The following examples show the results of testing two different thread/matrix combinations on a SPARCstation 10 with two 50 MHz TMS390Z55 CPUs.

Results of running MatrixMultiply.p with a single thread:

> matr_mult_1
Matrix size: 400
Number of worker threads: 1
Matrix multiplication time: 68 seconds.

Using two threads to run MatrixMultiply.p cuts the time for the matrix multiplication almost in half:

> matr_mult_2
Matrix size: 400
Number of worker threads: 2
Matrix multiplication time: 35 seconds.


Use of Many Threads

The following example Pascal program, many_threads.p, is based on a similar C example in the Threads Primer (A Guide to Multithreaded Programming) by Bill Lewis and Daniel J. Berg. This example shows how to easily create many threads of execution in a Solaris environment.

Because of the lightweight nature of threads, it is possible to create thousands of threads. After its creation, each thread is blocked by waiting on a mutex variable. (This prevents the thread from continuing execution independently.) After the main thread has created all other threads, it waits for user input and then tries to join all the threads.

program many_threads;
#include <thread_p.h>
#include <synch_p.h>
const
	THR_COUNT = 100;	{ the number of threads }
var
	lock: mutex_t;
	cr: integer;
	i: integer;
procedure thr_sub;
var
	thread_id: thread_t;
begin
	{ try to lock the mutex variable - since the main thread
	  locked the mutex before the threads were created, this
	  thread will block until the main thread unlocks the mutex }
	cr := mutex_lock(addr(lock));
	thread_id := thr_self;
	writeln('Thread ', thread_id:1, ' is exiting...');
    {unlock the mutex variable, to allow another thread to proceed}
	cr := mutex_unlock(addr(lock));
end;
begin
	writeln('Creating ', THR_COUNT:1, ' threads...');
	{ lock the mutex variable - this mutex is being used to keep
	all the other threads created from proceeding }
	cr := mutex_lock(addr(lock));
	{ creates all the threads }
	for i := 0 to THR_COUNT - 1 do
		cr := thr_create(nil, 2048, addr(thr_sub), nil, 0, nil);
	writeln(THR_COUNT:1, ' threads have been created and are running!');
	writeln('Press <Return> to join all the threads...');
	{ wait till user presses return, then join all the threads }
	readln;
	writeln('Joining ', THR_COUNT:1, ' threads...');
    {now unlock the mutex variable, to let all the threads proceed}
	cr := mutex_unlock(addr(lock));
	{ join the threads }
	for i := 0 to THR_COUNT - 1 do
		cr := thr_join(0, nil, nil);
end. 


Debugging Multithreaded Pascal Programs

Using the dbx utility, you can debug and execute programs written in Pascal. Both dbx and the SPARCworks Debugger support debugging multithreaded programs. Table 11-1 lists dbx options that support multithreaded programs.

Table  11-1 dbx Options That Support Multithreaded Programs

dbx Option
Explanation

cont [[at "prog_file":line] [sig] [id]]

Continue execution of program "prog_file" at line number line with signal number sig. The id, if present, specifies which thread ID (tid) or LWP ID (lid) to continue. If id is absent, the default is for all tids and lids to continue. (For more information, refer to the dbx command discussion of using continue for loop control.)

lwp

Display the current LWP.

lwp lid

Switch to the LWP identified by lid.

lwps

List all LWPs in the current process.

next... tid

Step the given thread. When a function call is skipped over, all LWPs are implicitly resumed for the duration of that function call. Non-active threads cannot be stepped.

next... lid

Step the given LWP. Will not implicitly resume all LWPs when skipping a function.

step... tid

Step the given thread. When a function call is skipped over, all LWPs are implicitly resumed for the duration of that function call. Non-active threads cannot be stepped.

step... lid

Step the given LWP; will not implicitly resume all LWPs when skipping a function.

thread

Display current thread.

thread tid

Switch to thread tid. In the following variations, omitting the optional tid means the current thread.

thread -info [tid]

Display everything known about the current [or given] thread.

thread -locks [tid]

Display all locks held by the current [or given] thread.

thread -suspend [tid]

Put the current [or given] thread into suspended state.

thread -continue [tid]

Unsuspend the current [or given] thread.

thread -hide [tid]

"Hide" the current [or given] thread; will not show in the threads listing.

thread -unhide [tid]

"Unhide" the current [or given] thread.

thread -unhide all

"Unhide" all threads.

threads

Display a list of all known threads.

threads -all

Display threads normally not printed (zombies).

threads -mode all|filter

Control whether the threads command lists all threads by default or filters them.

threads -mode

Display a list of the current mode of each thread.


Sample dbx Session

The following examples use the program many_threads.p.

1. Compile - To use dbx or the Debugger, compile and link with the -g flag, as shown in the following command line:

> pc many_threads.p -o many_threads -mt -g

2. Start - To start dbx, enter dbx and the name of the executable file, as shown in the following command line and screen output display:

> dbx many_threads
  Reading symbolic information for many_threads
  Reading symbolic information for rtld /usr/lib/ld.so.1
  Reading symbolic information for libthread.so.1
  Reading symbolic information for libc.so.1
  Reading symbolic information for libdl.so.1
  detected a multithreaded program

3. Set breakpoints - To set a breakpoint, enter a stop at "file":N command, where file is the program file name and N is a line number in that file. The following two commands, for example, set two breakpoints in the many_threads.p program:

> stop at "many_threads.p":46
> stop at "many_threads.p":58

4. Run program - To run the executable file, enter the run command as shown in the following command line and screen output display:

> run
  Running: many_threads
 (process id 12452)
 t@1 (l@1) stopped in program at line 46 in file "many_threads.p"
 46 writeln(i+1:1, ' threads have been created and are running!');

5. Print threads - To print a list of all known threads, enter the threads command as shown in the following command line and screen output display:

> threads
      t@1  a l@1  ?()   breakpoint              in program()
      t@2         ?()   sleep on (unknown)      in _swtch()
      t@3  b l@2  ?()   running                 in __sigwait()
      t@4         thr_sub()     runnable        in _setpsr()
      t@5         thr_sub()     runnable        in _setpsr()
...
    t@102         thr_sub()     runnable        in _setpsr()
    t@103         thr_sub()     runnable        in _setpsr()

6. Continue program - To continue program execution after the stop at "many_threads":46 command, enter the cont command as shown in the following command line and screen output display:

> cont
  continuing all LWPs
  Creating 100 threads...
  100 threads have been created and are running!
  Press <Return> to join all the threads...
  Joining 100 threads...
  t@1 (l@1) stopped in program at line 58 in file "many_threads.p"
   58   cr := mutex_unlock(addr(lock));

7. List LWPs - To list all LWPs in the current process, enter the lwps command as shown in the following command line and screen output display:

> lwps
  l@1 breakpoint       in program()
  l@2 running          in __sigwait()
  l@3 running          in _lwp_sema_wait()
  l@4 running          in ___lwp_cond_wait()

8. Continue program - To continue program execution after the stop at "many_threads":58 command, enter the cont command as shown in the following command line and screen output display:

> cont
  continuing all LWPs
  Thread 4 is exiting...
  Thread 5 is exiting...
...
  Thread 102 is exiting...
  Thread 103 is exiting...
  execution completed, exit code is 0

9. Quit - Exit dbx:

> quit

