Chapter 7 Parallel Algorithms
7.1.1 Computer System Categories
SISD
(Single Instruction Single Data)
One
processor that can carry out one program instruction at a time.
Most
PC computers are like this as well as many larger computers.

Instruction result
SIMD
(Single Instruction Multiple Data
Several
processors
All
carrying out the same instruction on different sets of data
Example:
Add the vectors (1,2) and (3,4) to get (4,6)

1 4
3
2 6
4
MISD (Multiple Instruction
Single Data)
Several processors
Doing different
instructions on the same piece of data
Consider finding
out if a number X is prime:
If X is
not prime then it will have a factor <= X1/2
Suppose
you have N CPUs where N >= X1/2
Have each
divide X by 2, 3, 4, 5 … X1/2 to see if there is a divisor
MIMD (Multiple Instruction
Multiple Data)
Many processors
running different programs or different parts
of programs and
then combining results
7.1.2 Parallel Architectures
How do processors communicate and how is memory connected to the processors?
Loosely
vs. Tightly Coupled Machines
Loosely:
Each processor has its own memory
Communication is done via separate “lines”
Each processor (computer) can run on
its own
We often call this type of setup a “computer
cluster”
Tightly:
Processors
share centralized memory
Communication
is done by writing results to memory where
another
processor can retrieve it
Processor
communication
Processors
can communicate just like computer networks do
Consider
some layouts of networks:
Every
processor connected to every other processor
A
linear network
A
ring network
A
mesh network
What
are some of the advantages and disadvantages?
An exercise:
A
problem you are working on needs a series of numbers that are the summations of
numbers from a set. More specifically, for the set {s1, s2, s3, … sN} you need
the sums s1+s2, s1+s2+s3, …, s1+s2+s3+ … + sN. Design a method that you could use
to solve this problem using parallelism.
Another exercise:
In
a tightly coupled system where several processors vie for the same memory,
there may be problems of contention for data and validity of data. What are the
problems? Can you think of any ways to solve them?
7.3 Simple Parallel Operations
First
some notation:
P(i)
= the I-th processor
P
= the number of processors
N
= the number of input data items
M(j)
= the j-th memory location
PS
= start parallel operations
PE
= end parallel operations
Finding
the Maximum Value in a List
Assume
that the list is stored in locations M(1) thru M(N)
Assume
that P = N/2
Count = N/2
For i = 1 to (lg(count) + 1) do
PS
For j = 1 to Count do
P(j)
reads M(2j) into X and M(2j+1) into Y
If
X > Y
P(j) writes X into M(j)
Else
P(j) writes Y into M(j)
End If
End For j
PE
Count = Count / 2
End For I
Let’s try to trace this algorithm with a set of actual data.
Parallel
Searching
Suppose
we P = N, i.e. we have the same number of processors as
elements in the
list we are searching.
For j =
1 to N do
P(j) reads X from
M(j) and target from M(N+1)
If X = target then
Write j to
M(N+2)
End If
End For
PE
Of course there
is no guarantee that we will have P = N. What if P < N?
PS
For
j = 1 to P do
P(j)
performs a sequential binary search on
M((j-1)*(N/P)+1) thru M(j*(N/P)) and writes
the
location where X is found into M(N+2)
End For
PE
What happens when
we have a list of 2000 items and 25 processors?
What is the
runtime of this algorithm is P = N?
What is the
cost (# of comparisons) of this algorithm when P = N?
What is the
runtime and cost of this algorithm if P = 1?
What is the
optimal number of processors?
Matrix
Multiplication:
Consider our old problem of multiplying an I x J matrix with a J x K matrix to obtain an I x K matrix. Can you envision a parallel algorithm to do this? Try it on a simple set of matrices below. How many processors would you have? What would each do and when?
|
1 |
5 |
|
4 |
6 |
|
7 |
2 |
|
8 |
2 |
3 |
|
5 |
1 |
9 |