CSC-325 Analysis of Algorithms

 

Chapter 7 Parallel Algorithms

 

 

7.1.1 Computer System Categories

 

          SISD (Single Instruction Single Data)

                   One processor that can carry out one program instruction at a time.

                   Most PC computers are like this as well as many larger computers.

 


                    Instruction                             result

 

 

 

 

          SIMD (Single Instruction Multiple Data

                   Several processors

                   All carrying out the same instruction on different sets of data

 

                   Example: Add the vectors (1,2) and (3,4) to get (4,6)

 


                             1                                    4

                             3

 

 

                             2                                    6

 

                                      4

 

 

                   MISD (Multiple Instruction Single Data)

                             Several processors

                             Doing different instructions on the same piece of data

 

                             Consider finding out if a number X is prime:

                                      If X is not prime then it will have a factor <= X1/2

                                      Suppose you have N CPUs where N >= X1/2

                                      Have each divide X by 2, 3, 4, 5 … X1/2 to see if there is a divisor

 

         

                   MIMD (Multiple Instruction Multiple Data)

                             Many processors running different programs or different parts

                             of programs and then combining results

 

 

          7.1.2 Parallel Architectures

 

          How do processors communicate and how is memory connected to the processors?

 

          Loosely vs. Tightly Coupled Machines

                    Loosely:

                             Each processor has its own memory

                             Communication is done via separate “lines”

                             Each processor (computer) can run on its own

                             We often call this type of setup a “computer cluster”

                   Tightly:

                             Processors share centralized memory

                             Communication is done by writing results to memory where

                             another processor can retrieve it

 

          Processor communication

                   Processors can communicate just like computer networks do

                   Consider some layouts of networks:

                             Every processor connected to every other processor

                             A linear network

                             A ring network

                             A mesh network

                   What are some of the advantages and disadvantages?

 

An exercise:

          A problem you are working on needs a series of numbers that are the summations of numbers from a set. More specifically, for the set {s1, s2, s3, … sN} you need the sums s1+s2, s1+s2+s3, …, s1+s2+s3+ … + sN. Design a method that you could use to solve this problem using parallelism.

 

Another exercise:

          In a tightly coupled system where several processors vie for the same memory, there may be problems of contention for data and validity of data. What are the problems? Can you think of any ways to solve them?

 

 

7.3 Simple Parallel Operations

 

          First some notation:

                   P(i) = the I-th processor

                   P = the number of processors

                   N = the number of input data items

                   M(j) = the j-th memory location

                   PS = start parallel operations

                   PE = end parallel operations

 

          Finding the Maximum Value in a List

                   Assume that the list is stored in locations M(1) thru M(N)

                   Assume that P = N/2

 

                   Count = N/2

         For i = 1 to (lg(count) + 1) do

             PS

             For j = 1 to Count do

                 P(j) reads M(2j) into X and M(2j+1) into Y

                 If X > Y

                     P(j) writes X into M(j)

                 Else

                      P(j) writes Y into M(j)

                 End If

             End For j

                             PE

             Count = Count / 2

         End For I

 

          Let’s try to trace this algorithm with a set of actual data.

 

         

          Parallel Searching

 

                   Suppose we P = N, i.e. we have the same number of processors as

elements in the list we are searching.

 

PS

          For j = 1 to N do

         P(j) reads X from M(j) and target from M(N+1)

         If X = target then

             Write j to M(N+2)

         End If

    End For

PE

 

         Of course there is no guarantee that we will have P = N. What if P < N?

 

                   PS

             For j = 1 to P do

                 P(j) performs a sequential binary search on

                 M((j-1)*(N/P)+1) thru M(j*(N/P)) and writes

                 the location where X is found into M(N+2)

                             End For

         PE

 

         What happens when we have a list of 2000 items and 25 processors?

                   What is the runtime of this algorithm is P = N?

                   What is the cost (# of comparisons) of this algorithm when P = N?

                   What is the runtime and cost of this algorithm if P = 1?

 

                   What is the optimal number of processors?

 

 

                   Matrix Multiplication:

Consider our old problem of multiplying an I x J matrix with a J x K matrix to obtain an I x K matrix. Can you envision a parallel algorithm to do this? Try it on a simple set of matrices below. How many processors would you have? What would each do and when?

 

1

5

4

6

7

2

 

 

8

2

3

5

1

9