CSCIUA0480 Lab 3 Solved

Before you start:

  • To measure the running time of each program in this lab, use the Linux command time.
  • After you log in to your CIMS account, you need to ssh to one of the following machines: cuda1, cuda2, cuda3, or cuda4.
  • The source code, containing both device and host code, has the extension .cu.
  • You compile with nvcc, passing it the .cu source file.
  • Don’t forget to #include <cuda.h>
  • A very useful API is cudaGetDeviceProperties(); look it up.
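As a rough illustration of the cudaGetDeviceProperties() call mentioned above, the following sketch prints a few fields that matter when choosing block sizes and shared memory usage. The file name props.cu and the choice of fields are assumptions made for this example, not part of the lab.

```cuda
// props.cu (hypothetical file name): list basic properties of each CUDA device.
#include <cstdio>
#include <cuda.h>
#include <cuda_runtime.h>

int main(void) {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        err = cudaGetDeviceProperties(&prop, dev);
        if (err != cudaSuccess) {
            fprintf(stderr, "device %d: %s\n", dev, cudaGetErrorString(err));
            continue;
        }
        printf("Device %d: %s\n", dev, prop.name);
        printf("  max threads per block:   %d\n", prop.maxThreadsPerBlock);
        printf("  shared memory per block: %zu bytes\n", prop.sharedMemPerBlock);
        printf("  warp size:               %d\n", prop.warpSize);
        printf("  multiprocessors:         %d\n", prop.multiProcessorCount);
    }
    return 0;
}
```

Assuming that file name, it could be built with `nvcc props.cu -o props` and timed with `time ./props`, which reports the real, user, and sys times.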

 

  1. Assume a reduction algorithm that finds the maximum of an array of 8192 integers. You will need to write a host function that fills the array with random integers between 1 and 100000.
    A. Write the sequential version of the program in C. Note that the sequential version will scan the array sequentially from start to end. Call it c.
    B. Write a CUDA version of the program that does not take thread divergence into account. Call it cu.
    C. Update the version in B to take thread divergence into account. Call it cu.
    D. Update the program in C to make use of shared memory to reduce global memory bandwidth. Call it cu.
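The four versions can be sketched side by side as below. This is one possible reading of the task, not a reference solution: the block size BLOCK, the function and kernel names, and the decision to finish the per-block maxima on the host are all assumptions. The lab asks for four separate programs so that each can be timed on its own; they are combined here only to make the differences between the kernels easy to compare. Error checking is omitted for brevity.

```cuda
#include <climits>
#include <cstdio>
#include <cstdlib>
#include <cuda.h>
#include <cuda_runtime.h>

#define N 8192      // array size from problem 1
#define BLOCK 512   // assumed block size (power of two); one block covers 2*BLOCK elements

// Host: fill the array with pseudo-random integers in [1, 100000].
void fillRandom(int *a, int n) {
    for (int i = 0; i < n; ++i) a[i] = rand() % 100000 + 1;
}

// Sequential version: scan from start to end.
int maxSequential(const int *a, int n) {
    int best = a[0];
    for (int i = 1; i < n; ++i) if (a[i] > best) best = a[i];
    return best;
}

// Divergent version: in-place reduction in global memory, interleaved addressing.
// Active threads are scattered (t % stride == 0), so warps diverge from the second step on.
__global__ void maxDivergent(int *data, unsigned n) {
    unsigned t = threadIdx.x, base = 2 * blockIdx.x * blockDim.x;
    for (unsigned stride = 1; stride <= blockDim.x; stride *= 2) {
        __syncthreads();
        unsigned i = base + 2 * t;
        if (t % stride == 0 && i + stride < n)
            data[i] = max(data[i], data[i + stride]);
    }
}

// Convergent version: active threads form the contiguous range t < stride,
// so whole warps go idle together until stride drops below the warp size.
__global__ void maxConvergent(int *data, unsigned n) {
    unsigned t = threadIdx.x, base = 2 * blockIdx.x * blockDim.x;
    for (unsigned stride = blockDim.x; stride > 0; stride >>= 1) {
        __syncthreads();
        unsigned i = base + t;
        if (t < stride && i + stride < n)
            data[i] = max(data[i], data[i + stride]);
    }
}

// Shared-memory version: each block stages its segment in shared memory, so
// global memory is touched only for the initial loads and one final store.
__global__ void maxShared(const int *in, int *blockMax, unsigned n) {
    __shared__ int tile[BLOCK];
    unsigned t = threadIdx.x, i = 2 * blockIdx.x * blockDim.x + t;
    int a = (i < n) ? in[i] : INT_MIN;
    int b = (i + blockDim.x < n) ? in[i + blockDim.x] : INT_MIN;
    tile[t] = max(a, b);
    for (unsigned stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        __syncthreads();
        if (t < stride) tile[t] = max(tile[t], tile[t + stride]);
    }
    if (t == 0) blockMax[blockIdx.x] = tile[0];
}

// Copy every stride-th per-block result back and finish the max on the host.
int collect(const int *dev, int count, int stride) {
    int best = INT_MIN, v;
    for (int b = 0; b < count; ++b) {
        cudaMemcpy(&v, dev + (size_t)b * stride, sizeof(int), cudaMemcpyDeviceToHost);
        if (v > best) best = v;
    }
    return best;
}

int main(void) {
    static int h[N];
    const int blocks = N / (2 * BLOCK);
    int *d, *dPartial;
    srand(1);
    fillRandom(h, N);
    cudaMalloc(&d, N * sizeof(int));
    cudaMalloc(&dPartial, blocks * sizeof(int));

    cudaMemcpy(d, h, N * sizeof(int), cudaMemcpyHostToDevice);
    maxDivergent<<<blocks, BLOCK>>>(d, N);
    int b = collect(d, blocks, 2 * BLOCK);

    cudaMemcpy(d, h, N * sizeof(int), cudaMemcpyHostToDevice);  // previous kernel overwrote d
    maxConvergent<<<blocks, BLOCK>>>(d, N);
    int c = collect(d, blocks, 2 * BLOCK);

    cudaMemcpy(d, h, N * sizeof(int), cudaMemcpyHostToDevice);
    maxShared<<<blocks, BLOCK>>>(d, dPartial, N);
    int s = collect(dPartial, blocks, 1);

    printf("A=%d B=%d C=%d D=%d\n", maxSequential(h, N), b, c, s);
    cudaFree(d);
    cudaFree(dPartial);
    return 0;
}
```

The four printed values should agree; a mismatch would point to an indexing or synchronization bug in one of the kernels. The only change between the divergent and convergent kernels is which threads stay active at each step, which is the effect the timing comparison is meant to expose.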

Draw a bar graph that compares the execution time of each of the above 4 versions. The x-axis shows the 4 versions (for each one, report the real, user, and sys times) and the y-axis shows the time. We therefore expect to see 12 bars (4 versions with 3 timings each).

 

  2. Repeat problem 1 with an array of 65536 elements. Adjust the file names based on the new number.

 

  3. What can we conclude from the results of problems 1 and 2 regarding the optimizations and the problem size?