CAD2122 Project 1 Solved

Image filters, like the blur filter used in lab 3, are examples of matrix processing that can be offloaded to the GPU. Consider two classes of filters: area filters, which consider an area around each pixel to define the final value of that pixel; and point filters, which apply some function to the pixel value. For this project you need to implement a sequence of two parameterized filters that can be applied to any given image.

For the area filter, considering just the direct neighbors, we get a 3×3 grid of pixels with the original pixel at its center.

 

The final value for that pixel will be the weighted average defined by a 3×3 matrix of coefficients. The following examples of coefficients allow the implementation of several types of filters:
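As a point of reference, the weighted average can be sketched sequentially in C++ as follows. The function name `area_filter`, the single-channel row-major layout, the clamped border handling, and the rounding are all assumptions made for illustration; the assignment does not specify them, and the coefficients are assumed to be already normalized.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>
#include <vector>

// Hypothetical helper: apply a 3x3 coefficient matrix to one channel of a
// row-major image. Border pixels reuse the nearest valid pixel (clamping),
// which is an assumption; the assignment does not define border behavior.
std::vector<std::uint8_t> area_filter(const std::vector<std::uint8_t>& src,
                                      int width, int height,
                                      const std::array<float, 9>& k) {
    std::vector<std::uint8_t> dst(src.size());
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            float acc = 0.0f;
            // Visit the 3x3 neighborhood centered on (x, y).
            for (int dy = -1; dy <= 1; ++dy) {
                for (int dx = -1; dx <= 1; ++dx) {
                    int sx = std::min(std::max(x + dx, 0), width - 1);
                    int sy = std::min(std::max(y + dy, 0), height - 1);
                    acc += k[(dy + 1) * 3 + (dx + 1)] * src[sy * width + sx];
                }
            }
            // Round to nearest and keep the result inside [0, 255].
            float rounded = std::min(255.0f, std::max(0.0f, acc + 0.5f));
            dst[y * width + x] = static_cast<std::uint8_t>(rounded);
        }
    }
    return dst;
}
```

The body of the two outer loops is the work a single GPU thread would typically perform for its own pixel, so the loop nest gives a rough picture of what a kernel has to cover.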

 

For the point filter, we intend to gray the image using the following expression:

(r,g,b) = alpha*(c,c,c) + (1-alpha)*(r,g,b),  where c = 0.3 r + 0.59 g + 0.11 b

Alpha is a value in [0 .. 1] that defines the color shift to grayscale.
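The expression can be illustrated with a minimal C++ sketch for a single pixel. The `Rgb` struct, the function name `gray_shift`, and the round-to-nearest conversion back to 8 bits are assumptions for illustration, not part of the assignment.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical pixel type: one byte per channel.
struct Rgb { std::uint8_t r, g, b; };

// Blend one pixel towards its gray value c = 0.3 r + 0.59 g + 0.11 b.
// alpha = 0 leaves the pixel unchanged; alpha = 1 yields full grayscale.
Rgb gray_shift(Rgb p, float alpha) {
    float c = 0.3f * p.r + 0.59f * p.g + 0.11f * p.b;
    auto mix = [&](std::uint8_t v) {
        float out = alpha * c + (1.0f - alpha) * v;
        // Round to nearest and keep the result inside [0, 255] (assumed).
        return static_cast<std::uint8_t>(
            std::min(255.0f, std::max(0.0f, out + 0.5f)));
    };
    return {mix(p.r), mix(p.g), mix(p.b)};
}
```

For example, the pixel (10, 200, 30) has c = 3 + 118 + 3.3 = 124.3, so with alpha = 1 every channel becomes 124, while alpha = 0 returns the pixel unchanged. Because each output pixel depends only on its own input pixel, this filter maps directly onto one thread per pixel.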

 

A sequential code example is provided for reference. The main objective is to achieve the best performance, particularly for big images. For convenience, this work is presented in several stages (number 5 is mandatory):

 

  1. (30%) Implement a CUDA or OpenCL solution that
    1. Parallelizes both filter operations

Suggestion: experiment with different parallel strategies. Examples: just one kernel with both filters vs. one kernel per filter.

    2. Experiment with different grid layouts and block sizes.

Study whether using shared (local) memory can improve the performance of your kernel(s); nvprof and section 8 of the CUDA Best Practices Guide[1] may help you evaluate any improvement.
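When comparing block sizes, it can help to work out the launch geometry and the shared tile size on the host first. The sketch below is a rough illustration in plain C++; the `Dim2` struct and the helper names are hypothetical, and the tile model assumes a 3×3 filter with each block loading its pixels plus a one-pixel halo.

```cpp
#include <cstddef>

// Hypothetical 2D extent, standing in for a block or grid dimension.
struct Dim2 { int x, y; };

// Number of blocks needed so that every pixel gets a thread
// (ceiling division in each dimension).
Dim2 grid_for(int width, int height, Dim2 block) {
    return {(width + block.x - 1) / block.x,
            (height + block.y - 1) / block.y};
}

// Elements in a shared tile for a 3x3 filter: the block itself plus a
// one-pixel halo on every side.
std::size_t tile_elems(Dim2 block) {
    return static_cast<std::size_t>(block.x + 2) *
           static_cast<std::size_t>(block.y + 2);
}

// Average global-memory reads per output pixel when each block first
// stages its tile in shared memory (versus 9 reads without staging).
double reads_per_pixel_tiled(Dim2 block) {
    return static_cast<double>(tile_elems(block)) / (block.x * block.y);
}
```

For a 1920×1080 image and 16×16 blocks this gives a 120×68 grid and a 324-element tile, that is, about 1.27 global reads per pixel instead of 9. Hardware caches may already capture part of that reuse, so the measured gain can be smaller than this ratio suggests; profiling is the only reliable check.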

 

  2. (20%) Evaluate these solutions against
    1. The sequential version
    2. Using different grid arrangements and thread block sizes
    3. Using/not using shared memory

Note: ignore file I/O times, but include memory copies to and from the device in your timings.
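One possible way to collect comparable timings is a small wall-clock harness such as the following C++ sketch. The helper names `median_ms` and `speedup` are hypothetical, and taking the median of several runs is a choice made here, not a requirement of the assignment.

```cpp
#include <algorithm>
#include <chrono>
#include <vector>

// Hypothetical helper: run `work` `reps` times (reps >= 1) and return the
// median wall-clock time in milliseconds.
template <typename F>
double median_ms(F&& work, int reps) {
    std::vector<double> samples;
    for (int i = 0; i < reps; ++i) {
        auto t0 = std::chrono::steady_clock::now();
        work();  // should cover upload, kernel(s), and download, not file I/O
        auto t1 = std::chrono::steady_clock::now();
        samples.push_back(
            std::chrono::duration<double, std::milli>(t1 - t0).count());
    }
    std::sort(samples.begin(), samples.end());
    return samples[samples.size() / 2];
}

// Speedup of a candidate over a baseline, both in the same time unit.
double speedup(double baseline_ms, double candidate_ms) {
    return baseline_ms / candidate_ms;
}
```

Kernel launches return before the GPU has finished, so when timing a GPU version with a host clock the timed callable needs to wait for the device (for instance with `cudaDeviceSynchronize()`) before it returns; otherwise only the launch overhead is measured.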

               

 

 

  3. (15%) Complement your solution with the ability to overlap computation with communication by partitioning the image into a given number of partitions (NP) and assigning each partition to a dedicated stream. Consider the following example with NP=3.

 

 

While the kernel(s) are running on Partition 2, Partition 3 can be uploaded to the GPU at the same time and, possibly, the result of processing Partition 1 can be downloaded from the GPU.
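One detail worth planning on the host is which rows each partition needs. The C++ sketch below assumes the image is split into horizontal bands of rows (the assignment does not prescribe the partition shape) and uses hypothetical names `Partition` and `split_rows`. Because the area filter reads direct neighbors, each band must upload one extra row above and below the rows it produces, except at the image borders.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical description of one partition, as half-open row ranges.
struct Partition {
    int out_begin, out_end;  // rows this partition produces
    int in_begin, in_end;    // rows it must upload, including halo rows
};

// Split `height` rows into `np` nearly equal bands (assumes 1 <= np <= height).
std::vector<Partition> split_rows(int height, int np) {
    std::vector<Partition> parts;
    int base = height / np;
    int extra = height % np;  // the first `extra` bands get one more row
    int row = 0;
    for (int p = 0; p < np; ++p) {
        int rows = base + (p < extra ? 1 : 0);
        Partition part;
        part.out_begin = row;
        part.out_end = row + rows;
        // One halo row on each side for the 3x3 filter, clipped to the image.
        part.in_begin = std::max(0, part.out_begin - 1);
        part.in_end = std::min(height, part.out_end + 1);
        parts.push_back(part);
        row += rows;
    }
    return parts;
}
```

With 10 rows and NP=3, the bands produce rows [0,4), [4,7), and [7,10) and upload rows [0,5), [3,8), and [6,10). In a CUDA version, each band's upload, kernel launch, and download would then be issued into its own stream, for example with `cudaMemcpyAsync`; note that host buffers generally need to be page-locked (pinned) for the copies to overlap with computation.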

 

  4. (15%) Evaluate this solution against the others.

 

  5. (20%) Write a report (maximum of 5 A4 pages, 11pt font) that presents
    1. Tested approaches and final solution
    2. Relevant implementation details
    3. Your evaluation results (include times and/or graphs to compare and justify your solution)
    4. An analysis and interpretation of these results

 

Other relevant optimizations may also be taken into account in the final grade.

[1] https://docs.nvidia.com/cuda/cuda-c-best-practices-guide/index.html#performance-metrics

  • proj1-stgwqu.zip