CUDA C code for Addition of Two Array elements

Posted by Unknown at 05:43 | 6 comments

Problem statement: You are given two array with integer/real values, your task is to add to elements of both the array and store in another array.

Solution: The main task is to write CUDA kernel for that, writing kernel is not a big task. Let see how;

Let say we have N elements in an array which is represent by “arraySize” (here it is = 5, change accordingly). We have two Function named addWithCuda (…); for invoking kernel and allocating memory on device. d_a and d_b is the device array for storing elements and d_c is the array which stores sum of both array d_a and d_b.
We lunch arraysize number threads to add elements of array.

So, adding corresponding elements is quite easy. Just keep tracking the thread Id and we are done. We store id of thread within the block in “I” and adding both of the array element respectively.

int i = threadIdx.x;

c[i] = a[i] + b[i];

We lunched only one block with number of threads equal to arraysize.

Here is the Complete code For Adding elements of two array into another array.

#include "cuda_runtime.h"

#include "device_launch_parameters.h"

#include <stdio.h>

cudaError_t addWithCuda(int *c, const int *a, const int *b, size_t size);

__global__ void addKernel(int *c, const int *a, const int *b)

{

int i = threadIdx.x;

c[i] = a[i] + b[i];

}

int main()

{

const int arraySize = 5;

const int a[arraySize] = { 1, 2, 3, 4, 5 };

const int b[arraySize] = { 10, 20, 30, 40, 50 };

int c[arraySize] = { 0 };

// Add vectors in parallel.

cudaError_t cudaStatus = addWithCuda(c, a, b, arraySize);

if (cudaStatus != cudaSuccess) {

fprintf(stderr, "addWithCuda failed!");

return 1;

}

printf("{1,2,3,4,5} + {10,20,30,40,50} = {%d,%d,%d,%d,%d}\n",

c[0], c[1], c[2], c[3], c[4]);

// cudaThreadExit must be called before exiting in order for profiling and

// tracing tools such as Nsight and Visual Profiler to show complete traces.

cudaStatus = cudaThreadExit();

if (cudaStatus != cudaSuccess) {

fprintf(stderr, "cudaThreadExit failed!");

return 1;

}

return 0;

}

// Helper function for using CUDA to add vectors in parallel.

cudaError_t addWithCuda(int *c, const int *a, const int *b, size_t size)

{

int *dev_a = 0;

int *dev_b = 0;

int *dev_c = 0;

cudaError_t cudaStatus;

// Choose which GPU to run on, change this on a multi-GPU system.

cudaStatus = cudaSetDevice(0);

if (cudaStatus != cudaSuccess) {

fprintf(stderr, "cudaSetDevice failed! Do you have a CUDA-capable GPU installed?");

goto Error;

}

// Allocate GPU buffers for three vectors (two input, one output) .

cudaStatus = cudaMalloc((void**)&dev_c, size * sizeof(int));

if (cudaStatus != cudaSuccess) {

fprintf(stderr, "cudaMalloc failed!");

goto Error;

}

cudaStatus = cudaMalloc((void**)&dev_a, size * sizeof(int));

if (cudaStatus != cudaSuccess) {

fprintf(stderr, "cudaMalloc failed!");

goto Error;

}

cudaStatus = cudaMalloc((void**)&dev_b, size * sizeof(int));

if (cudaStatus != cudaSuccess) {

fprintf(stderr, "cudaMalloc failed!");

goto Error;

}

// Copy input vectors from host memory to GPU buffers.

cudaStatus = cudaMemcpy(dev_a, a, size * sizeof(int), cudaMemcpyHostToDevice);

if (cudaStatus != cudaSuccess) {

fprintf(stderr, "cudaMemcpy failed!");

goto Error;

}

cudaStatus = cudaMemcpy(dev_b, b, size * sizeof(int), cudaMemcpyHostToDevice);

if (cudaStatus != cudaSuccess) {

fprintf(stderr, "cudaMemcpy failed!");

goto Error;

}

// Launch a kernel on the GPU with one thread for each element.

addKernel<<<1, size>>>(dev_c, dev_a, dev_b);

// cudaThreadSynchronize waits for the kernel to finish, and returns

// any errors encountered during the launch.

cudaStatus = cudaThreadSynchronize();

if (cudaStatus != cudaSuccess) {

fprintf(stderr, "cudaThreadSynchronize returned error code %d after launching addKernel!\n", cudaStatus);

goto Error;

}

// Copy output vector from GPU buffer to host memory.

cudaStatus = cudaMemcpy(c, dev_c, size * sizeof(int), cudaMemcpyDeviceToHost);

if (cudaStatus != cudaSuccess) {

fprintf(stderr, "cudaMemcpy failed!");

goto Error;

}

Error:

cudaFree(dev_c);

cudaFree(dev_a);

cudaFree(dev_b);

return cudaStatus;

}

Need For explanation, Comments for that.

Got Questions?
Feel free to ask me any question because I'd be happy to walk you through step by step!

Reference and External Links

CUDA C Programming Guide
Programming Massively Parallel Processors By David B. Kirk and Wen-mei W.Hwu

6 comments:

Anonymous27 February 2015 at 22:28
what r these fn doing ?
ReplyDelete
Replies
zzzzz20 August 2016 at 13:51
This is program is very similar to the corresponding c program to add elements of an array
ReplyDelete
Replies
Unknown10 September 2018 at 20:56
Dear Sir:
When I run your code, I got all elements in c array equal to 0. Could you give me a hint how to solve this problem? Thanks in advance.

Jay Chen
ReplyDelete
Replies
mary19 September 2018 at 11:23
This text is masterful.
ReplyDelete
Replies
Chiku Cab7 July 2022 at 03:32
This type is exceptional. These sorts of minuscule realities are utilized a wide assortment of confirmation skills. My accomplice and I favor the hypothesis much.
Volvo bus on rent in Delhi
ReplyDelete
Replies

Add comment

Help us to improve our quality and become contributor to our blog

CUDA Programming

Prefer Your Language

Search This Blog

Tags

CUDA C code for Addition of Two Array elements

6 comments:

Recent Post

About Me

Total Pageviews

Labels

Blog Archive

Labels

Like Us

Cloud

Admin

CUDA Programming

Prefer Your Language

Search This Blog

Tags

Related Posts

Share This

CUDA C code for Addition of Two Array elements

6 comments:

Recent Post

About Me

Subscribe To

Total Pageviews

Labels

Blog Archive

Labels

Like Us

Cloud

Admin