How to Avoid cudaMalloc() for Intermediate Calculations: Using cudaGetSymbolAddress() in CUDA



Hello readers, today we'll look at a very interesting optimization technique that I used in an online CUDA programming competition. Surprisingly, my run time dropped by about 2 ms. You may be thinking that is not a big optimization, but let me tell you: if you are computing something on the GPU that needs a lot of intermediate memory, and you have been allocating that memory with cudaMalloc, then cudaGetSymbolAddress() can noticeably speed up your code.

Well, it depends on the application, but wherever possible we can use this function as an optimization.
In this tutorial we'll follow these steps:

Step 1: Gather information about cudaGetSymbolAddress()
Step 2: Examples of cudaGetSymbolAddress()
Step 3: Where to use and where not to use cudaGetSymbolAddress()
Step 4: How does it optimize your code?

OK, so let's start our discussion of cudaGetSymbolAddress().

cudaGetSymbolAddress() in CUDA
In many cases, computing a function on the GPU requires a lot of intermediate memory allocation. cudaGetSymbolAddress() is a way out of paying for that allocation at run time.
Frankly, this function does not allocate memory for you; it gives you a handle (a device pointer) to memory that was already allocated statically. Surprising, right?

OK, let's start with its prototype:

cudaError_t cudaGetSymbolAddress (void **devPtr, const void *symbol);


This function writes into *devPtr the address of the given symbol on the device. The symbol must be a variable that resides in global or constant memory space. (In CUDA releases before 5.0, the symbol could also be passed as a character string naming such a variable; that string form has since been removed.)
For more, see the cudaGetSymbolAddress entry in the CUDA Runtime API reference.
The call is made from host code. The following examples will show you how to use this function.
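
As a quick first look, here is a minimal sketch (the variable names scale and bias are my own illustration) showing that symbols in constant and global memory are queried the same way:

#include <cuda_runtime.h>
#include <cstdio>

__constant__ float scale = 2.0f;   // symbol in constant memory
__device__   float bias  = 1.0f;   // symbol in global memory

int main()
{
    float *scalePtr, *biasPtr;

    // Both kinds of symbol are accepted by cudaGetSymbolAddress().
    cudaGetSymbolAddress((void **)&scalePtr, scale);
    cudaGetSymbolAddress((void **)&biasPtr,  bias);

    printf("scale @ %p, bias @ %p\n", (void *)scalePtr, (void *)biasPtr);
    return 0;
}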

Examples of cudaGetSymbolAddress()
A simple example
Example1.cu
#include"cuda.h"

__device__ double param = 0.5;

int main()
{
double* h_example;
h_example = (double*)malloc(10*sizeof(double));

for(int i = 0; i < 10; i++)
{
    h_example[i] = i;
}

double* d_example;
cudaMalloc(&d_example, 10*sizeof(double));
cudaMemcpy(d_example, h_example, 10*sizeof(double), cudaMemcpyHostToDevice);

double* paramptr;
cudaGetSymbolAddress(&paramptr, (const char *)"param");
return 0;

}

In the example above, the key lines are the __device__ declaration and the cudaGetSymbolAddress() call. According to the function's definition, the variable must reside in either global or constant memory; in our case it resides in global memory:


__device__ double param = 0.5;


This is a static allocation of the variable "param"; to get its device pointer we use the following line:

cudaGetSymbolAddress((void **)&paramptr, param);

In CUDA releases before 5.0, this could also be written with the symbol name as a string:

cudaGetSymbolAddress((void **)&paramptr, "param"); // legacy string form, removed in CUDA 5.0

Now you can refer to this memory from your device/kernel code through "paramptr", for example by passing it to a kernel, as in the sketch below.
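
Here is a minimal complete sketch of that idea (the kernel name scaleByParam is my own illustration, not part of the API):

#include <cuda_runtime.h>
#include <cstdio>

__device__ double param = 0.5;

// Hypothetical kernel: doubles the __device__ variable through the pointer.
__global__ void scaleByParam(double *p)
{
    *p *= 2.0;
}

int main()
{
    double *paramptr;
    cudaGetSymbolAddress((void **)&paramptr, param);

    scaleByParam<<<1, 1>>>(paramptr);
    cudaDeviceSynchronize();

    double result;
    cudaMemcpy(&result, paramptr, sizeof(double), cudaMemcpyDeviceToHost);
    printf("param = %f\n", result);   // prints 1.000000
    return 0;
}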
Still, Example1.cu is not very informative about cudaGetSymbolAddress on its own; Example2.cu will show you how it works with arrays.

Example2.cu
#include"cuda.h"

__device__ double param [10]; //Line 1

int main()
{
double* h_example;
h_example = (double*)malloc(10*sizeof(double));

for(int i = 0; i < 10; i++)
{
    h_example[i] = i;
}

double* d_example;
cudaMalloc(&d_example, 10*sizeof(double));
cudaMemcpy(d_example, h_example, 10*sizeof(double), cudaMemcpyHostToDevice);

double* paramptr;
cudaGetSymbolAddress((void **)&paramptr, param); //Line 2
return 0;

}

The only real change from Example1.cu is that param is now an array:

__device__ double param [10];


Now we are statically allocating 10*sizeof(double) bytes, and with Line 2 we get a device pointer to that memory.

Now we can use this pointer in a kernel call, as demonstrated by Example3.cu:
#include"cuda.h"

__device__ double param [10]; //Line 1

__global__ void
Kernel (double *ptr)
{
//do some task with ptr

}
int main()
{
double* h_example;
h_example = (double*)malloc(10*sizeof(double));

for(int i = 0; i < 10; i++)
{
    h_example[i] = i;
}

double* d_example;
cudaMalloc(&d_example, 10*sizeof(double));
cudaMemcpy(d_example, h_example, 10*sizeof(double), cudaMemcpyHostToDevice);

double* paramptr;
cudaGetSymbolAddress((void **)&paramptr, param); //Line 2

//call kernel
int numberofthreads = 128 ;
int numberofblocks = 20 ;
Kernel<<<numberofblocks, numberofthreads >>> (paramptr) ;

return 0;

}


You can also copy data from the host into the device array through this pointer with cudaMemcpy, just as you would with memory from cudaMalloc:

cudaMemcpy(paramptr, h_example, 10*sizeof(double), cudaMemcpyHostToDevice);
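
Alternatively (not used in the examples above, but a standard runtime API call), cudaMemcpyToSymbol copies to the symbol directly, so you do not even need the pointer:

// Same copy, addressing the symbol directly; the offset parameter defaults
// to 0 and the kind defaults to cudaMemcpyHostToDevice.
cudaMemcpyToSymbol(param, h_example, 10*sizeof(double));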



Where to use and where not to use cudaGetSymbolAddress()
It is important to know where to use this function and where not to. There is no hard and fast rule, but my recommendation is to use it only for intermediate buffers whose size is known at compile time.
For example, if your application needs a reduction over its input data (say, a sum reduction), you first reduce each block's data and store these partial results in an intermediate array, then apply the reduction again to that intermediate array to get the final value. Allocating this intermediate array with cudaMalloc may cost up to 1 ms, but with a static allocation and cudaGetSymbolAddress you can save 0.9 ms or more; the address lookup itself takes only microseconds. A sketch of this pattern follows.
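
Here is a minimal sketch of that pattern (the sizes, names, and reduction kernel are my own illustration, not the competition code): the per-block partial sums land in a statically declared __device__ array, so no cudaMalloc is needed for the intermediate result.

#include <cuda_runtime.h>
#include <cstdio>

#define N        2560
#define THREADS  128
#define BLOCKS   (N / THREADS)   // 20 blocks

// Static intermediate array: one partial sum per block, no cudaMalloc needed.
__device__ float partialSums[BLOCKS];

__global__ void blockSum(const float *in, float *out, int n)
{
    __shared__ float cache[THREADS];
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    cache[threadIdx.x] = (tid < n) ? in[tid] : 0.0f;
    __syncthreads();

    // standard shared-memory tree reduction within the block
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1)
    {
        if (threadIdx.x < stride)
            cache[threadIdx.x] += cache[threadIdx.x + stride];
        __syncthreads();
    }
    if (threadIdx.x == 0)
        out[blockIdx.x] = cache[0];
}

int main()
{
    float h_in[N];
    for (int i = 0; i < N; i++) h_in[i] = 1.0f;   // sum should equal N

    float *d_in;
    cudaMalloc(&d_in, N * sizeof(float));
    cudaMemcpy(d_in, h_in, N * sizeof(float), cudaMemcpyHostToDevice);

    // Pointer to the static intermediate array instead of a second cudaMalloc.
    float *d_partial;
    cudaGetSymbolAddress((void **)&d_partial, partialSums);

    blockSum<<<BLOCKS, THREADS>>>(d_in, d_partial, N);       // pass 1: per-block sums
    blockSum<<<1, THREADS>>>(d_partial, d_partial, BLOCKS);  // pass 2: BLOCKS <= THREADS

    float sum;
    cudaMemcpy(&sum, d_partial, sizeof(float), cudaMemcpyDeviceToHost);
    printf("sum = %f (expected %d)\n", sum, N);

    cudaFree(d_in);
    return 0;
}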

How does it optimize your code?
The answer to this question is straightforward. If you allocate memory with cudaMalloc, you are allocating dynamically at run time, and each call has a measurable cost. But if you can estimate in advance how much memory a particular array needs, you can allocate it statically, which is fast compared with dynamic allocation because the memory is set up when the module is loaded rather than during your computation.
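
If you want to check this on your own hardware, here is a simple timing sketch (buffer size and names are my own; the numbers I quoted above will vary by GPU and driver). Both calls are synchronous on the host, so host-side clocks are sufficient:

#include <cuda_runtime.h>
#include <cstdio>
#include <chrono>

__device__ double staticBuf[1 << 20];   // static alternative: allocated at module load

int main()
{
    cudaFree(0);   // force CUDA context creation so it isn't counted below

    using clock = std::chrono::steady_clock;

    // Time a dynamic allocation of the same size.
    double *dynBuf;
    auto t0 = clock::now();
    cudaMalloc(&dynBuf, (1 << 20) * sizeof(double));
    auto t1 = clock::now();

    // Time the address lookup for the equally sized static buffer.
    double *staticPtr;
    auto t2 = clock::now();
    cudaGetSymbolAddress((void **)&staticPtr, staticBuf);
    auto t3 = clock::now();

    printf("cudaMalloc:           %lld us\n",
           (long long)std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count());
    printf("cudaGetSymbolAddress: %lld us\n",
           (long long)std::chrono::duration_cast<std::chrono::microseconds>(t3 - t2).count());

    cudaFree(dynBuf);
    return 0;
}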

I hope you liked this article.

Got Questions?
Feel free to ask me any question; I'd be happy to walk you through it step by step!
