How to Pass Parameters to a CUDA Kernel?



CUDA C is closely modeled on the C language, so passing parameters to a CUDA kernel works much the same as passing parameters to a C function.

Read on for an example.

We’ve promised the ability to pass parameters to our kernel, and the time has come for us to make good on that promise. Consider the following enhancement to our “Hello, World!” application:

 
#include <stdio.h>
#include "book.h"   // the book's support code; defines HANDLE_ERROR()

__global__ void add( int a, int b, int *c ) {
    *c = a + b;
}

int main( void ) {
    int c;
    int *dev_c;

    HANDLE_ERROR( cudaMalloc( (void**)&dev_c, sizeof(int) ) );

    add<<<1,1>>>( 2, 7, dev_c );

    HANDLE_ERROR( cudaMemcpy( &c, dev_c, sizeof(int),
                              cudaMemcpyDeviceToHost ) );
    printf( "2 + 7 = %d\n", c );
    cudaFree( dev_c );
    return 0;
}


You will notice a handful of new lines here, but these changes introduce only two concepts:
  • We can pass parameters to a kernel just as we would to any C function.
  • We need to allocate memory on the device to do anything useful there, such as returning values to the host.
There is nothing special about passing parameters to a kernel. The angle-bracket syntax notwithstanding, a kernel call looks and acts exactly like any function call in standard C. The runtime system takes care of any complexity introduced by the fact that these parameters need to get from the host to the device.
Notes on the Code
The more interesting addition is the allocation of memory using cudaMalloc(). This call behaves very similarly to the standard C call malloc(), but it tells the CUDA runtime to allocate the memory on the device. The first argument is a pointer to the pointer that will hold the address of the newly allocated memory, and the second argument is the size of the allocation. Unlike malloc(), cudaMalloc() does not return the allocated pointer directly: it writes the address through its first argument and returns an error code of type cudaError_t, which is why we can wrap the call in error checking.

The HANDLE_ERROR() that surrounds these calls is a utility macro that we have provided as part of this book's support code. It simply detects that the call has returned an error, prints the associated error message, and exits the application with an EXIT_FAILURE code. Although you are free to use this code in your own applications, it is highly likely that this error-handling code will be insufficient in production code.
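HANDLE_ERROR() is not part of the CUDA runtime itself. A minimal definition along the lines of the book's support code might look like this (a sketch; the exact names in the book's header may differ):

```cuda
#include <stdio.h>
#include <stdlib.h>

// Report a CUDA error with its file and line, then abort.
static void HandleError( cudaError_t err, const char *file, int line ) {
    if (err != cudaSuccess) {
        printf( "%s in %s at line %d\n",
                cudaGetErrorString( err ), file, line );
        exit( EXIT_FAILURE );
    }
}
#define HANDLE_ERROR( err ) (HandleError( err, __FILE__, __LINE__ ))
```

The macro wrapper exists so that __FILE__ and __LINE__ report the call site rather than the helper function.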
This raises a subtle but important point. Much of the simplicity and power of CUDA C derives from the ability to blur the line between host and device code. However, it is the responsibility of the programmer not to dereference the pointer returned by cudaMalloc() in code that executes on the host. Host code may pass this pointer around, perform arithmetic on it, or even cast it to a different type, but it cannot use it to read from or write to memory. Unfortunately, the compiler cannot protect you from this mistake, either. It will happily allow dereferences of device pointers in your host code, because a device pointer looks like any other pointer in the application. We can summarize the restrictions on the usage of device pointers as follows:
  • You can pass pointers allocated with cudaMalloc() to functions that execute on the device.
  • You can use pointers allocated with cudaMalloc() to read or write memory from code that executes on the device.
  • You can pass pointers allocated with cudaMalloc() to functions that execute on the host.
  • You cannot use pointers allocated with cudaMalloc() to read or write memory from code that executes on the host.
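To make these rules concrete, here is a short sketch using the add kernel from above (the host-side helper function is hypothetical, named only for illustration):

```cuda
int *dev_p;
HANDLE_ERROR( cudaMalloc( (void**)&dev_p, sizeof(int) ) );

add<<<1,1>>>( 2, 7, dev_p );   // OK: device code may read/write through dev_p
some_host_helper( dev_p );     // OK: host code may pass the pointer around
// int x = *dev_p;             // WRONG: host code must never dereference it

int result;
HANDLE_ERROR( cudaMemcpy( &result, dev_p, sizeof(int),
                          cudaMemcpyDeviceToHost ) );  // the correct way
                                                       // for the host to read it
```

Note that the illegal line would compile without complaint; the failure only shows up at runtime.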
If you’ve been reading carefully, you might have anticipated the next lesson: we can’t use standard C’s free() function to release memory we’ve allocated with cudaMalloc(). To free memory we’ve allocated with cudaMalloc(), we need to use a call to cudaFree(), which behaves exactly like free() does.
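The pairing rule can be summarized in a few lines (a sketch, reusing the HANDLE_ERROR() macro from above):

```cuda
int *dev_p;
HANDLE_ERROR( cudaMalloc( (void**)&dev_p, 100 * sizeof(int) ) );

// ... launch kernels that use dev_p ...

// free( dev_p );                   // WRONG: dev_p did not come from malloc()
HANDLE_ERROR( cudaFree( dev_p ) );  // correct: cudaMalloc() pairs with cudaFree()
```

Mixing the two allocators in either direction (free() on a device pointer, or cudaFree() on a host pointer) is undefined behavior.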


Got Questions?
Feel free to ask any questions; I'd be happy to walk you through it step by step!



