Pages

Monday, 1 April 2013

TEXTURE OBJECT IN CUDA | Bindless Texture in CUDA



I have talked about texture memory in CUDA in this article previously. In the previous article, we were talking about Texture Reference. Now in this article I’m introducing new feature of Kepler architecture called Texture object.

For this, we recap some of the important concept of texture memory that we have discussed in details in this article previously. Feel free to brows this article for more theory and example.   


TEXTURE Memory
Texture memory is read from kernels using the device functions described in Texture Functions. The process of reading a texture calling one of these functions is called a texture fetch. Each texture fetch specifies a parameter called a texture object for the texture object API or a texture reference for the texture reference API.

the TEXTURE
Texture which is the piece of texture memory that is fetched. Texture objects are created at runtime and the texture is specified when creating the texture object as described in Texture Object API. Texture references are created at compile time and the texture is specified at runtime by bounding the texture reference to the texture through runtime functions as described in Texture Reference API; several distinct texture references might be bound to the same texture or to textures that overlap in memory. A texture can be any region of linear memory or a CUDA array (described in CUDA Arrays).
TEXTURE dimensionality
Its dimensionality that specifies whether the texture is addressed as a one dimensional array using one texture coordinate, a two-dimensional array using two texture coordinates, or a three-dimensional array using three texture coordinates. Elements of the array are called texels, short for texture elements. The texture width, height, and depth refer to the size of the array in each dimension. Table Wiki lists the maximum texture width, height, and depth depending on the compute capability of the device.

TEXTURE type (Texel)
The type of a texel, which is restricted to the basic integer and single-precision floating-point types and any of the 1-, 2-, and 4-component vector types defined in char, short, int, long, long long, float, double.

TEXTURE Read Mode
The read mode, which is equal to cudaReadModeNormalizedFloat or cudaReadModeElementType. If it is cudaReadModeNormalizedFloat and the type of the texel is a 16-bit or 8-bit integer type, the value returned by the texture fetch is actually returned as floating-point type and the full range of the integer type is mapped to [0.0, 1.0] for unsigned integer type and [-1.0, 1.0] for signed integer type; for example, an unsigned 8-bit texture element with the value 0xff reads as 1. If it is
cudaReadModeElementType, no conversion is performed.

TEXTURE Normalized coordinates
Whether texture coordinates are normalized or not. By default, textures are referenced (by the functions of Texture Functions) using floating-point coordinates in the range [0, N-1] where N is the size of the texture in the dimension corresponding to the coordinate. For example, a texture that is 64x32 in size will be referenced with coordinates in the range [0, 63] and [0, 31] for the x and y dimensions, respectively. Normalized texture coordinates cause the coordinates to be specified in the range [0.0, 1.0-1/N] instead of [0, N-1], so the same 64x32 texture would be addressed by normalized coordinates in the range [0, 1-1/N] in both the x and y dimensions. Normalized texture coordinates are a natural fit to some applications' requirements, if it is preferable for the texture coordinates to be independent of the texture size.

TEXTURE addressing mode
It is valid to call the device functions with coordinates that are out of range. The addressing mode defines what happens in that case. The default addressing mode is to clamp the coordinates to the valid range: [0, N) for non-normalized coordinates and [0.0, 1.0) for normalized coordinates. If the border mode is specified instead, texture fetches with out-of-range texture coordinates return zero. For normalized coordinates, the warp mode and the mirror mode are also available. When using the wrap mode, each coordinate x is converted to frac(x)= x floor(x) where floor(x) is the largest integer not greater than x. When using the mirror mode, each coordinate x is converted to frac(x) if floor(x) is even and 1-frac(x) if floor(x) is odd. The addressing mode is specified as an array of size three whose first, second, and third elements specify the addressing mode for the first, second, and third texture coordinates, respectively; the addressing mode are cudaAddressModeBorder, cudaAddressModeClamp, cudaAddressModeWrap and cudaAddressModeMirror; cudaAddressModeWrap and
cudaAddressModeMirror are only supported for normalized texture coordinates.

TEXTURE filtering mode
The filtering mode which specifies how the value returned when fetching the texture is computed based on the input texture coordinates. Linear texture filtering may be done only for textures that are configured to return floating-point data. It performs
Low-precision interpolation between neighboring texels. When enabled, the texels surrounding a texture fetch location are read and the return value of the texture fetch is interpolated based on where the texture coordinates fell between the texels. Simple linear interpolation is performed for one-dimensional textures, bilinear interpolation for two-dimensional textures, and trilinear interpolation for three-dimensional textures. Texture Fetching gives more details on texture fetching. The filtering mode is equal to cudaFilterModePoint or cudaFilterModeLinear. If it is cudaFilterModePoint, the returned value is the texel whose texture coordinates are the closest to the input texture coordinates. If it is cudaFilterModeLinear, the returned value is the linear interpolation of the two (for a one-dimensional texture), four (for a two dimensional texture), or eight (for a three dimensional texture) texels whose texture coordinates are the closest to the input texture coordinates. cudaFilterModeLinear is only valid for returned values of floating-point type.

TEXTURE object api
A texture object is created using cudaCreateTextureObject() from a resource description of type struct cudaResourceDesc, which specifies the texture, and from a texture description defined as such:

struct cudaTextureDesc
{
      enum cudaTextureAddressMode addressMode[3];
      enum cudaTextureFilterMode filterMode;
      enum cudaTextureReadMode readMode;
      int sRGB;
      int normalizedCoords;   
      unsigned int maxAnisotropy;
      enum cudaTextureFilterMode mipmapFilterMode;
      float mipmapLevelBias;
     float minMipmapLevelClamp;
     float maxMipmapLevelClamp;
};

·         addressMode specifies the addressing mode;
·         filterMode specifies the filter mode;
·         readMode specifies the read mode;
·         normalizedCoords specifies whether texture coordinates are normalized or not;

TEXTURE REFERENCES
Textures are likely a familiar concept to anyone who’s done much CUDA programming. A feature from the graphics world, textures are images that are stretched, rotated and pasted on polygons to form the 3D graphics we are familiar with. Using textures for GPU computing has always been a pro tip for the CUDA programmer; they enable fast random access to arrays and use a cache to provide bandwidth aggregation. On the flip side, the legacy texture reference API is cumbersome to use because it requires manual binding and unbinding of texture references to memory addresses, as the following code demonstrates. Also, texture references can only be declared as static global variables and cannot be passed as function arguments.

#define N 1024
texture<float, 1, cudaReadModeElementType> tex;

// texture reference name must be known at compile time
__global__ void kernel() {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  float x = tex1D(tex, i);
  // do some work using x...
}

void call_kernel(float *buffer) {
  // bind texture to buffer
  cudaBindTexture(0, tex, buffer, N*sizeof(float));
  kernel<<<grid, block>>>();

  // unbind texture from buffer
  cudaUnbindTexture(tex);
}

int main() {
  // declare and allocate memory
  float *buffer;
  cudaMalloc(&buffer, N*sizeof(float));
  call_kernel(buffer);
  cudaFree(buffer);
}

TEXTURE OBJECTS
Kepler GPUs and CUDA 5.0 introduce a new feature called texture objects (sometimes called bindless textures, since they don’t require manual binding/unbinding) that greatly improves the usability and programmability of textures. Texture objects use the new cudaTextureObject_t class API, whereby textures become first-class C++ objects and can be passed as arguments just as if they were pointers. There is no need to know at compile time which textures will be used at run time, which enables much more dynamic execution and flexible programming, as shown in the following code.

#define N 1024

// texture object is a kernel argument
__global__ void kernel(cudaTextureObject_t tex) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  float x = tex1Dfetch(tex, i);
  // do some work using x ...
}

void call_kernel(cudaTextureObject_t tex) {
  kernel <<<grid, block>>>(tex);
}

int main() {
  // declare and allocate memory
  float *buffer;
  cudaMalloc(&buffer, N*sizeof(float));

  // create texture object
  cudaResourceDesc resDesc;
  memset(&resDesc, 0, sizeof(resDesc));
  resDesc.resType = cudaResourceTypeLinear;
  resDesc.res.linear.devPtr = buffer;
  resDesc.res.linear.desc.f = cudaChannelFormatKindFloat;
  resDesc.desc.x = 32; // bits per channel
  resDesc.res.linear.sizeInBytes = N*sizeof(float);

  cudaTextureDesc texDesc;
  memset(&texDesc, 0, sizeof(texDesc));
  texDesc.readMode = cudaReadModeElementType;

  // create texture object: we only have to do this once!
  cudaTextureObject_t tex;
  cudaCreateTextureObject(&tex, &resDesc, &texDesc, NULL);

  call_kernel(tex); // pass texture as argument

  // destroy texture object
  cudaDestroyTextureObject(tex);

  cudaFree(buffer);
}
Moreover, texture objects only need to be instantiated once, and are not subject to the hardware limit of 128 texture references, so there is no need to continuously bind and unbind them. (The Kepler hardware limit is over one million texture objects, a limit that will likely never be reached by any practical CUDA application in the lifetime of the architecture!) Using texture objects, the overhead of binding (up to 1 μs) and unbinding (up to 0.5 μs) textures is eliminated. What is not commonly known is that each outstanding texture reference that is bound when a kernel is launched incurs added launch latency—up to 0.5 μs per texture reference. This launch overhead persists even if the outstanding bound textures are not even referenced by the kernel. Again, using texture objects instead of texture references completely removes this overhead.


STRONG SCALING IN HPC
While these overheads may sound minor for any application that uses kernels that run for milliseconds or greater in duration, they are important for latency-sensitive, short-running kernels. High Performance Computing (HPC) applications that need strong scaling typify this use case. Strong scaling is how the solution time varies with the number of processors for a fixed global problem size. The goal is to solve a fixed problem in as short a time as possible. There is usually a limit to strong scaling, as at some point an application will become bound by the inter-processor communication bandwidth or latencies that arise from doing less work on each processor. Beyond this limit the total performance will actually decrease.

Figure 1: Strong-scaling performance of the QUDA solver using both texture references and texture objects.
There are many large-scale HPC applications that will be running on the Titan supercomputer at Oak Ridge National Laboratory in Tennessee, for which achieving excellent strong scaling performance will be vital. Lattice Quantum Chromodynamics (LQCD) is one such application. LQCD uses a combination of sparse linear solvers, molecular dynamics algorithms and Monte Carlo methods to probe the structure of the nucleus. Using the highly optimized QUDA library, legacy LQCD applications MILC and Chroma can run on Titan making full use of the attached Tesla K20X accelerators. The QUDA library makes extensive use of textures, and using texture references the strong scaling tops out at 2 TFLOP/s, as you can see in Figure 1. Profiling using the CUDA visual profiler revealed that the overhead of texture binding and unbinding contributed significantly to time-critical communication routines running on the GPU. Moreover, many of the kernels run for O(10μs) using up to ten textures simultaneously. Thus the launch latency overhead is a significant bottleneck. Once the library was rewritten to use texture objects, strong scaling performance increased significantly and throughput increased to nearly 3 TFLOP/s at 27 GPUs (see Figure 1).
Texture objects are yet another powerful feature of the Kepler architecture that make it easier to achieve high performance and scalability with CUDA.
Another example
// Simple transformation kernel
__global__ void transformKernel(float* output, cudaTextureObject_t texObj, int width, int height, float theta)
{
// Calculate normalized texture coordinates
unsigned int x = blockIdx.x * blockDim.x + threadIdx.x;
unsigned int y = blockIdx.y * blockDim.y + threadIdx.y;
float u = x / (float)width;
float v = y / (float)height;
// Transform coordinates
u -= 0.5f;
v -= 0.5f;
float tu = u * cosf(theta) – v * sinf(theta) + 0.5f;
float tv = v * cosf(theta) + u * sinf(theta) + 0.5f;
// Read from texture and write to global memory
output[y * width + x] = tex2D<float>(texObj, tu, tv);
}
// Host code
int main()
{
// Allocate CUDA array in device memory
cudaChannelFormatDesc channelDesc =cudaCreateChannelDesc(32, 0, 0, 0, cudaChannelFormatKindFloat);
cudaArray* cuArray;
cudaMallocArray(&cuArray, &channelDesc, width, height);
// Copy to device memory some data located at address h_data
// in host memory
cudaMemcpyToArray(cuArray, 0, 0, h_data, size,cudaMemcpyHostToDevice);

// Specify texture
struct cudaResourceDesc resDesc;
memset(&resDesc, 0, sizeof(resDesc));
resDesc.resType = cudaResourceTypeArray;
resDesc.res.array.array = cuArray;

// Specify texture object parameters
struct cudaTextureDesc texDesc;
memset(&texDesc, 0, sizeof(texDesc));
texDesc.addressMode[0] = cudaAddressModeWrap;
texDesc.addressMode[1] = cudaAddressModeWrap;
texDesc.filterMode = cudaFilterModeLinear;
texDesc.readMode = cudaReadModeElementType;
texDesc.normalizedCoords = 1;

// Create texture object
cudaTextureObject_t texObj = 0;
cudaCreateTextureObject(&texObj, &resDesc, &texDesc, NULL);

// Allocate result of transformation in device memory
float* output;
cudaMalloc(&output, width * height * sizeof(float));

// Invoke kernel
dim3 dimGrid((width + dimBlock.x – 1) / dimBlock.x, (height + dimBlock.y – 1) / dimBlock.y);
transformKernel<<<dimGrid, dimBlock>>>(output, texObj, width, height, angle);

// Destroy texture object
cudaDestroyTextureObject(texObject);
// Free device memory
cudaFreeArray(cuArray);
cudaFree(output);
return 0;
}

Reference
CUDA C programming Guide


37 comments:

  1. the blog is very interesting and will be much useful for us. thank you for sharing the blog with us. please keep on updating.
    SEO Company In Chennai

    ReplyDelete
  2. Great website! I wonder if people at Nvidia know you copy their blog posts and publish on your page...

    ReplyDelete
  3. I just see the post i am so happy to the communication science post of information's.So I have really enjoyed and reading your blogs for these posts.Any way I’ll be replay for your great thinks and I hope you post again soon.

    seo training in chennai

    ReplyDelete
  4. Thanks, very useful descriptions!

    ReplyDelete
  5. This comment has been removed by the author.

    ReplyDelete
  6. I foubd nice articlein your blog.thank you for useful info
    .

    ReplyDelete
  7. Great reading these your posts.

    ReplyDelete
  8. Thank you for an additional great post. Exactly where else could anybody get that kind of facts in this kind of a ideal way of writing? I have a presentation next week, and I’m around the appear for this kind of data.
    Java training in Bangalore | Java training in Electronic city

    Java training in Chennai | Java training institute in Chennai | Java course in Chennai

    Java training in USA

    Java training in Bangalore | Java training in Indira nagar

    ReplyDelete
  9. It's interesting that many of the bloggers to helped clarify a few things for me as well as giving.Most of ideas can be nice content.The people to give them a good shake to get your point and across the command
    python training institute in marathahalli
    python training institute in btm
    Python training course in Chennai

    ReplyDelete
  10. " I’d love to be a part of group where I can get advice from other experienced people that share the same interest. If you have any recommendations, please let me know. Thank you.
    "
    apple service center chennai | ipod service center in chennai | Apple laptop service center in chennai | apple iphone service center in chennai | apple iphone service center in chennai

    ReplyDelete
  11. There are a lot of occasions to be in contact with the job world (workshops, presentations in front of the company, guest speakers, seminars).Along that blogs is helping through learn on different path to move on different career.
    Laptop battery replacement in chennai
    Laptop unlocking service in chennai
    100% genuine laptop parts
    Laptop service centre in chennai
    Laptop display replacement in chennai

    ReplyDelete
  12. I am happy for sharing on this blog its awesome blog I really impressed. thanks for sharing.

    Became an Expert In Google Cloud Platform Training in Bangalore! Learn from experienced Trainers and get the knowledge to crack a coding interview, @Softgen Infotech Located in BTM Layout.

    ReplyDelete
  13. Дамская сумочка – поистине предмет искусства. Итальянцы постоянно экперементируют с цветом и материалом, ваяя невероятные сумочки из разных веществ. Однако наряду с такими будоражащими воображение остаются неизменно привычнми традицонные сумочки из кожи, красивые и подкупающие непосредственной элегантностью и высоким стилем. Красивая природная кожа, идеальная фурнитура, многолетние традиции и акцент последних стилей итальянской моды мотивируют проивзодителей саломея сумки на создание всё более идеальных коллекций. Новых трендом стала треугольная кожаная сумочка. Украшенная стильными кругами из медных деталей - будет блистательным акцентом твоего наряда. Выбирайте коричневую, для того, чтобы бысть стильной и заслужить восторженные взоры.

    ReplyDelete
  14. Nice reading, I love your content. This is really a fantastic and informative post. Keep it up and if you are looking for devops course then visit igmguru.com.
    data science with python training

    ReplyDelete
  15. Nice to be visiting your blog again, it has been months for me. Well this article that i’ve been waited for so long. I need this article to complete my assignment in the college, and it has same topic with your article. Thanks, great share.


    Gesper666
    Resep Ntaps

    ReplyDelete
  16. Texture memory is read from kernels using the device functions described in Texture Functions

    gabung file jpg

    ReplyDelete
  17. I got such a useful stuff on your website that helps me a lot to gain information.

    check university ba 1st year time table

    ReplyDelete
  18. This comment has been removed by the author.

    ReplyDelete
  19. This is a great article with lots of informative resources. I appreciate your work this is really helpful for everyone. Check out our website DevOps Training for more igmguru.com, related info!

    ReplyDelete
  20. React.js offers a faster development environment, and its very easy to use. Learn them without taking care of too many setups, when you are developing apps by serving HTML files.
    Learn How to Use Reactjs Cdn

    ReplyDelete
  21. This comment has been removed by the author.

    ReplyDelete
  22. Ask him any Questions about herpes

    […] It’s a crazy miracle product. Read my review here. […]

    I was cured of herpes simplex virus his contact ___________Robinsonbuckler11@gmail. com.......................

    this is the best herbal remedy…

    ReplyDelete
  23. I am really really impressed with your writing skills as well as with the layout on your blog.

    baby night pannel chart, golden day express pannel chart, meenakshi morning pannel chart.

    ReplyDelete
  24. registered as an official certified guarantee company for over 150 community sites, and is a playground that provides a safe betting environment as a premium Toto site that has been in use without changing its company name for over 6 years.

    ReplyDelete
  25. Toto without restrictions and unlimited winnings and cash out, just click on the Superstar Bet registration code and address provided.

    ReplyDelete
  26. preparing exciting promotions that cannot be experienced in reality and provides the best games in the industry to provide true entertainment and opportunities for winning.

    ReplyDelete
  27. Playground boasts the largest domestic capital and is the best real-time mini-game specialized site that provides the highest odds in the industry, 1.96, and provides major site codes.

    ReplyDelete

Help us to improve our quality and become contributor to our blog