I have
talked about texture memory in CUDA in this
article previously. In the previous
article, we were talking about Texture Reference. Now in this article I’m
introducing new feature of Kepler architecture called Texture object.
For this, we
recap some of the important concept of texture memory that we have discussed in
details in this
article previously. Feel free to brows this
article for more theory and example.
TEXTURE Memory
Texture memory
is read from kernels using the device functions described in Texture
Functions. The process of reading a texture calling one of these functions
is called a texture fetch.
Each texture fetch specifies a parameter called a texture object for
the texture object API or a texture reference for
the texture reference API.
the
TEXTURE
Texture which is the
piece of texture memory that is fetched. Texture
objects are created at runtime and the texture is specified when creating
the texture object as described in Texture Object API.
Texture references are created at compile time and the texture is specified at
runtime by bounding the texture reference to the texture through runtime
functions as described in Texture
Reference API; several distinct texture references might
be bound to the same texture or to textures that overlap in memory. A texture
can be any region of linear memory or a CUDA array (described in CUDA
Arrays).
TEXTURE
dimensionality
Its dimensionality
that
specifies whether the texture is addressed as a one dimensional array using one
texture coordinate, a two-dimensional array using two texture coordinates, or a
three-dimensional array using three texture coordinates. Elements of the array
are called texels, short
for texture
elements.
The texture
width,
height, and depth refer to
the size of the array in each dimension. Table Wiki lists the maximum texture
width, height, and depth depending on the compute capability of the device.
TEXTURE
type (Texel)
The type of a
texel, which is restricted to the basic integer and single-precision floating-point
types and any of the 1-, 2-, and 4-component vector types defined in char,
short, int, long, long long, float, double.
TEXTURE
Read Mode
The read mode, which is
equal to cudaReadModeNormalizedFloat or cudaReadModeElementType. If it is
cudaReadModeNormalizedFloat and the type
of the texel is a 16-bit or 8-bit
integer type, the value returned by the texture fetch is actually returned
as floating-point type and the full range of the integer type is mapped to
[0.0, 1.0] for unsigned integer type and [-1.0, 1.0] for signed integer type; for
example, an unsigned 8-bit texture element with the value 0xff reads as 1. If
it is
cudaReadModeElementType, no
conversion is performed.
TEXTURE
Normalized coordinates
Whether texture
coordinates are normalized or not. By default, textures are referenced (by the
functions of Texture Functions) using floating-point
coordinates in the range [0, N-1] where N is the size of the texture in the
dimension corresponding to the coordinate. For example, a texture that is 64x32
in size will be referenced with coordinates in the range [0, 63] and [0, 31]
for the x and y dimensions, respectively. Normalized texture coordinates cause
the coordinates to be specified in the range [0.0, 1.0-1/N] instead of [0,
N-1], so the same 64x32 texture would be addressed by normalized coordinates in
the range [0, 1-1/N] in both the x and y dimensions. Normalized texture
coordinates are a natural fit to some applications' requirements, if it is
preferable for the texture coordinates to be independent of the texture size.
TEXTURE
addressing mode
It is valid to call the
device functions with coordinates that are out of range. The addressing mode
defines what happens in that case. The
default addressing mode is to clamp the coordinates to the valid range: [0,
N) for non-normalized coordinates and [0.0, 1.0) for normalized coordinates. If
the border mode is specified instead, texture fetches with out-of-range texture
coordinates return zero. For normalized coordinates, the warp mode
and the mirror mode are also available. When using the wrap mode, each coordinate x is converted to frac(x)= x floor(x) where floor(x) is the largest integer not greater than x. When using
the mirror mode, each coordinate x is
converted to frac(x)
if
floor(x)
is
even and 1-frac(x)
if
floor(x)
is
odd. The addressing mode is specified as an array of size three whose first,
second, and third elements specify the addressing mode for the first, second,
and third texture coordinates, respectively; the addressing mode are cudaAddressModeBorder, cudaAddressModeClamp, cudaAddressModeWrap
and
cudaAddressModeMirror; cudaAddressModeWrap and
cudaAddressModeMirror are only
supported for normalized texture coordinates.
TEXTURE
filtering mode
The filtering mode which specifies how the value returned
when fetching the texture is computed based on the input texture coordinates.
Linear texture filtering may be done only for textures that are configured to
return floating-point data. It performs
Low-precision
interpolation between neighboring texels. When enabled, the texels surrounding
a texture fetch location are read and the return value of the texture fetch is
interpolated based on where the texture coordinates fell between the texels.
Simple linear interpolation is performed for one-dimensional textures, bilinear
interpolation for two-dimensional textures, and trilinear interpolation for
three-dimensional textures. Texture Fetching gives
more details on texture fetching. The filtering mode is equal to cudaFilterModePoint or
cudaFilterModeLinear.
If it is cudaFilterModePoint,
the returned value is the texel whose texture coordinates are the closest to
the input texture coordinates. If it is cudaFilterModeLinear,
the returned value is the linear interpolation of the two (for a
one-dimensional texture), four (for a two dimensional texture), or eight (for a
three dimensional texture) texels whose texture coordinates are the closest to
the input texture coordinates. cudaFilterModeLinear is
only valid for returned values of floating-point type.
TEXTURE
object api
A texture object is created
using cudaCreateTextureObject() from a
resource description of type struct cudaResourceDesc, which specifies the texture, and
from a texture description defined as such:
struct
cudaTextureDesc
{
enum cudaTextureAddressMode
addressMode[3];
enum cudaTextureFilterMode
filterMode;
enum cudaTextureReadMode
readMode;
int sRGB;
int normalizedCoords;
unsigned int maxAnisotropy;
enum cudaTextureFilterMode
mipmapFilterMode;
float mipmapLevelBias;
float minMipmapLevelClamp;
float maxMipmapLevelClamp;
};
|
·
addressMode specifies the addressing
mode;
·
filterMode specifies the filter
mode;
·
readMode specifies the read
mode;
·
normalizedCoords specifies
whether texture coordinates are normalized or not;
TEXTURE
REFERENCES
Textures
are likely a familiar concept to anyone who’s done much CUDA programming. A
feature from the graphics world, textures are images that are stretched,
rotated and pasted on polygons to form the 3D graphics we are familiar with.
Using textures for GPU computing has always been a pro tip for the CUDA
programmer; they enable fast random access to arrays and use a cache to provide
bandwidth aggregation. On the flip side, the legacy texture reference API is
cumbersome to use because it requires manual binding and unbinding of texture
references to memory addresses, as the following code demonstrates. Also,
texture references can only be declared as static global variables and cannot
be passed as function arguments.
#define N 1024
texture<float, 1, cudaReadModeElementType> tex;
// texture reference name must be known at compile time
__global__ void kernel() {
int i = blockIdx.x * blockDim.x + threadIdx.x;
float x = tex1D(tex, i);
// do some work using x...
}
void call_kernel(float *buffer) {
// bind texture to buffer
cudaBindTexture(0, tex, buffer, N*sizeof(float));
kernel<<<grid, block>>>();
// unbind texture from buffer
cudaUnbindTexture(tex);
}
int main() {
// declare and allocate memory
float *buffer;
cudaMalloc(&buffer, N*sizeof(float));
call_kernel(buffer);
cudaFree(buffer);
}
TEXTURE
OBJECTS
Kepler GPUs and CUDA 5.0 introduce a new feature called texture
objects (sometimes called bindless textures,
since they don’t require manual binding/unbinding) that greatly improves the
usability and programmability of textures. Texture objects use the new
cudaTextureObject_t class API, whereby textures become first-class C++ objects
and can be passed as arguments just as if they were pointers. There is no need
to know at compile time which textures will be used at run time, which enables
much more dynamic execution and flexible programming, as shown in the following
code.
#define N 1024
// texture object is a kernel argument
__global__ void kernel(cudaTextureObject_t tex) {
int i = blockIdx.x * blockDim.x + threadIdx.x;
float x = tex1Dfetch(tex, i);
// do some work using x ...
}
void call_kernel(cudaTextureObject_t tex) {
kernel <<<grid, block>>>(tex);
}
int main() {
// declare and allocate memory
float *buffer;
cudaMalloc(&buffer, N*sizeof(float));
// create texture object
cudaResourceDesc resDesc;
memset(&resDesc, 0, sizeof(resDesc));
resDesc.resType = cudaResourceTypeLinear;
resDesc.res.linear.devPtr = buffer;
resDesc.res.linear.desc.f = cudaChannelFormatKindFloat;
resDesc.desc.x = 32; // bits per channel
resDesc.res.linear.sizeInBytes = N*sizeof(float);
cudaTextureDesc texDesc;
memset(&texDesc, 0, sizeof(texDesc));
texDesc.readMode = cudaReadModeElementType;
// create texture object: we only have to do this once!
cudaTextureObject_t tex;
cudaCreateTextureObject(&tex, &resDesc, &texDesc, NULL);
call_kernel(tex); // pass texture as argument
// destroy texture object
cudaDestroyTextureObject(tex);
cudaFree(buffer);
}
Moreover, texture objects only need to be instantiated once, and
are not subject to the hardware limit of 128 texture references, so there is no
need to continuously bind and unbind them. (The Kepler hardware limit is over
one million texture objects, a limit that will likely never be reached by any
practical CUDA application in the lifetime of the architecture!) Using texture
objects, the overhead of binding (up to 1 μs) and unbinding (up to 0.5 μs)
textures is eliminated. What is not commonly known is that each outstanding
texture reference that is bound when a kernel is launched incurs added launch
latency—up to 0.5 μs per texture reference. This launch overhead persists even
if the outstanding bound textures are not even referenced by the kernel. Again,
using texture objects instead of texture references completely removes this
overhead.
STRONG
SCALING IN HPC
While these overheads may sound minor for any application that
uses kernels that run for milliseconds or greater in duration, they are
important for latency-sensitive, short-running kernels. High Performance
Computing (HPC) applications that need strong scaling typify this use case.
Strong scaling is how the solution time varies with the number of processors
for a fixed global problem size. The goal is to solve a fixed problem in as
short a time as possible. There is usually a limit to strong scaling, as at
some point an application will become bound by the inter-processor
communication bandwidth or latencies that arise from doing less work on each processor.
Beyond this limit the total performance will actually decrease.
Figure
1: Strong-scaling performance of the QUDA solver using both texture references
and texture objects.
There are many large-scale HPC applications that will be running
on the Titan supercomputer at Oak Ridge National Laboratory in Tennessee, for
which achieving excellent strong scaling performance will be vital. Lattice
Quantum Chromodynamics (LQCD) is one such application. LQCD uses a combination
of sparse linear solvers, molecular dynamics algorithms and Monte Carlo methods
to probe the structure of the nucleus. Using the highly optimized QUDA library,
legacy LQCD applications MILC and Chroma can run on Titan making full use of
the attached Tesla K20X accelerators. The QUDA library makes extensive use of
textures, and using texture references the strong scaling tops out at 2
TFLOP/s, as you can see in Figure 1. Profiling using the CUDA visual profiler
revealed that the overhead of texture binding and unbinding contributed
significantly to time-critical communication routines running on the GPU.
Moreover, many of the kernels run for O(10μs) using up to ten textures
simultaneously. Thus the launch latency overhead is a significant bottleneck.
Once the library was rewritten to use texture objects, strong scaling
performance increased significantly and throughput increased to nearly 3
TFLOP/s at 27 GPUs (see Figure 1).
Texture objects are yet another powerful feature of the Kepler
architecture that make it easier to achieve high performance and scalability
with CUDA.
Another example
//
Simple transformation kernel
__global__
void transformKernel(float*
output, cudaTextureObject_t texObj, int width,
int height, float theta)
{
//
Calculate normalized texture coordinates
unsigned
int x = blockIdx.x * blockDim.x +
threadIdx.x;
unsigned
int y = blockIdx.y * blockDim.y +
threadIdx.y;
float
u = x / (float)width;
float
v = y / (float)height;
//
Transform coordinates
u -=
0.5f;
v -=
0.5f;
float
tu = u * cosf(theta) – v * sinf(theta) + 0.5f;
float
tv = v * cosf(theta) + u * sinf(theta) + 0.5f;
//
Read from texture and write to global memory
output[y
* width + x] = tex2D<float>(texObj, tu, tv);
}
//
Host code
int main()
{
//
Allocate CUDA array in device memory
cudaChannelFormatDesc
channelDesc =cudaCreateChannelDesc(32, 0, 0, 0, cudaChannelFormatKindFloat);
cudaArray*
cuArray;
cudaMallocArray(&cuArray,
&channelDesc, width, height);
//
Copy to device memory some data located at address h_data
//
in host memory
cudaMemcpyToArray(cuArray,
0, 0, h_data, size,cudaMemcpyHostToDevice);
//
Specify texture
struct
cudaResourceDesc resDesc;
memset(&resDesc,
0, sizeof(resDesc));
resDesc.resType
= cudaResourceTypeArray;
resDesc.res.array.array
= cuArray;
//
Specify texture object parameters
struct
cudaTextureDesc texDesc;
memset(&texDesc,
0, sizeof(texDesc));
texDesc.addressMode[0]
= cudaAddressModeWrap;
texDesc.addressMode[1]
= cudaAddressModeWrap;
texDesc.filterMode
= cudaFilterModeLinear;
texDesc.readMode
= cudaReadModeElementType;
texDesc.normalizedCoords
= 1;
//
Create texture object
cudaTextureObject_t
texObj = 0;
cudaCreateTextureObject(&texObj,
&resDesc, &texDesc, NULL);
//
Allocate result of transformation in device memory
float*
output;
cudaMalloc(&output,
width * height * sizeof(float));
//
Invoke kernel
dim3
dimGrid((width + dimBlock.x – 1) / dimBlock.x, (height +
dimBlock.y – 1) / dimBlock.y);
transformKernel<<<dimGrid,
dimBlock>>>(output, texObj, width, height, angle);
//
Destroy texture object
cudaDestroyTextureObject(texObject);
//
Free device memory
cudaFreeArray(cuArray);
cudaFree(output);
return
0;
}
|
Reference
CUDA C programming Guide
the blog is very interesting and will be much useful for us. thank you for sharing the blog with us. please keep on updating.
ReplyDeleteSEO Company In Chennai
Image Processing Projects For Final Year Students
Deletepython projects for final year students
you'll find c codes on c codes for beginners
ReplyDeleteGreat website! I wonder if people at Nvidia know you copy their blog posts and publish on your page...
ReplyDeleteI just see the post i am so happy to the communication science post of information's.So I have really enjoyed and reading your blogs for these posts.Any way I’ll be replay for your great thinks and I hope you post again soon.
ReplyDeleteseo training in chennai
Thanks, very useful descriptions!
ReplyDeleteThis comment has been removed by the author.
ReplyDeleteI foubd nice articlein your blog.thank you for useful info
ReplyDelete.
Great reading these your posts.
ReplyDeleteThank you for an additional great post. Exactly where else could anybody get that kind of facts in this kind of a ideal way of writing? I have a presentation next week, and I’m around the appear for this kind of data.
ReplyDeleteJava training in Bangalore | Java training in Electronic city
Java training in Chennai | Java training institute in Chennai | Java course in Chennai
Java training in USA
Java training in Bangalore | Java training in Indira nagar
Really great post, Thank you for sharing This knowledge.Excellently written article, if only all bloggers offered the same level of content as you, the internet would be a much better place. Please keep it up!
ReplyDeleteData Science course in Chennai | Best Data Science course in Chennai
Data science course in bangalore | Best Data Science course in Bangalore
Data science course in pune | Data Science Course institute in Pune
Data science online course | Online Data Science certification course-Gangboard
Data Science Interview questions and answers
It's interesting that many of the bloggers to helped clarify a few things for me as well as giving.Most of ideas can be nice content.The people to give them a good shake to get your point and across the command
ReplyDeletepython training institute in marathahalli
python training institute in btm
Python training course in Chennai
Needed to compose you a very little word to thank you yet again regarding the nice suggestions you’ve contributed here.
ReplyDeleteangularjs-Training in chennai
angularjs Training in chennai
angularjs-Training in tambaram
angularjs-Training in sholinganallur
angularjs-Training in velachery
" I’d love to be a part of group where I can get advice from other experienced people that share the same interest. If you have any recommendations, please let me know. Thank you.
ReplyDelete"
apple service center chennai | ipod service center in chennai | Apple laptop service center in chennai | apple iphone service center in chennai | apple iphone service center in chennai
Koj tsab xov xwm tseem ceeb heev
ReplyDeletechó Corgi
bán chó Corgi
chó Corgi giá bao nhiêu
mua chó Corgi
There are a lot of occasions to be in contact with the job world (workshops, presentations in front of the company, guest speakers, seminars).Along that blogs is helping through learn on different path to move on different career.
ReplyDeleteLaptop battery replacement in chennai
Laptop unlocking service in chennai
100% genuine laptop parts
Laptop service centre in chennai
Laptop display replacement in chennai
I am happy for sharing on this blog its awesome blog I really impressed. thanks for sharing.
ReplyDeleteBecame an Expert In Google Cloud Platform Training in Bangalore! Learn from experienced Trainers and get the knowledge to crack a coding interview, @Softgen Infotech Located in BTM Layout.
Дамская сумочка – поистине предмет искусства. Итальянцы постоянно экперементируют с цветом и материалом, ваяя невероятные сумочки из разных веществ. Однако наряду с такими будоражащими воображение остаются неизменно привычнми традицонные сумочки из кожи, красивые и подкупающие непосредственной элегантностью и высоким стилем. Красивая природная кожа, идеальная фурнитура, многолетние традиции и акцент последних стилей итальянской моды мотивируют проивзодителей саломея сумки на создание всё более идеальных коллекций. Новых трендом стала треугольная кожаная сумочка. Украшенная стильными кругами из медных деталей - будет блистательным акцентом твоего наряда. Выбирайте коричневую, для того, чтобы бысть стильной и заслужить восторженные взоры.
ReplyDeleteNice reading, I love your content. This is really a fantastic and informative post. Keep it up and if you are looking for devops course then visit igmguru.com.
ReplyDeletedata science with python training
Really Nice Information It's Very Helpful All courses Checkout Here.
ReplyDeleteMicrosoft Dynamics CRM Training in Bangalore
Nice to be visiting your blog again, it has been months for me. Well this article that i’ve been waited for so long. I need this article to complete my assignment in the college, and it has same topic with your article. Thanks, great share.
ReplyDeleteGesper666
Resep Ntaps
Texture memory is read from kernels using the device functions described in Texture Functions
ReplyDeletegabung file jpg
I got such a useful stuff on your website that helps me a lot to gain information.
ReplyDeletecheck university ba 1st year time table
will omit your great writing due to this problem.
ReplyDeleteMuleSoft training
MuleSoft online training
Nice blog post for reading and Thanks for sharing the wonderful article
ReplyDeleteInternship providing companies in chennai | Where to do internship | internship opportunities in Chennai | internship offer letter | What internship should i do | How internship works | how many internships should i do ? | internship and inplant training difference | internship guidelines for students | why internship is necessary
ladyhammer casino no depositing bonuses codes
ReplyDeleteslottyway
tornadobet bonus code
dzisiejsze numery keno
This comment has been removed by the author.
ReplyDeleteThis is a great article with lots of informative resources. I appreciate your work this is really helpful for everyone. Check out our website DevOps Training for more igmguru.com, related info!
ReplyDeleteReact.js offers a faster development environment, and its very easy to use. Learn them without taking care of too many setups, when you are developing apps by serving HTML files.
ReplyDeleteLearn How to Use Reactjs Cdn
This comment has been removed by the author.
ReplyDeleteAsk him any Questions about herpes
ReplyDelete[…] It’s a crazy miracle product. Read my review here. […]
I was cured of herpes simplex virus his contact ___________Robinsonbuckler11@gmail. com.......................
this is the best herbal remedy…
I am really really impressed with your writing skills as well as with the layout on your blog.
ReplyDeletebaby night pannel chart, golden day express pannel chart, meenakshi morning pannel chart.
nice...............!
ReplyDeletemulesoft training
pega training
servicenow training training
registered as an official certified guarantee company for over 150 community sites, and is a playground that provides a safe betting environment as a premium Toto site that has been in use without changing its company name for over 6 years.
ReplyDeleteToto without restrictions and unlimited winnings and cash out, just click on the Superstar Bet registration code and address provided.
ReplyDeletepreparing exciting promotions that cannot be experienced in reality and provides the best games in the industry to provide true entertainment and opportunities for winning.
ReplyDeletePlayground boasts the largest domestic capital and is the best real-time mini-game specialized site that provides the highest odds in the industry, 1.96, and provides major site codes.
ReplyDelete