Applications and Techniques on the Road to Exascale Computing

Title	Applications and Techniques on the Road to Exascale Computing
Publication Type	Book Chapter
Year of Publication	2012
Authors	Karwacki M., Stpiczyński P.
Editors	De Bosschere K., D'Hollander E.H., Joubert G.R., Padua D., Peters F., Sawyer M.
Book Title	Advances in Parallel Computing
Volume	22
Chapter	Improving Performance of Triangular Matrix-Vector BLAS Routines on GPUs
Publisher	IOS Press
ISBN Number	978-1-61499-040-6
Abstract	CUBLAS is a widely used implementation of BLAS (Basic Linear Algebra Subprograms) for NVIDIA CUDA Graphics Processing Units (GPUs). The aim of this paper is to show that the performance of selected Level 2 BLAS routines for triangular matrices can be improved using optimization techniques suited to GPUs, such as shared memory and coalesced memory access. We present new implementations of the routines xTRMV and xTRSV. The results of experiments carried out on two GPU architectures, Tesla M2050 and GeForce GTX 260, show that the new implementations are up to 500% faster than the corresponding routines from the CUBLAS library.
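The routines named in the abstract are the standard BLAS operations xTRMV (x := A*x) and xTRSV (solve A*x = b) for triangular A, where the x prefix stands for the precision (S, D, C, Z). The following is a minimal CPU sketch in C, not the authors' GPU kernels. It assumes the lower-triangular, non-transposed, non-unit-diagonal case with column-major storage, as in reference BLAS. The names trmv_lower_tiled and trsv_lower and the tile width TILE are hypothetical. The sketch only mimics the two GPU ideas: a tile of x is staged once in a small local buffer (the role shared memory plays in a thread block), and the innermost loop walks consecutive rows of one column. Consecutive rows of a column are contiguous in memory, which is the pattern that lets consecutive threads of a warp issue coalesced loads.

```c
#include <assert.h>
#include <stddef.h>

#define TILE 2 /* hypothetical tile width; stands in for a CUDA block size */

/* y := L * x for lower-triangular L (n-by-n, column-major, leading dim lda). */
static void trmv_lower_tiled(int n, const double *L, int lda,
                             const double *x, double *y) {
    double xs[TILE]; /* staging buffer, analogous to CUDA shared memory */
    assert(lda >= n);
    for (int i = 0; i < n; ++i)
        y[i] = 0.0;
    for (int j0 = 0; j0 < n; j0 += TILE) {
        int w = (n - j0 < TILE) ? n - j0 : TILE;
        /* Load one tile of x once; every row reuses it. */
        for (int t = 0; t < w; ++t)
            xs[t] = x[j0 + t];
        for (int t = 0; t < w; ++t) {
            int j = j0 + t;
            const double *col = L + (size_t)j * lda;
            /* Only rows i >= j lie in the lower triangle; col[i] for
               consecutive i is contiguous (the coalesced-access pattern). */
            for (int i = j; i < n; ++i)
                y[i] += col[i] * xs[t];
        }
    }
}

/* Solve L * x = b in place (x holds b on entry) by column-oriented
   forward substitution; each step depends on the previous x[j]. */
static void trsv_lower(int n, const double *L, int lda, double *x) {
    assert(lda >= n);
    for (int j = 0; j < n; ++j) {
        const double *col = L + (size_t)j * lda;
        x[j] /= col[j];
        for (int i = j + 1; i < n; ++i)
            x[i] -= col[i] * x[j];
    }
}
```

Unlike TRMV, TRSV carries a sequential dependency from column j to column j+1. That is why parallelizing the triangular solve on a GPU is harder than parallelizing the matrix-vector product.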

IITiS