RSS GPGPU

http://www.gpgpu.org/cgi-bin/blosxom.cgi

General Purpose Computation Using Graphics Hardware

Sunday May 25th, 2008

Relational Joins on Graphics Processors

Abstract: "We present a novel design and implementation of relational join algorithms for new-generation graphics processing units (GPUs). Taking advantage of GPU features, we design a set of data-parallel primitives such as split and sort, and use these primitives to implement indexed or non-indexed nested-loop, sort-merge and hash joins. Our algorithms utilize the high parallelism as well as the high memory bandwidth of the GPU, and use parallel computation and memory optimizations to effectively reduce memory stalls. We have implemented our algorithms on a PC with an NVIDIA G80 GPU and an Intel quad-core CPU. Our GPU-based join algorithms are able to achieve a performance improvement of 2-7X over their optimized CPU-based counterparts." (Bingsheng He, Ke Yang, Rui Fang, Mian Lu, Naga K. Govindaraju, Qiong Luo, and Pedro V. Sander. Relational Joins on Graphics Processors. ACM SIGMOD 2008.)
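For readers unfamiliar with the sort-merge join named in the abstract, here is a minimal sequential sketch in Python. It is not the authors' GPU implementation: the function name, the (key, payload) tuple layout, and the use of Python's built-in sort in place of a data-parallel sort primitive are all assumptions made for illustration.

```python
def sort_merge_join(left, right):
    """Equi-join two lists of (key, payload) tuples on key.

    Illustrative only: a GPU version would replace sorted() with a
    data-parallel sort and merge many key ranges concurrently.
    """
    left = sorted(left, key=lambda t: t[0])
    right = sorted(right, key=lambda t: t[0])
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on each side.
            i_end = i
            while i_end < len(left) and left[i_end][0] == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            # Emit the cross product of the two runs.
            for a in range(i, i_end):
                for b in range(j, j_end):
                    out.append((lk, left[a][1], right[b][1]))
            i, j = i_end, j_end
    return out
```

After sorting, each input is scanned once, so the merge phase costs time linear in the input sizes plus the size of the output.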

A SIMD interpreter for Genetic Programming on GPU Graphics Cards

Abstract: Mackey-Glass chaotic time series prediction and nuclear protein classification show the feasibility of evaluating genetic programming populations directly on parallel consumer gaming graphics processing units. Using a Linux KDE computer equipped with an nVidia GeForce 8800 GTX graphics processing unit card the C++ SPMD interpreter evolves programs at Giga GP operations per second (895 million GPops). We use the RapidMind general processing on GPU (GPGPU) framework to evaluate an entire population of a quarter of a million individual programs on a non-trivial problem in 4 seconds. An efficient reverse polish notation (RPN) tree based GP is given. (A SIMD interpreter for Genetic Programming on GPU Graphics Cards. W.B. Langdon and W. Banzhaf. In M. Neill, L. Vanneschi, A.I. Esparcia Alcazar, S. Gustafson eds., EuroGP 2008, pp. 73-85. Springer, LNCS 4971, 26-28 March, Naples.)
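The idea of interpreting a reverse polish notation program over many data items at once can be sketched in plain Python. This is only an illustration under stated assumptions: the token set ("x", numeric constants, and three arithmetic operators) and the function name are hypothetical, and nothing here uses RapidMind or reflects the authors' C++ interpreter.

```python
import operator

# Hypothetical function set for the sketch.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def eval_rpn(program, xs):
    """Evaluate an RPN token list for every input in xs at once.

    Each stack entry is a list holding one value per input, so every
    opcode is applied across all inputs before the next token is read.
    """
    stack = []
    for tok in program:
        if tok in OPS:
            b = stack.pop()
            a = stack.pop()
            stack.append([OPS[tok](u, v) for u, v in zip(a, b)])
        elif tok == "x":
            stack.append(list(xs))                 # the input variable
        else:
            stack.append([float(tok)] * len(xs))   # a numeric constant
    if len(stack) != 1:
        raise ValueError("malformed RPN program")
    return stack[0]
```

For example, `eval_rpn(["x", "x", "*", "1", "+"], [0.0, 1.0, 2.0])` computes x*x + 1 for each of the three inputs.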

GPGPU Based Image Segmentation Livewire Algorithm Implementation

This thesis presents a GPU implementation of the Livewire algorithm. The algorithm is divided into three phases: Sobel or Laplacian filter convolution, image modeling as a grid graph, and solving the single-source shortest path problem with non-negative edge weights. To calculate the shortest path, an adapted version of the delta-stepping algorithm was developed for GPUs using CUDA. A critical analysis of the results shows large speedups for the image filtering algorithms. On the other hand, the heavy use of dependent device memory look-ups has kept the delta-stepping algorithm from outperforming the CPU implementation, although better performance is expected for wider graphs. Besides showing the viability of implementing the Livewire algorithm, this thesis makes available an open-source GPU-based image segmentation application, which can be used as an example for future GPU algorithm implementations, at http://code.google.com/p/gpuwire/.
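The shortest-path phase on a grid graph can be illustrated with a short sequential sketch. Note the substitution: this uses Dijkstra's algorithm with a binary heap, not the delta-stepping variant developed in the thesis, and the cost grid, 4-connectivity, and function name are assumptions for illustration (in a Livewire setting the per-pixel cost would be derived from the filter response, low where edges are strong).

```python
import heapq

def grid_shortest_path(cost, start, goal):
    """Dijkstra on a 4-connected grid.

    cost[r][c] is the non-negative cost of stepping into pixel (r, c).
    Returns (total cost, list of pixels from start to goal).
    """
    rows, cols = len(cost), len(cost[0])
    dist = {start: 0}
    prev = {}
    heap = [(0, start)]
    while heap:
        d, (r, c) = heapq.heappop(heap)
        if (r, c) == goal:
            break
        if d > dist[(r, c)]:
            continue  # stale heap entry
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols:
                nd = d + cost[nr][nc]
                if nd < dist.get((nr, nc), float("inf")):
                    dist[(nr, nc)] = nd
                    prev[(nr, nc)] = (r, c)
                    heapq.heappush(heap, (nd, (nr, nc)))
    # Walk the predecessor links back from the goal.
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return dist[goal], path[::-1]
```

Each relaxation reads the distance of a neighbor whose position depends on the previously popped pixel, which gives a feel for the dependent memory look-ups the abstract mentions.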

Quantum Chemistry on GPUs

Ivan Ufimtsev and Todd Martínez at the University of Illinois at Urbana-Champaign have implemented an efficient method of calculating two-electron repulsion integrals over Gaussian basis functions on the GPU. Virtually all modern quantum chemical calculations require evaluating millions to billions of these integrals. This problem turns out to be well-suited to the massively parallel architecture of GPUs by an appropriate partitioning of the problem. A benchmark test performed for the evaluation of approximately one million (ss|ss) integrals over contracted s-orbitals showed that a naïve algorithm implemented on the GPU achieves up to 130-fold speedup over a traditional CPU implementation on an AMD Opteron. Subsequent calculations on a 256-atom DNA strand show that the GPU advantage is maintained for basis sets including higher angular momentum functions. (Quantum Chemistry on Graphical Processing Units. 1. Strategies for Two-Electron Integral Evaluation, Ivan S. Ufimtsev and Todd J. Martínez, J. Chem. Theory Comput., 4 (2), 222 -231, 2008. doi:10.1021/ct700268q)

A Flexible Kernel for Adaptive Mesh Refinement on GPU

This paper by Boubekeur (TU Berlin) and Schlick (INRIA) presents a flexible GPU kernel for adaptive on-the-fly refinement of meshes with arbitrary topology. By simply reserving a small amount of GPU memory to store a set of adaptive refinement patterns, on-the-fly refinement is performed by the GPU, without any preprocessing or additional topology data structure. The level of adaptive refinement can be controlled by specifying a per-vertex depth tag, in addition to the usual position, normal, color and texture coordinates. This depth tag is used by the kernel to instantiate the correct refinement pattern. Finally, the refined patch produced for each triangle can be displaced by the vertex shader, using any kind of geometric refinement, such as Bezier patch smoothing, scalar valued displacement, procedural geometry synthesis or subdivision surfaces. This refinement engine requires no multi-pass rendering, fragment processing, or special preprocessing of the input mesh structure. It can be implemented on any GPU with vertex shading capabilities. (A Flexible Kernel for Adaptive Mesh Refinement on GPU, Tamy Boubekeur and Christophe Schlick, Computer Graphics Forum, 2008.)

Accelerating Resolution-of-the-Identity Second-Order Møller-Plesset Quantum Chemistry Calculations with Graphical Processing Units

In this paper we describe a modification of a general purpose code for quantum mechanical calculations of molecular properties (Q-Chem) to use a graphical processing unit. We report a 4.3x speedup of the resolution-of-the-identity second-order Møller-Plesset perturbation theory execution time for single point energy calculation of linear alkanes. Furthermore, we obtain the correlation and total energy for n-octane conformers as the torsional angle of central bond is rotated to show that precision is not lost for these types of calculations. This code modification is accomplished using the NVIDIA CUDA Basic Linear Algebra Subprograms (CUBLAS) library for an NVIDIA Quadro FX 5600 graphics card. Finally, we anticipate further speedups of other matrix algebra based electronic structure calculations using a similar approach. (Accelerating Resolution-of-the-Identity Second-Order Møller-Plesset Quantum Chemistry Calculations with Graphical Processing Units. Vogt, L., Olivares-Amaya, R., Kermes, S., Shao, Y., Amador-Bedolla, C., and Aspuru-Guzik, A. J. Phys. Chem. A, 2008, DOI: 10.1021/jp0776762)

Microprocessor Report: Parallel Processing With CUDA

This article in the January 28, 2008 issue of Microprocessor Report discusses parallel computing with massive multiprocessing on GPUs using NVIDIA CUDA. While the full article requires a subscription, a summary is available here.

GPGPUs: Neat Idea or Disruptive Technology?

"General purpose graphics processing units can perform amazingly well when used effectively." This article by Rob Farber at Scientific Computing provides a brief high-level discussion of GPGPU and NVIDIA CUDA.

Multiscale and local search methods for real time region tracking with particle filters: local search driven by adaptive scale estimation on GPUs

This paper by Cabido et al. presents a real-time object tracking algorithm, based on the hybridization of particle filtering (PF) and a multi-scale local search (MSLS) algorithm, for both CPU and GPU architectures. The developed system provides successful results in precise tracking of single and multiple targets in monocular video, operating in real-time at 70 frames per second for 640 × 480 video resolutions on the GPU, up to 1,100% faster than the CPU version of the algorithm. (Multiscale and local search methods for real time region tracking with particle filters: local search driven by adaptive scale estimation on GPUs. Raul Cabido, Antonio S. Montemayor, Juan Jose Pantrigo, and Bryson R. Payne. Machine Vision and Applications, Springer, 2008.)

CUDA compatible GPU cards as efficient hardware accelerators for Smith-Waterman sequence alignment

The Smith-Waterman algorithm has been available for more than 25 years. It is based on a dynamic programming approach that explores all the possible alignments between two biological sequences; as a result it returns the optimal local alignment. Unfortunately, the computational cost is very high, requiring a number of operations proportional to the product of the lengths of the two sequences. This paper by Svetlin Manavski and Giorgio Valle describes SmithWaterman-CUDA, an open-source project to perform fast sequence alignment on the GPU. Although the software performs the optimal Smith-Waterman alignment, it is faster than heuristic approaches like FASTA and BLAST. The tests on protein data banks show up to a 30x speedup relative to reference CPU implementations. (Svetlin A. Manavski, Giorgio Valle, CUDA compatible GPU cards as efficient hardware accelerators for Smith-Waterman sequence alignment, BMC Bioinformatics 2008, 9(Suppl 2):S10 (26 March 2008))
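The dynamic programming recurrence behind Smith-Waterman, and the reason its cost grows with the product of the sequence lengths, can be seen in a minimal CPU sketch. This is not the SmithWaterman-CUDA code: the scoring values (match 2, mismatch -1, linear gap -1) and the function name are assumptions chosen for illustration, whereas real protein searches would use a substitution matrix and typically affine gap penalties.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score between strings a and b.

    Uses a linear gap penalty and keeps only two rows of the DP table,
    so memory is O(len(b)) while time is O(len(a) * len(b)).
    """
    cols = len(b) + 1
    prev = [0] * cols
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * cols
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            cur[j] = max(
                0,                # start a new local alignment
                prev[j - 1] + s,  # align a[i-1] with b[j-1]
                prev[j] + gap,    # gap in b
                cur[j - 1] + gap, # gap in a
            )
            best = max(best, cur[j])
        prev = cur
    return best
```

Every cell depends only on its left, upper, and upper-left neighbors, so cells along the same anti-diagonal are independent of one another, which is one common source of parallelism for this algorithm.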

gDEBugger V4.0 Adds Linux Support and a Buffer Viewer

The new gDEBugger V4.0 introduces gDEBugger Linux. This new exciting product adds 32-bit and 64-bit Linux Support, bringing all of gDEBugger's debugging and profiling abilities to the Linux OpenGL developers' world. A new Texture and Buffer Viewer has been added. This Viewer allows you to view textures, static buffers and pbuffers as images or raw data in its original format, including non-RGB data formats (float, depth, integer, luminance, etc). This version also includes significant performance improvements. gDEBugger, an OpenGL and OpenGL ES debugger and profiler, traces application activity on top of the OpenGL API to let programmers see what is happening within the graphics system implementation to find bugs and optimize OpenGL application performance. (http://www.gremedy.com)

SHARCNET Symposium on GPU and CELL Computing

University of Waterloo
Waterloo, Ontario, Canada
May 27th 2008

This one-day symposium will explore the use of GPUs, CELL processors, FPGAs and multi-core CPUs for large-scale scientific computing. The symposium program includes invited talks on the LANL Roadrunner CELL supercomputer, the RapidMind platform for multicore CPUs and many-core accelerators, and NVIDIA CUDA. For more information, see http://www.sharcnet.ca/events/ssgc2008/

GPU acceleration of cutoff pair potentials for molecular modeling applications

The advent of systems biology requires the simulation of ever-larger biomolecular systems, demanding a commensurate growth in computational power. This paper examines the use of the NVIDIA Tesla C870 graphics card programmed through the CUDA toolkit to accelerate the calculation of cutoff pair potentials, one of the most prevalent computations required by many different molecular modeling applications. The paper presents algorithms to calculate electrostatic potential maps for cutoff pair potentials. Whereas a straightforward approach for decomposing atom data leads to low computational efficiency, a new strategy enables fine-grained spatial decomposition of atom data that maps efficiently to the C870's memory system while increasing work efficiency of atom data traversal by a factor of 5. The memory addressing flexibility exposed through CUDA's SPMD programming model is crucial in enabling this new strategy. An implementation of the new algorithm provides a greater than threefold performance improvement over our previously published implementation and runs 12 to 20 times faster than optimized CPU-only code. The lessons learned are generally applicable to algorithms accelerated by uniform grid spatial decomposition. (C. I. Rodrigues, D. J. Hardy, J. E. Stone, K. Schulten, W. W. Hwu, GPU acceleration of cutoff pair potentials for molecular modeling applications. Proceedings of the 2008 Conference On Computing Frontiers, pp. 273-282, 2008.) (http://www.ks.uiuc.edu/Research/gpu/)
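The general idea of uniform grid spatial decomposition for a cutoff potential can be sketched on the CPU as follows. This is an illustration under stated assumptions, not the paper's algorithm: the function names are hypothetical, atoms are (x, y, z, charge) tuples, the cell side is set equal to the cutoff, and the potential is a plain truncated q/r sum with no smoothing function or physical constants.

```python
import math
from collections import defaultdict

def bin_atoms(atoms, cutoff):
    """Group atoms (x, y, z, q) into cubic cells whose side equals the cutoff."""
    bins = defaultdict(list)
    for x, y, z, q in atoms:
        key = (math.floor(x / cutoff), math.floor(y / cutoff), math.floor(z / cutoff))
        bins[key].append((x, y, z, q))
    return bins

def potential_at(point, bins, cutoff):
    """Sum q / r over atoms closer than `cutoff` to `point`.

    Because the cell side equals the cutoff, every atom in range lies in
    the point's own cell or one of its 26 neighbors, so only 27 cells
    are scanned instead of the whole atom list.
    """
    cx, cy, cz = (math.floor(v / cutoff) for v in point)
    total = 0.0
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for dz in (-1, 0, 1):
                for x, y, z, q in bins.get((cx + dx, cy + dy, cz + dz), ()):
                    r = math.dist(point, (x, y, z))
                    if 0.0 < r < cutoff:
                        total += q / r
    return total
```

A potential map would call `potential_at` for every lattice point; the work per point then depends on the local atom density rather than on the total number of atoms.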

GPU Computing

Abstract: "The graphics processing unit (GPU) has become an integral part of today's mainstream computing systems. Over the past six years, there has been a marked increase in the performance and capabilities of GPUs. The modern GPU is not only a powerful graphics engine but also a highly parallel programmable processor featuring peak arithmetic and memory bandwidth that substantially outpaces its CPU counterpart. The GPU's rapid increase in both programmability and capability has spawned a research community that has successfully mapped a broad range of computationally demanding, complex problems to the GPU. This effort in general-purpose computing on the GPU, also known as GPU computing, has positioned the GPU as a compelling alternative to traditional microprocessors in high-performance computer systems of the future. We describe the background, hardware, and programming model for GPU computing, summarize the state of the art in tools and techniques, and present four GPU computing successes in game physics and computational biophysics that deliver order-of-magnitude performance gains over optimized CPU applications." (J. D. Owens, M. Houston, D. Luebke, S. Green, J. E. Stone, J. C. Phillips, "GPU Computing", Proceedings of the IEEE, vol. 96, no. 5, pp. 879-899, May 2008)

CIGPU 5 June 2008 Hong Kong additional technical discussion

In addition to the papers already announced, Dr. Simon Harding (Memorial University, Newfoundland) and Dr. Tien-Tsin Wong (The Chinese University of Hong Kong) will lead a discussion on the practicalities of running evolution on modern graphics cards. They will contrast the current leading GPGPU tools considering ease of use, and support for debugging and performance monitoring. CIGPU will close with a short session considering the future of computational intelligence on GPUs.
