Technologies
Disclosed are various embodiments for computing two-body statistics (2-BS) on graphics processing units (GPUs). Various types of 2-BS are regarded as essential components of data analysis in many scientific and computing domains. However, the quadratic complexity of these computations hinders timely processing of data. Accordingly, various embodiments of the present disclosure involve parallel algorithms for 2-BS computation on GPUs. Although typical 2-BS problems can be summarized into a straightforward parallel computing pattern, traditional wisdom from general parallel computing often falls short of delivering the best possible performance. Therefore, various embodiments of the present disclosure involve techniques to decompose 2-BS problems and methods for effective use of computing resources on GPUs. We also develop analytical models that guide users toward appropriate parameters for a GPU program. Although 2-BS problems share the same core computations, each problem carries its own characteristics that call for different strategies in code optimization. Accordingly, various embodiments of the present disclosure involve a software framework that automatically generates high-performance GPU code based on a few parameters and short primer code input.
Inventors at USF have developed algorithms that solve the 2-BS problem. The algorithm design focuses on effective use of hardware and software features that are unique to GPU platforms. This is done by splitting the algorithm into two stages: pairwise distance function computation and writing output. The basic algorithm is then modified by integrating various techniques at each stage. Experiments run on modern GPU hardware show that the resulting algorithms outperform the best known CPU program by at least an order of magnitude in various applications.
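The two-stage structure described above (pairwise distance function, then output) can be illustrated with a minimal CPU reference sketch. It is not the disclosed GPU algorithm: it assumes one particular 2-BS, a distance histogram over 2D points, and the names distance_histogram, bucket_width, and num_buckets are hypothetical. It only makes the quadratic pattern concrete.

```python
import math
from itertools import combinations

def distance_histogram(points, bucket_width, num_buckets):
    """Count point pairs by Euclidean distance (one example of a 2-body statistic)."""
    hist = [0] * num_buckets
    # Every unordered pair is visited once, hence the quadratic cost.
    for p, q in combinations(points, 2):
        # Stage 1: pairwise distance function.
        d = math.dist(p, q)
        # Stage 2: write output (here, increment a histogram bucket).
        # Distances beyond the last bucket are clamped into it.
        bucket = min(int(d // bucket_width), num_buckets - 1)
        hist[bucket] += 1
    return hist

points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
print(distance_histogram(points, bucket_width=4.0, num_buckets=3))
```

On a GPU, the pair loop would typically be distributed across threads, and the output stage is where write contention arises, which is presumably why the two stages are optimized separately.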
Disclosed are various embodiments for performing a join operation using a graphics processing unit (GPU). The GPU can receive input data including sequences or tuples. The GPU can initialize a histogram in a memory location shared by threads. The GPU can build the histogram of hash values for the sequences. The GPU can reorder the sequences based on the histogram. The GPU can probe partitions and store the results in a buffer pool. The GPU can output the results of the join.
Computer Science, Software
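The join steps listed above (histogram of hash values, reordering, probing partitions, collecting results) resemble a partitioned hash join. The following sequential sketch is an illustration under assumptions, not the disclosed GPU implementation: the function names, the (key, payload) tuple layout, and the use of a plain list in place of the buffer pool are all hypothetical.

```python
def partition(rows, num_parts):
    """Reorder (key, payload) rows so that each hash partition is contiguous."""
    # Build the histogram: number of rows per hash partition.
    hist = [0] * num_parts
    for key, _ in rows:
        hist[hash(key) % num_parts] += 1
    # An exclusive prefix sum turns counts into partition start offsets.
    starts = [0] * (num_parts + 1)
    for i, count in enumerate(hist):
        starts[i + 1] = starts[i] + count
    # Scatter each row into its partition's region of the output.
    out = [None] * len(rows)
    cursor = starts[:-1]  # slicing copies, so starts is not modified
    for row in rows:
        p = hash(row[0]) % num_parts
        out[cursor[p]] = row
        cursor[p] += 1
    return out, starts

def hash_join(left, right, num_parts=4):
    """Join two lists of (key, payload) rows on equal keys."""
    left_rows, left_starts = partition(left, num_parts)
    right_rows, right_starts = partition(right, num_parts)
    results = []  # stands in for the output buffer pool
    # Probe: only rows in the same partition can have matching keys.
    for p in range(num_parts):
        for lk, lv in left_rows[left_starts[p]:left_starts[p + 1]]:
            for rk, rv in right_rows[right_starts[p]:right_starts[p + 1]]:
                if lk == rk:
                    results.append((lk, lv, rv))
    return results

left = [(1, "a"), (2, "b"), (3, "c")]
right = [(2, "x"), (3, "y"), (3, "z"), (4, "w")]
print(sorted(hash_join(left, right)))
```

In a GPU setting, the histogram would likely live in memory shared by the threads of a block and be updated with atomic increments, and each partition pair could be probed by a separate group of threads.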
Disclosed are various embodiments for GPU-based parallel indexing for concurrent spatial queries. A number of nodes in a tree to be partitioned is determined. The tree is then iteratively partitioned with the GPU. Nodes are created with the GPU. Finally, a point insertion is performed using the GPU.
USF inventors have developed a tree construction algorithm in G-PICS. Experimental results show performance improvements of up to 50X, in both throughput and query response time, over the best known parallel GPU and parallel CPU-based spatial query processing systems. Furthermore, the G-PICS design can be easily extended to index datasets that are too large to fit in GPU global memory. Applications such as geographic information systems (GIS), mobile computing, scientific computing, epidemic simulation, and astrophysics may benefit from this algorithm.
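The level-by-level construction described above can be sketched sequentially as follows. The text does not specify the tree type, so this sketch assumes a 2D quadtree with a per-leaf point capacity; Node, build, leaves, capacity, and max_depth are hypothetical names, and the code is not the G-PICS implementation.

```python
class Node:
    def __init__(self, x0, y0, x1, y1):
        self.bounds = (x0, y0, x1, y1)
        self.points = []
        self.children = None  # list of four child nodes once split

def build(points, bounds, capacity=2, max_depth=8):
    """Build a quadtree one level at a time."""
    root = Node(*bounds)
    root.points = list(points)
    frontier, depth = [root], 0
    while frontier and depth < max_depth:
        # Determine which nodes at this level need to be partitioned.
        to_split = [n for n in frontier if len(n.points) > capacity]
        next_frontier = []
        for node in to_split:
            x0, y0, x1, y1 = node.bounds
            mx, my = (x0 + x1) / 2, (y0 + y1) / 2
            # Create the four child nodes (one per quadrant).
            node.children = [Node(x0, y0, mx, my), Node(mx, y0, x1, my),
                             Node(x0, my, mx, y1), Node(mx, my, x1, y1)]
            # Insert each point into the quadrant that contains it.
            for x, y in node.points:
                idx = (1 if x >= mx else 0) + (2 if y >= my else 0)
                node.children[idx].points.append((x, y))
            node.points = []
            next_frontier.extend(node.children)
        frontier, depth = next_frontier, depth + 1
    return root

def leaves(node):
    """Return all leaf nodes under the given node."""
    if node.children is None:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]

pts = [(1, 1), (2, 2), (9, 9), (8, 1), (1, 8)]
root = build(pts, (0, 0, 10, 10))
print([len(leaf.points) for leaf in leaves(root)])
```

Processing a whole level at once, rather than recursing node by node, is what makes this pattern a plausible fit for a GPU: the nodes of one level can be split independently of each other.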