• List of Articles GPUs

      • Open Access Article

        1 - Investigating the Effects of Hardware Parameters on Power Consumptions in SPMV Algorithms on Graphics Processing Units (GPUs)
        Farshad Khunjush
        Although Sparse matrix-vector multiplication (SPMVs) algorithms are simple, they include important parts of Linear Algebra algorithms in Mathematics and Physics areas. As these algorithms can be run in parallel, Graphics Processing Units (GPUs) has been considered as on More
        Although Sparse matrix-vector multiplication (SPMVs) algorithms are simple, they include important parts of Linear Algebra algorithms in Mathematics and Physics areas. As these algorithms can be run in parallel, Graphics Processing Units (GPUs) has been considered as one of the best candidates to run these algorithms. In the recent years, power consumption has been considered as one of the metrics that should be taken into consideration in addition to performance.  In spite of this importance, to the best of our knowledge, studies on power consumptions in SPMVs algorithms on GPUs are scarce.  In this paper, we investigate the effects of hardware parameters on power consumptions in SPMV algorithms on GPUs. For this, we leverage the possibility of setting the GPU’s parameters to investigate the effects of these parameters on power consumptions. These configurations have been applied to different formats of Sparse Matrices, and the best parameters are selected for having the best performance per power metric. Therefore, as the results of this study the settings can be applied in running different Linear Algebra algorithms on GPUs to obtain the best performance per power. Manuscript profile
      • Open Access Article

        2 - Improving Register File Access Latency Tolerance in GPUs by Value Reproduction
        Rahil Barati Mohammad Sadrosadati حمید سربازی آزاد
        Large register files reduce the performance and energy overhead of memory accesses by improving the thread-level parallelism and reducing the number of data movements from the off-chip memory. Recently, the latency-tolerant register file (LTRF) is proposed to enable hig More
        Large register files reduce the performance and energy overhead of memory accesses by improving the thread-level parallelism and reducing the number of data movements from the off-chip memory. Recently, the latency-tolerant register file (LTRF) is proposed to enable high-capacity register files with low power and area cost. LTRF is a two-level register file in which the first level is a small fast register cache, and the second level is a large slow main register file. LTRF uses a near-perfect register prefetching mechanism that warp registers are prefetched from the main register file to the register file cache before scheduling the warp and hiding the register prefetching latency by the execution of other active warps. LTRF specifies the working set of the warps by partitioning the control flow graph into several prefetch subgraphs, called register-interval. LTRF imposes some performance overhead due to warp stall during the register prefetching. Reducing the number of register-intervals can greatly mitigate this overhead, and improve the effectiveness of LTRF. A register-interval is a subgraph of the control flow graph (CFG) where it has to be a single-entry subgraph with a limited number of registers. We observe that the second constrain contributes more in reducing the size of register-intervals. Increasing the number of registers inside the register-interval cannot address this problem as it imposes huge performance and power overhead during the register prefetching process. In this paper, we propose a register-interval-aware re-production mechanism at compile-time to increase register-interval size without increasing the number of registers inside it. Our experimental results show that our proposal improves the effectiveness of LTRF by 29%, and LTRF’s performance by about 18% (about 30% improvement over baseline GPU architecture). Moreover, our proposal reduces GPU energy and power consumption by respectively 38% and 15%, on average. Manuscript profile