Optimzation of Knapsack on GPU

2022-02-23

我的第一篇論文「Optimization of Multi-Class 0/1 Knapsack Problem on GPUs by Improving Memory Access Efficiency」已經被 Journal of Supercomputing （IF: 2.60）所接受。

內容主要為將擁有相同重量（或價值）的物品分成一組，透過修改 0/1 背包問題的 DP 轉移式並搭配貪心演算法，增加資料的平行度與重複利用性，進而大幅改善在 GPU 計算上遇到的記憶體瓶頸。以下為論文摘要：

This work aims to improve the GPU performance for solving the 0/1 knapsack problem, which is a well-known combinatorial optimization problem found in many practical applications, including cryptography, financial decision, electronic design automation, computing resource management, etc. The knapsack problem is NP-hard, but it can be solved efficiently by dynamic programming(DP) algorithms in pseudo-polynomial runtime. The DP knapsack algorithm on GPUs has been presented. However, as the modern GPU architecture provides much higher computing throughput than its memory bandwidth, previous work is bounded by the data access time on GPU memory because its CGMA (Compute to Global Memory Access) ratio is 1, which means every computing operation involves one memory access on average. To address the problem, an innovative approach called Multi-Class 0/1 Knapsack Problem (MCKP), whose items can be classified into groups with equal values or weights is proposed in this paper. By reconstructing the DP equations for solving MCKP, it is able to explore data parallelism and reusability across threads. This made it possible to optimize the computation across iterations (i.e., items), and significantly improve the CGMA ratio by 5-fold after exploring the use of GPU shared memory and registers for reused data. We extensively analyze the performance of our approach on two modern GPU models, NVIDIA Tesla V100 and RTX 3070. Compared to the runtime of previous work, our approach achieves up to 8x and 18x speedup on V100 and RTX 3070 respectively, the latter one being a GPU with lower memory bandwidth. In addition, by comparing the two speedups, we found that we are able to achieve more efficient computing usage when the memory bandwidth is limited such as RTX 3070.