The LU factorization is an important numerical algorithm for solving system of linear equations. This paper proposes a novel approach for computing the LU factorization in parallel on multicore architectures. It improves the overall performance and also achieves the numerical quality of the standard LU factorization with partial pivoting. While the update of the trailing submatrix is computationally intensive and highly parallel, the inherently problematic portion of the LU factorization is the panel factorization due to its memory-bound characteristic and the atomicity of selecting the appropriate pivots. We remedy this in our new approach to LU factorization of (narrow and tall) panel submatrices. We use a parallel fine-grained recursive formulation of the factorization. It is based on conflict-free partitioning of the data and lock-less synchronization mechanisms. Our implementation lets the overall computation naturally flow with limited contention. Our recursive panel factorization provides the necessary performance increase for the inherently problematic portion of the LU factorization of square matrices. A large panel width results in larger Amdahl's fraction as our experiments have revealed which is consistent with related efforts. The performance results of our implementation reveal superlinear speedup and far exceed what can be achieved with equivalent MKL and/or LAPACK routines. © 2012 The authors and IOS Press. All rights reserved.
ASJC Scopus subject areas
- Computer Science(all)