Saturday, October 2, 2021

Processing Times for First Order Method using SIMD and multiple threads

The sparse matrix library "Eigen" is an excellent library for fast operations in general. However, when sparse matrix-vector multiplication (SpMV) is the bottleneck of the processing time, there is room for improvement. In particular, a kernel written with the AVX2 instruction set can shorten the computation time.

Here is a result.

"Eigen" in the table is a plain SpMV written directly with Eigen. The improved SIMD4 version works on 128 bits (four 32-bit lanes) at a time, using SSE4.2 and a lookup table (LUT).
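For reference, the operation being timed can be sketched as a scalar loop over a compressed sparse row (CSR) matrix. This is only an illustration of what SpMV computes, not Eigen's implementation and not the code measured here; the `CsrMatrix` struct and the `spmv` function are hypothetical names introduced for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Compressed sparse row (CSR) matrix. The layout and names are
// illustrative assumptions, not taken from Eigen or from the post.
struct CsrMatrix {
    std::size_t rows = 0;
    std::size_t cols = 0;
    std::vector<std::size_t> row_ptr;  // size rows + 1; row i owns [row_ptr[i], row_ptr[i+1])
    std::vector<int> col_idx;          // column index of each nonzero
    std::vector<float> values;         // value of each nonzero
};

// Scalar SpMV: y = A * x, computed one row at a time.
std::vector<float> spmv(const CsrMatrix& A, const std::vector<float>& x) {
    assert(x.size() == A.cols);
    assert(A.row_ptr.size() == A.rows + 1);
    std::vector<float> y(A.rows, 0.0f);
    for (std::size_t i = 0; i < A.rows; ++i) {
        float sum = 0.0f;
        // Each nonzero contributes value * x[column]; the indirect
        // access to x is what makes SpMV hard to vectorize.
        for (std::size_t k = A.row_ptr[i]; k < A.row_ptr[i + 1]; ++k) {
            sum += A.values[k] * x[A.col_idx[k]];
        }
        y[i] = sum;
    }
    return y;
}
```

The inner loop is the part a SIMD4 or SIMD8 kernel would replace: several nonzeros of one row are multiplied and accumulated per instruction instead of one.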


SIMD8 works on 256 bits at a time using AVX2; instead of a LUT, it uses the masked load instruction.
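As a rough sketch of how a 256-bit row kernel with a masked load might look (an assumption about the general technique, not the code measured here), the function below multiplies eight nonzeros per step and uses a lane mask for the leftover entries at the end of a row. The name `row_dot8`, the CSR-style arguments, and the use of gather instructions to fetch `x` are all choices made for this example. When the compiler is not targeting AVX2, it falls back to a scalar loop.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__AVX2__)
#include <immintrin.h>
#endif

// Dot product of one sparse row with the dense vector x.
// vals/cols hold the nnz nonzeros of the row (hypothetical CSR-style layout).
float row_dot8(const float* vals, const int* cols, std::size_t nnz, const float* x) {
    assert(nnz == 0 || (vals != nullptr && cols != nullptr && x != nullptr));
#if defined(__AVX2__)
    __m256 acc = _mm256_setzero_ps();
    std::size_t k = 0;
    // Full blocks: 8 values, 8 column indices, 8 gathered entries of x.
    for (; k + 8 <= nnz; k += 8) {
        __m256 av = _mm256_loadu_ps(vals + k);
        __m256i idx = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(cols + k));
        __m256 xv = _mm256_i32gather_ps(x, idx, 4);
        acc = _mm256_add_ps(acc, _mm256_mul_ps(av, xv));
    }
    // Tail: enable only the first `rem` lanes so nothing past the row is read.
    const std::size_t rem = nnz - k;
    if (rem > 0) {
        const __m256i lane = _mm256_setr_epi32(0, 1, 2, 3, 4, 5, 6, 7);
        const __m256i mask =
            _mm256_cmpgt_epi32(_mm256_set1_epi32(static_cast<int>(rem)), lane);
        __m256 av = _mm256_maskload_ps(vals + k, mask);        // inactive lanes become 0
        __m256i idx = _mm256_maskload_epi32(cols + k, mask);   // inactive lanes become 0
        __m256 xv = _mm256_mask_i32gather_ps(_mm256_setzero_ps(), x, idx,
                                             _mm256_castsi256_ps(mask), 4);
        acc = _mm256_add_ps(acc, _mm256_mul_ps(av, xv));
    }
    // Horizontal sum of the 8 partial sums.
    float tmp[8];
    _mm256_storeu_ps(tmp, acc);
    float sum = 0.0f;
    for (int i = 0; i < 8; ++i) sum += tmp[i];
    return sum;
#else
    // Scalar fallback when AVX2 is not enabled at compile time.
    float sum = 0.0f;
    for (std::size_t k = 0; k < nnz; ++k) sum += vals[k] * x[cols[k]];
    return sum;
#endif
}
```

With GCC or Clang, the SIMD path is compiled in when building with `-mavx2`. The masked load removes the need for a separate scalar clean-up loop or a table of tail cases, at the cost of one extra compare to build the mask. Note that the SIMD path adds the products in a different order than the scalar loop, so results can differ in the last bits for general floating-point data.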


SIMD8 on the E3 processor does not perform as well as predicted. However, the latest CPUs should be able to deliver the full power of AVX2, so we can expect almost a threefold performance gain relative to multi-threaded Eigen.

For large-scale problems, a first-order solver has the advantage of a lighter memory footprint than a simplex or interior-point solver, which makes it easy to apply SIMD and multi-threading.
