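I compared two GPUs, the P1000 and the RTX 4060 Ti 16GB.
Going from the P1000 (640 CUDA cores, 1.3 GHz) to the 4060 Ti (4352 CUDA cores, 2.5 GHz), the raw GPU specifications suggest
4352/640 × 2.5/1.3 ≒ 13
so I expected a speedup of about 13x, but in practice it is sometimes less than 2x. As it stands, this is unusable. Even with a GPU that has far more CUDA cores, the scaling seen so far leaves little to hope for.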
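The estimate above is just the ratio of (CUDA cores × clock). As a minimal sketch, it can be written out as follows; the function name `naive_speedup` is only for illustration, and the figure ignores memory bandwidth, kernel launch cost, and host-side work, so it is an upper bound rather than a prediction.

```python
def naive_speedup(cores_old, clock_old_ghz, cores_new, clock_new_ghz):
    """Ratio of (CUDA cores x clock); ignores memory bandwidth and all overheads."""
    return (cores_new / cores_old) * (clock_new_ghz / clock_old_ghz)


# Figures quoted in the post: P1000 (640 cores, 1.3 GHz) vs 4060 Ti (4352 cores, 2.5 GHz).
ratio = naive_speedup(640, 1.3, 4352, 2.5)
print(f"naive upper bound: {ratio:.1f}x")  # -> naive upper bound: 13.1x
```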
For a moment I wondered whether this was because the GPU clock was not locked, but locking it did not change the results.
Reference: "NVIDIA のGPUのクロックを固定する方法" (How to lock the clock of an NVIDIA GPU) - pyopyopyo - Linuxとかプログラミングの覚え書き
D:\test\test_cudaLinear>test
D:\test\test_cudaLinear>nvidia-smi -q -d CLOCK
==============NVSMI LOG==============
Timestamp : Sat Feb 8 04:26:08 2025
Driver Version : 571.96
CUDA Version : 12.8
Attached GPUs : 1
GPU 00000000:01:00.0
    Clocks
        Graphics : 2805 MHz
        SM : 2805 MHz
        Memory : 9001 MHz
        Video : 2190 MHz
    Applications Clocks
        Graphics : N/A
        Memory : N/A
    Default Applications Clocks
        Graphics : N/A
        Memory : N/A
    Deferred Clocks
        Memory : N/A
    Max Clocks
        Graphics : 3105 MHz
        SM : 3105 MHz
        Memory : 9001 MHz
        Video : 2415 MHz
    Max Customer Boost Clocks
        Graphics : N/A
    SM Clock Samples
        Duration : Not Found
        Number of Samples : Not Found
        Max : Not Found
        Min : Not Found
        Avg : Not Found
    Memory Clock Samples
        Duration : Not Found
        Number of Samples : Not Found
        Max : Not Found
        Min : Not Found
        Avg : Not Found
    Clock Policy
        Auto Boost : N/A
        Auto Boost Default : N/A
D:\test\test_cudaLinear>cudalinear -fname n080w8_2_0-4-0-9-1-9-6-2BG2.mps -nIterLim 500000 -ifPre 1 -dPrimalTol 1e-5
num threads= 1
--------------------------------------------------
reading file...
n080w8_2_0-4-0-9-1-9-6-2BG2.mps -------------------------------------------------- Running HiGHS 1.9.0 (git hash: n/a): Copyright (c) 2025 HiGHS under MIT licence terms -------------------------------------------------- running presolve -------------------------------------------------- Presolving model 11734 rows, 20786 cols, 215656 nonzeros 0s 11734 rows, 20693 cols, 215563 nonzeros 0s Presolve status: Reduced Running HiGHS 1.9.0 (git hash: n/a): Copyright (c) 2025 HiGHS under MIT licence terms Minimize No obj offset -------------------------------------------------- running scaling - use Ruiz scaling - use PC scaling -------------------------------------------------- -------------------------------------------------- enter main solve loop -------------------------------------------------- ____ _ _ ____ ____ _ ____ / ___| | | | _ \| _ \| | | _ \ | | | | | | |_) | | | | | | |_) | | |___| |_| | __/| |_| | |___| __/ \____|\___/|_| |____/|_____|_| Cuda runtime 12060 Cuda driver 12080 cuSparse 12504 Cuda device 0: NVIDIA GeForce RTX 4060 Ti -------------------------------------------------- CUPDHG Parameters: -------------------------------------------------- nIterLim: 500000 dTimeLim (sec): 3600.00 ifScaling: 1 ifRuizScaling: 1 ifL2Scaling: 0 ifPcScaling: 1 eLineSearchMethod: 2 dPrimalTol: 1.0000e-05 dDualTol: 1.0000e-04 dGapTol: 1.0000e-04 dFeasTol: 1.0000e-08 eRestartMethod: 1 -------------------------------------------------- Iter Primal.Obj Dual.Obj Gap Primal.Inf Dual.Inf Time 0 +0.00000000e+00 +0.00000000e+00 +0.00e+00 1.03e+02 0.00e+00 0.01s [L] 0 +0.00000000e+00 +0.00000000e+00 +0.00e+00 1.03e+02 0.00e+00 0.01s [A] Termination check: 1.032666e+02|1.042666e-03 0.000000e+00|2.920578e+00 0.000000e+00|1.000000e-04 Termination check: 1.032666e+02|1.042666e-03 0.000000e+00|2.920578e+00 0.000000e+00|1.000000e-04 Last restart was iter 0: average Last restart was iter 1: average Last restart was iter 2: average Last restart was iter 4: average Last restart was iter 7: 
current Last restart was iter 40: average Last restart was iter 80: average Last restart was iter 160: average Last restart was iter 280: average Last restart was iter 400: current Last restart was iter 480: average Last restart was iter 760: average Last restart was iter 1120: average Last restart was iter 1600: average Last restart was iter 1880: current Last restart was iter 2160: average Last restart was iter 2600: average Last restart was iter 2880: average Last restart was iter 3160: average Last restart was iter 3280: current Last restart was iter 3640: current Last restart was iter 3880: average Iter Primal.Obj Dual.Obj Gap Primal.Inf Dual.Inf Time 4000 +3.38816992e+03 +3.38118002e+03 +6.99e+00 3.30e-01 1.65e-02 0.73s [L] 4000 +3.38779443e+03 +3.38934372e+03 -1.55e+00 9.24e-02 7.53e-04 0.73s [A] Termination check: 3.298630e-01|1.042666e-03 1.650873e-02|2.920578e+00 1.032428e-03|1.000000e-04 Termination check: 9.244807e-02|1.042666e-03 7.527463e-04|2.920578e+00 2.285719e-04|1.000000e-04 Last restart was iter 3960: current Last restart was iter 6200: current Last restart was iter 6440: current Last restart was iter 6560: average Iter Primal.Obj Dual.Obj Gap Primal.Inf Dual.Inf Time 8000 +3.41615033e+03 +3.41508172e+03 +1.07e+00 1.75e-02 1.02e-04 1.41s [L] 8000 +3.41329596e+03 +3.41716699e+03 -3.87e+00 1.18e-02 9.91e-05 1.41s [A] Termination check: 1.747635e-02|1.042666e-03 1.018131e-04|2.920578e+00 1.564065e-04|1.000000e-04 Termination check: 1.178435e-02|1.042666e-03 9.914102e-05|2.920578e+00 5.666482e-04|1.000000e-04 Last restart was iter 6800: average Last restart was iter 10640: average Iter Primal.Obj Dual.Obj Gap Primal.Inf Dual.Inf Time 12000 +3.42440093e+03 +3.42183563e+03 +2.57e+00 6.01e-03 7.29e-05 2.09s [L] 12000 +3.42179821e+03 +3.42244885e+03 -6.51e-01 4.98e-03 3.66e-05 2.09s [A] Termination check: 6.008872e-03|1.042666e-03 7.291813e-05|2.920578e+00 3.746476e-04|1.000000e-04 Termination check: 4.978544e-03|1.042666e-03 3.658292e-05|2.920578e+00 
9.504995e-05|1.000000e-04 Last restart was iter 10840: average Iter Primal.Obj Dual.Obj Gap Primal.Inf Dual.Inf Time 16000 +3.42948919e+03 +3.42399978e+03 +5.49e+00 2.48e-03 5.15e-05 2.75s [L] 16000 +3.42739936e+03 +3.42399246e+03 +3.41e+00 1.35e-03 3.87e-05 2.75s [A] Termination check: 2.482076e-03|1.042666e-03 5.146503e-05|2.920578e+00 8.008492e-04|1.000000e-04 Termination check: 1.350319e-03|1.042666e-03 3.872877e-05|2.920578e+00 4.971842e-04|1.000000e-04 Last restart was iter 12520: current Iter Primal.Obj Dual.Obj Gap Primal.Inf Dual.Inf Time 20000 +3.42816896e+03 +3.42516766e+03 +3.00e+00 2.66e-03 3.41e-05 3.43s [L] 20000 +3.42828181e+03 +3.42525346e+03 +3.03e+00 1.99e-03 2.41e-05 3.43s [A] Termination check: 2.655098e-03|1.042666e-03 3.410679e-05|2.920578e+00 4.378692e-04|1.000000e-04 Termination check: 1.987844e-03|1.042666e-03 2.405062e-05|2.920578e+00 4.418021e-04|1.000000e-04 Last restart was iter 19600: current Last restart was iter 21160: current Last restart was iter 23240: current Iter Primal.Obj Dual.Obj Gap Primal.Inf Dual.Inf Time 24000 +3.42643584e+03 +3.42605686e+03 +3.79e-01 4.21e-02 2.07e-05 4.10s [L] 24000 +3.42637363e+03 +3.42612232e+03 +2.51e-01 5.40e-03 4.17e-06 4.10s [A] Termination check: 4.210707e-02|1.042666e-03 2.069015e-05|2.920578e+00 5.529809e-05|1.000000e-04 Termination check: 5.401254e-03|1.042666e-03 4.166442e-06|2.920578e+00 3.666811e-05|1.000000e-04 Last restart was iter 23720: average Last restart was iter 24240: current Last restart was iter 24440: average Iter Primal.Obj Dual.Obj Gap Primal.Inf Dual.Inf Time 28000 +3.42565192e+03 +3.42648533e+03 -8.33e-01 8.06e-03 1.51e-06 4.79s [L] 28000 +3.42622676e+03 +3.42650881e+03 -2.82e-01 7.13e-03 1.47e-06 4.79s [A] Termination check: 8.056111e-03|1.042666e-03 1.512344e-06|2.920578e+00 1.216096e-04|1.000000e-04 Termination check: 7.130953e-03|1.042666e-03 1.469720e-06|2.920578e+00 4.115273e-05|1.000000e-04 Iter Primal.Obj Dual.Obj Gap Primal.Inf Dual.Inf Time 32000 +3.42568942e+03 
+3.42657068e+03 -8.81e-01 7.60e-03 2.20e-06 5.46s [L] 32000 +3.42592677e+03 +3.42661118e+03 -6.84e-01 6.52e-03 1.21e-06 5.46s [A] Termination check: 7.600765e-03|1.042666e-03 2.195129e-06|2.920578e+00 1.285898e-04|1.000000e-04 Termination check: 6.523763e-03|1.042666e-03 1.213275e-06|2.920578e+00 9.986266e-05|1.000000e-04 Iter Primal.Obj Dual.Obj Gap Primal.Inf Dual.Inf Time 36000 +3.42595291e+03 +3.42657121e+03 -6.18e-01 4.57e-03 3.24e-06 6.13s [L] 36000 +3.42538203e+03 +3.42667066e+03 -1.29e+00 5.51e-03 6.04e-07 6.13s [A] Termination check: 4.571100e-03|1.042666e-03 3.235627e-06|2.920578e+00 9.021589e-05|1.000000e-04 Termination check: 5.511004e-03|1.042666e-03 6.042666e-07|2.920578e+00 1.880368e-04|1.000000e-04 Iter Primal.Obj Dual.Obj Gap Primal.Inf Dual.Inf Time 40000 +3.42722460e+03 +3.42661804e+03 +6.07e-01 2.70e-03 1.28e-06 6.80s [L] 40000 +3.42583599e+03 +3.42669722e+03 -8.61e-01 3.91e-03 4.80e-07 6.80s [A] Termination check: 2.699920e-03|1.042666e-03 1.278393e-06|2.920578e+00 8.848638e-05|1.000000e-04 Termination check: 3.905445e-03|1.042666e-03 4.796320e-07|2.920578e+00 1.256628e-04|1.000000e-04 Last restart was iter 26240: average Last restart was iter 41000: current Last restart was iter 41080: current Last restart was iter 41280: current Iter Primal.Obj Dual.Obj Gap Primal.Inf Dual.Inf Time 44000 +3.42547714e+03 +3.42643168e+03 -9.55e-01 3.96e-03 6.86e-06 7.48s [L] 44000 +3.42564450e+03 +3.42657412e+03 -9.30e-01 4.58e-03 3.11e-06 7.48s [A] Termination check: 3.963274e-03|1.042666e-03 6.856812e-06|2.920578e+00 1.392908e-04|1.000000e-04 Termination check: 4.578061e-03|1.042666e-03 3.113830e-06|2.920578e+00 1.356470e-04|1.000000e-04 Last restart was iter 42760: current Last restart was iter 44960: current Iter Primal.Obj Dual.Obj Gap Primal.Inf Dual.Inf Time 48000 +3.42722966e+03 +3.42653605e+03 +6.94e-01 1.96e-03 2.07e-06 8.16s [L] 48000 +3.42690306e+03 +3.42656473e+03 +3.38e-01 1.63e-03 1.62e-06 8.16s [A] Termination check: 1.956777e-03|1.042666e-03 
2.065282e-06|2.920578e+00 1.011859e-04|1.000000e-04 Termination check: 1.627761e-03|1.042666e-03 1.617831e-06|2.920578e+00 4.935833e-05|1.000000e-04 Iter Primal.Obj Dual.Obj Gap Primal.Inf Dual.Inf Time 52000 +3.42717924e+03 +3.42656103e+03 +6.18e-01 1.57e-03 2.70e-06 8.83s [L] 52000 +3.42732935e+03 +3.42665984e+03 +6.70e-01 1.07e-03 3.31e-07 8.83s [A] Termination check: 1.567878e-03|1.042666e-03 2.698537e-06|2.920578e+00 9.018737e-05|1.000000e-04 Termination check: 1.069825e-03|1.042666e-03 3.313637e-07|2.920578e+00 9.766843e-05|1.000000e-04 Iter Primal.Obj Dual.Obj Gap Primal.Inf Dual.Inf Time 52400 +3.42691858e+03 +3.42656166e+03 +3.57e-01 1.59e-03 2.21e-06 8.92s [L] 52400 +3.42731388e+03 +3.42666552e+03 +6.48e-01 1.04e-03 3.29e-07 8.92s [A] Solving information: Optimal average solution. Primal objective: +3.42731388e+03 Dual objective: +3.42666552e+03 Primal infeas (abs/rel): 1.04e-03 / 1.00e-05 Dual infeas (abs/rel): 3.29e-07 / 1.13e-11 Duality gap (abs/rel): 6.48e-01 / 9.46e-05 Number of iterations: 52400 Timing information: Total solver time 9.013000e+00 in 52400 iterations Solve time 8.916000e+00 in 52400 iterations Iters per sec 5.877075e+03 Scaling time 1.200000e-02 Presolve time 8.500000e-02 Ax 1.045000e+00 in 53734 calls Aty 1.156000e+00 in 53734 calls ComputeResiduals 0.000000e+00 in 0 calls UpdateIterates 6.884000e+00 in 52400 calls GPU Timing information: CudaPrepare 1.040000e-01 Alloc&CopyMatToDevice 4.000000e-03 CopyVecToDevice 0.000000e+00 DeviceMatVecProd 2.190000e+00 CopyVecToHost 0.000000e+00 -------------------------------- --- saving to ./solution-sum.json -------------------------------- Free Device memory 1.000000e-03 D:\test\test_cudaLinear>cudalinear -fname instance19.mps -nIterLim 500000 -ifPre 1 -dPrimalTol 1e-5 num threads= 1 -------------------------------------------------- reading file... 
instance19.mps -------------------------------------------------- Running HiGHS 1.9.0 (git hash: n/a): Copyright (c) 2025 HiGHS under MIT licence terms -------------------------------------------------- running presolve -------------------------------------------------- Presolving model 459 rows, 6083 cols, 254040 nonzeros 0s 459 rows, 6083 cols, 254040 nonzeros 0s Presolve status: Reduced Running HiGHS 1.9.0 (git hash: n/a): Copyright (c) 2025 HiGHS under MIT licence terms Minimize Has obj offset 300.000000 -------------------------------------------------- running scaling - use Ruiz scaling - use PC scaling -------------------------------------------------- -------------------------------------------------- enter main solve loop -------------------------------------------------- ____ _ _ ____ ____ _ ____ / ___| | | | _ \| _ \| | | _ \ | | | | | | |_) | | | | | | |_) | | |___| |_| | __/| |_| | |___| __/ \____|\___/|_| |____/|_____|_| Cuda runtime 12060 Cuda driver 12080 cuSparse 12504 Cuda device 0: NVIDIA GeForce RTX 4060 Ti -------------------------------------------------- CUPDHG Parameters: -------------------------------------------------- nIterLim: 500000 dTimeLim (sec): 3600.00 ifScaling: 1 ifRuizScaling: 1 ifL2Scaling: 0 ifPcScaling: 1 eLineSearchMethod: 2 dPrimalTol: 1.0000e-05 dDualTol: 1.0000e-04 dGapTol: 1.0000e-04 dFeasTol: 1.0000e-08 eRestartMethod: 1 -------------------------------------------------- Iter Primal.Obj Dual.Obj Gap Primal.Inf Dual.Inf Time 0 +3.00000000e+02 +1.20000000e+01 +2.88e+02 1.03e+02 0.00e+00 0.00s [L] 0 +3.00000000e+02 +1.20000000e+01 +2.88e+02 1.03e+02 0.00e+00 0.00s [A] Termination check: 1.026158e+02|1.036158e-03 0.000000e+00|2.070235e-01 9.201278e-01|1.000000e-04 Termination check: 1.026158e+02|1.036158e-03 0.000000e+00|2.070235e-01 9.201278e-01|1.000000e-04 Last restart was iter 0: average Last restart was iter 1: average Last restart was iter 2: average Last restart was iter 4: current Last restart was iter 7: current 
Last restart was iter 9: average Last restart was iter 40: average Last restart was iter 80: average Last restart was iter 160: average Last restart was iter 280: current Last restart was iter 440: average Last restart was iter 640: current Last restart was iter 720: current Last restart was iter 1040: current Last restart was iter 1200: current Last restart was iter 1720: average Last restart was iter 1800: average Last restart was iter 2120: current Last restart was iter 2560: average Last restart was iter 3000: current Last restart was iter 3120: average Last restart was iter 3240: current Last restart was iter 3320: current Last restart was iter 3360: current Iter Primal.Obj Dual.Obj Gap Primal.Inf Dual.Inf Time 4000 +3.13156582e+03 +3.14485627e+03 -1.33e+01 1.92e-01 1.00e-03 0.79s [L] 4000 +3.13510013e+03 +3.14558144e+03 -1.05e+01 1.75e-01 2.23e-04 0.79s [A] Termination check: 1.918230e-01|1.036158e-03 1.002731e-03|2.070235e-01 2.117183e-03|1.000000e-04 Termination check: 1.753267e-01|1.036158e-03 2.229571e-04|2.070235e-01 1.668550e-03|1.000000e-04 Last restart was iter 3440: average Last restart was iter 4480: current Iter Primal.Obj Dual.Obj Gap Primal.Inf Dual.Inf Time 8000 +3.15424115e+03 +3.14737552e+03 +6.87e+00 1.92e-02 1.30e-04 1.51s [L] 8000 +3.15309039e+03 +3.14731850e+03 +5.77e+00 1.60e-02 3.07e-05 1.51s [A] Termination check: 1.924401e-02|1.036158e-03 1.297241e-04|2.070235e-01 1.089329e-03|1.000000e-04 Termination check: 1.595192e-02|1.036158e-03 3.073819e-05|2.070235e-01 9.159682e-04|1.000000e-04 Last restart was iter 7000: current Iter Primal.Obj Dual.Obj Gap Primal.Inf Dual.Inf Time 12000 +3.14754575e+03 +3.14742372e+03 +1.22e-01 6.13e-03 1.18e-05 2.19s [L] 12000 +3.14850105e+03 +3.14743339e+03 +1.07e+00 2.89e-03 7.55e-06 2.19s [A] Termination check: 6.131199e-03|1.036158e-03 1.181000e-05|2.070235e-01 1.938108e-05|1.000000e-04 Termination check: 2.888755e-03|1.036158e-03 7.552550e-06|2.070235e-01 1.695513e-04|1.000000e-04 Last restart was iter 
8840: current Last restart was iter 12040: current Iter Primal.Obj Dual.Obj Gap Primal.Inf Dual.Inf Time 16000 +3.14770410e+03 +3.14746170e+03 +2.42e-01 4.13e-03 4.47e-06 2.82s [L] 16000 +3.14659655e+03 +3.14746318e+03 -8.67e-01 1.94e-03 4.05e-06 2.82s [A] Termination check: 4.129900e-03|1.036158e-03 4.469477e-06|2.070235e-01 3.849848e-05|1.000000e-04 Termination check: 1.944635e-03|1.036158e-03 4.050609e-06|2.070235e-01 1.376682e-04|1.000000e-04 Last restart was iter 12240: average Iter Primal.Obj Dual.Obj Gap Primal.Inf Dual.Inf Time 20000 +3.14823809e+03 +3.14746977e+03 +7.68e-01 7.36e-03 4.14e-06 3.50s [L] 20000 +3.14774868e+03 +3.14747355e+03 +2.75e-01 2.13e-03 2.26e-06 3.50s [A] Termination check: 7.359316e-03|1.036158e-03 4.136498e-06|2.070235e-01 1.220189e-04|1.000000e-04 Termination check: 2.129193e-03|1.036158e-03 2.259238e-06|2.070235e-01 4.369881e-05|1.000000e-04 Iter Primal.Obj Dual.Obj Gap Primal.Inf Dual.Inf Time 24000 +3.14813795e+03 +3.14748888e+03 +6.49e-01 5.32e-03 3.67e-06 4.17s [L] 24000 +3.14844821e+03 +3.14748980e+03 +9.58e-01 3.85e-03 1.14e-06 4.17s [A] Termination check: 5.322694e-03|1.036158e-03 3.667634e-06|2.070235e-01 1.030817e-04|1.000000e-04 Termination check: 3.854022e-03|1.036158e-03 1.136330e-06|2.070235e-01 1.522019e-04|1.000000e-04 Iter Primal.Obj Dual.Obj Gap Primal.Inf Dual.Inf Time 28000 +3.14744877e+03 +3.14749012e+03 -4.14e-02 2.09e-03 1.31e-06 4.82s [L] 28000 +3.14800194e+03 +3.14749394e+03 +5.08e-01 2.98e-03 5.80e-07 4.82s [A] Termination check: 2.086090e-03|1.036158e-03 1.307150e-06|2.070235e-01 6.568466e-06|1.000000e-04 Termination check: 2.975282e-03|1.036158e-03 5.795401e-07|2.070235e-01 8.067989e-05|1.000000e-04 Last restart was iter 19160: current Iter Primal.Obj Dual.Obj Gap Primal.Inf Dual.Inf Time 30240 +3.14769267e+03 +3.14749172e+03 +2.01e-01 1.01e-03 1.14e-06 5.17s [L] 30240 +3.14773743e+03 +3.14749297e+03 +2.44e-01 1.10e-03 8.47e-07 5.17s [A] Solving information: Optimal current solution. 
Primal objective: +3.14769267e+03 Dual objective: +3.14749172e+03 Primal infeas (abs/rel): 1.01e-03 / 9.74e-06 Dual infeas (abs/rel): 1.14e-06 / 5.50e-10 Duality gap (abs/rel): 2.01e-01 / 3.19e-05 Number of iterations: 30240 Timing information: Total solver time 5.227000e+00 in 30240 iterations Solve time 5.172000e+00 in 30240 iterations Iters per sec 5.846868e+03 Scaling time 1.400000e-02 Presolve time 4.100000e-02 Ax 5.310000e-01 in 31698 calls Aty 7.340000e-01 in 31698 calls ComputeResiduals 0.000000e+00 in 0 calls UpdateIterates 3.976000e+00 in 30240 calls GPU Timing information: CudaPrepare 8.600000e-02 Alloc&CopyMatToDevice 3.000000e-03 CopyVecToDevice 0.000000e+00 DeviceMatVecProd 1.261000e+00 CopyVecToHost 0.000000e+00 -------------------------------- --- saving to ./solution-sum.json -------------------------------- Free Device memory 1.000000e-03 D:\test\test_cudaLinear>cudalinear -fname instance20.mps -nIterLim 500000 -ifPre 1 -dPrimalTol 1e-5 num threads= 1 -------------------------------------------------- reading file... 
instance20.mps -------------------------------------------------- Running HiGHS 1.9.0 (git hash: n/a): Copyright (c) 2025 HiGHS under MIT licence terms -------------------------------------------------- running presolve -------------------------------------------------- Presolving model 1142 rows, 6249 cols, 426542 nonzeros 0s 1142 rows, 6249 cols, 426542 nonzeros 0s Presolve status: Not reduced Running HiGHS 1.9.0 (git hash: n/a): Copyright (c) 2025 HiGHS under MIT licence terms Minimize No obj offset -------------------------------------------------- running scaling - use Ruiz scaling - use PC scaling -------------------------------------------------- -------------------------------------------------- enter main solve loop -------------------------------------------------- ____ _ _ ____ ____ _ ____ / ___| | | | _ \| _ \| | | _ \ | | | | | | |_) | | | | | | |_) | | |___| |_| | __/| |_| | |___| __/ \____|\___/|_| |____/|_____|_| Cuda runtime 12060 Cuda driver 12080 cuSparse 12504 Cuda device 0: NVIDIA GeForce RTX 4060 Ti -------------------------------------------------- CUPDHG Parameters: -------------------------------------------------- nIterLim: 500000 dTimeLim (sec): 3600.00 ifScaling: 1 ifRuizScaling: 1 ifL2Scaling: 0 ifPcScaling: 1 eLineSearchMethod: 2 dPrimalTol: 1.0000e-05 dDualTol: 1.0000e-04 dGapTol: 1.0000e-04 dFeasTol: 1.0000e-08 eRestartMethod: 1 -------------------------------------------------- Iter Primal.Obj Dual.Obj Gap Primal.Inf Dual.Inf Time 0 +0.00000000e+00 +0.00000000e+00 +0.00e+00 1.59e+02 0.00e+00 0.01s [L] 0 +0.00000000e+00 +0.00000000e+00 +0.00e+00 1.59e+02 0.00e+00 0.01s [A] Termination check: 1.587199e+02|1.597199e-03 0.000000e+00|3.319796e-01 0.000000e+00|1.000000e-04 Termination check: 1.587199e+02|1.597199e-03 0.000000e+00|3.319796e-01 0.000000e+00|1.000000e-04 Last restart was iter 0: average Last restart was iter 1: average Last restart was iter 2: average Last restart was iter 4: current Last restart was iter 7: current Last 
restart was iter 9: average Last restart was iter 40: average Last restart was iter 80: average Last restart was iter 160: average Last restart was iter 280: current Last restart was iter 360: current Last restart was iter 600: current Last restart was iter 920: current Last restart was iter 1000: current Last restart was iter 1160: average Last restart was iter 1240: average Last restart was iter 1760: current Last restart was iter 2200: current Last restart was iter 2760: average Last restart was iter 3120: current Iter Primal.Obj Dual.Obj Gap Primal.Inf Dual.Inf Time 4000 +4.79820895e+03 +4.75850224e+03 +3.97e+01 1.21e-01 4.64e-03 0.77s [L] 4000 +4.81498430e+03 +4.75654113e+03 +5.84e+01 6.00e-02 2.47e-03 0.77s [A] Termination check: 1.207468e-01|1.597199e-03 4.636336e-03|3.319796e-01 4.154416e-03|1.000000e-04 Termination check: 6.000623e-02|1.597199e-03 2.468643e-03|3.319796e-01 6.105302e-03|1.000000e-04 Last restart was iter 3320: current Last restart was iter 4280: current Last restart was iter 4320: current Iter Primal.Obj Dual.Obj Gap Primal.Inf Dual.Inf Time 7000 +4.76875256e+03 +4.76890483e+03 -1.52e-01 7.64e-03 5.89e-05 1.27s [L] 7000 +4.76922200e+03 +4.76896552e+03 +2.56e-01 9.15e-04 1.47e-05 1.27s [A] Solving information: Optimal average solution. 
Primal objective: +4.76922200e+03 Dual objective: +4.76896552e+03 Primal infeas (abs/rel): 9.15e-04 / 5.73e-06 Dual infeas (abs/rel): 1.47e-05 / 4.44e-09 Duality gap (abs/rel): 2.56e-01 / 2.69e-05 Number of iterations: 7000 Timing information: Total solver time 1.367000e+00 in 7000 iterations Solve time 1.275000e+00 in 7000 iterations Iters per sec 5.490196e+03 Scaling time 1.800000e-02 Presolve time 7.400000e-02 Ax 1.210000e-01 in 7214 calls Aty 1.380000e-01 in 7214 calls ComputeResiduals 0.000000e+00 in 0 calls UpdateIterates 9.700000e-01 in 7000 calls GPU Timing information: CudaPrepare 7.600000e-02 Alloc&CopyMatToDevice 5.000000e-03 CopyVecToDevice 0.000000e+00 DeviceMatVecProd 2.570000e-01 CopyVecToHost 0.000000e+00 -------------------------------- --- saving to ./solution-sum.json -------------------------------- Free Device memory 0.000000e+00 D:\test\test_cudaLinear>cudalinear -fname instance21.mps -nIterLim 500000 -ifPre 1 -dPrimalTol 1e-5 num threads= 1 -------------------------------------------------- reading file... 
instance21.mps -------------------------------------------------- Running HiGHS 1.9.0 (git hash: n/a): Copyright (c) 2025 HiGHS under MIT licence terms -------------------------------------------------- running presolve -------------------------------------------------- Presolving model 1556 rows, 9235 cols, 603232 nonzeros 0s 1556 rows, 9235 cols, 603232 nonzeros 0s Presolve status: Not reduced Running HiGHS 1.9.0 (git hash: n/a): Copyright (c) 2025 HiGHS under MIT licence terms Minimize No obj offset -------------------------------------------------- running scaling - use Ruiz scaling - use PC scaling -------------------------------------------------- -------------------------------------------------- enter main solve loop -------------------------------------------------- ____ _ _ ____ ____ _ ____ / ___| | | | _ \| _ \| | | _ \ | | | | | | |_) | | | | | | |_) | | |___| |_| | __/| |_| | |___| __/ \____|\___/|_| |____/|_____|_| Cuda runtime 12060 Cuda driver 12080 cuSparse 12504 Cuda device 0: NVIDIA GeForce RTX 4060 Ti -------------------------------------------------- CUPDHG Parameters: -------------------------------------------------- nIterLim: 500000 dTimeLim (sec): 3600.00 ifScaling: 1 ifRuizScaling: 1 ifL2Scaling: 0 ifPcScaling: 1 eLineSearchMethod: 2 dPrimalTol: 1.0000e-05 dDualTol: 1.0000e-04 dGapTol: 1.0000e-04 dFeasTol: 1.0000e-08 eRestartMethod: 1 -------------------------------------------------- Iter Primal.Obj Dual.Obj Gap Primal.Inf Dual.Inf Time 0 +0.00000000e+00 +0.00000000e+00 +0.00e+00 2.36e+02 0.00e+00 0.00s [L] 0 +0.00000000e+00 +0.00000000e+00 +0.00e+00 2.36e+02 0.00e+00 0.00s [A] Termination check: 2.360339e+02|2.370339e-03 0.000000e+00|3.955477e-01 0.000000e+00|1.000000e-04 Termination check: 2.360339e+02|2.370339e-03 0.000000e+00|3.955477e-01 0.000000e+00|1.000000e-04 Last restart was iter 0: average Last restart was iter 1: average Last restart was iter 2: average Last restart was iter 4: current Last restart was iter 7: current Last 
restart was iter 8: current Last restart was iter 9: average Last restart was iter 40: average Last restart was iter 80: average Last restart was iter 160: current Last restart was iter 280: average Last restart was iter 360: current Last restart was iter 440: current Last restart was iter 600: current Last restart was iter 920: current Last restart was iter 1120: current Last restart was iter 1360: current Last restart was iter 1520: average Last restart was iter 1600: current Last restart was iter 2520: current Last restart was iter 2640: current Iter Primal.Obj Dual.Obj Gap Primal.Inf Dual.Inf Time 4000 +2.11357849e+04 +2.11302185e+04 +5.57e+00 2.96e-02 2.64e-04 0.83s [L] 4000 +2.11404979e+04 +2.11299275e+04 +1.06e+01 5.08e-03 2.09e-04 0.83s [A] Termination check: 2.963712e-02|2.370339e-03 2.640344e-04|3.955477e-01 1.316966e-04|1.000000e-04 Termination check: 5.084463e-03|2.370339e-03 2.088359e-04|3.955477e-01 2.500615e-04|1.000000e-04 Iter Primal.Obj Dual.Obj Gap Primal.Inf Dual.Inf Time 4440 +2.11156665e+04 +2.11305657e+04 -1.49e+01 2.21e-02 1.84e-04 0.91s [L] 4440 +2.11333909e+04 +2.11305333e+04 +2.86e+00 2.30e-03 1.61e-04 0.91s [A] Solving information: Optimal average solution. 
Primal objective: +2.11333909e+04 Dual objective: +2.11305333e+04 Primal infeas (abs/rel): 2.30e-03 / 9.70e-06 Dual infeas (abs/rel): 1.61e-04 / 4.06e-08 Duality gap (abs/rel): 2.86e+00 / 6.76e-05 Number of iterations: 4440 Timing information: Total solver time 1.056000e+00 in 4440 iterations Solve time 9.140000e-01 in 4440 iterations Iters per sec 4.857768e+03 Scaling time 2.600000e-02 Presolve time 1.160000e-01 Ax 7.600000e-02 in 4668 calls Aty 7.900000e-02 in 4668 calls ComputeResiduals 0.000000e+00 in 0 calls UpdateIterates 6.920000e-01 in 4440 calls GPU Timing information: CudaPrepare 8.600000e-02 Alloc&CopyMatToDevice 8.000000e-03 CopyVecToDevice 0.000000e+00 DeviceMatVecProd 1.550000e-01 CopyVecToHost 0.000000e+00 -------------------------------- --- saving to ./solution-sum.json -------------------------------- Free Device memory 0.000000e+00 D:\test\test_cudaLinear>
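To make the four timing summaries in the log easier to compare, here is a small sketch that tabulates them. The numbers are copied from the log (nonzeros are the post-presolve counts it prints); the dictionary layout and key names are my own. The timing categories may overlap, so the shares are only indicative.

```python
# Timing figures copied from the four solver logs above (times in seconds).
runs = {
    "n080w8_2_0-4-0-9-1-9-6-2BG2": dict(nnz=215563, iters=52400, solve=8.916,
                                        ax=1.045, aty=1.156, update=6.884),
    "instance19": dict(nnz=254040, iters=30240, solve=5.172,
                       ax=0.531, aty=0.734, update=3.976),
    "instance20": dict(nnz=426542, iters=7000, solve=1.275,
                       ax=0.121, aty=0.138, update=0.970),
    "instance21": dict(nnz=603232, iters=4440, solve=0.914,
                       ax=0.076, aty=0.079, update=0.692),
}

summary = {}
for name, r in runs.items():
    summary[name] = {
        "us_per_iter": r["solve"] / r["iters"] * 1e6,          # microseconds per iteration
        "update_share": r["update"] / r["solve"],              # UpdateIterates / Solve time
        "matvec_share": (r["ax"] + r["aty"]) / r["solve"],     # (Ax + Aty) / Solve time
    }

for name, s in summary.items():
    print(f"{name:30s} nnz={runs[name]['nnz']:7d} "
          f"{s['us_per_iter']:6.1f} us/iter  "
          f"UpdateIterates={s['update_share']:.0%}  Ax+Aty={s['matvec_share']:.0%}")
```

In all four runs UpdateIterates is reported at roughly three quarters of the solve time, and the time per iteration stays between about 170 and 206 microseconds while the nonzero count nearly triples. That is at least consistent with a fixed per-iteration cost dominating at this problem size, although the log alone does not show where inside UpdateIterates the time goes.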
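The likely cause is the CPU-GPU transfer. I have no experience with GPU programming, so I cannot say for sure, but I think it comes from how the cuPDLP core is written. For medium-sized and smaller problems, I suspect that the frequent communication with the CPU in the current implementation is the bottleneck. (I also looked at the Google version (the CPU version), but I think cuPDLP (the COPT version) has the edge.)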
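For very large problems, GPU computation time presumably dominates, so the CPU-GPU transfer time would not become the bottleneck. The current focus of PDLP work is very-large-scale problems, and NVIDIA is promoting this as well.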
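The reasoning here can be illustrated with a toy cost model: each iteration pays a fixed cost (kernel launches, synchronization, host-device communication) plus a part proportional to the number of nonzeros. All numbers below are hypothetical assumptions chosen for illustration, not measurements: a fixed cost of 150 microseconds per iteration on both GPUs, and a 13x difference in throughput.

```python
def iter_time(nnz, throughput_nnz_per_s, fixed_s):
    """Hypothetical per-iteration time: fixed overhead + work / throughput."""
    return fixed_s + nnz / throughput_nnz_per_s


def speedup(nnz, fixed_s=150e-6, old_tp=1e9, new_tp=13e9):
    """Speedup of the faster GPU when both pay the same fixed overhead (assumed values)."""
    return iter_time(nnz, old_tp, fixed_s) / iter_time(nnz, new_tp, fixed_s)


for nnz in (1e5, 1e6, 1e7, 1e8, 1e9):
    print(f"nnz={nnz:.0e}  speedup={speedup(nnz):.2f}x")
```

Under these assumptions the speedup stays below 2x for a few hundred thousand nonzeros and only approaches 13x once the proportional part dwarfs the fixed cost, which matches the intuition that the raw GPU ratio shows up only on very large problems.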
Reference: "NVIDIA cuOpt で大規模な線形計画問題を加速する" (Accelerating large-scale linear programming problems with NVIDIA cuOpt) - NVIDIA 技術ブログ
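Incidentally, the following article covers the delivery optimization problem in detail:
"運搬経路問題(配送最適化問題,Vehicle Routing Problem) をPuLPで解く #Python" (Solving the vehicle routing problem with PuLP) - Qiita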
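However, our main interest is medium-to-large problems, chiefly as a replacement for commercial ISM solvers. This is a solver for ordinary users who would like to use a commercial ISM solver but cannot.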
The conclusion, therefore, is that the only way to get the most out of the GPU is to rewrite the cuPDLP core.
The other issue is WarmStart support. Simplex benefits from WarmStart, and it should be possible for FirstOrder methods as well. The current cuPDLP does not support it, so an implementation needs to be considered.
Both of these points need to be implemented.
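To illustrate why WarmStart should carry over to first-order methods, here is a toy sketch of plain PDHG (without the restarts, scaling, or line search that cuPDLP uses) on a tiny LP of the form min c·x subject to Ax = b, x ≥ 0. The function `pdhg`, its parameters, and the example problem are all assumptions made for this illustration and are not taken from cuPDLP.

```python
def pdhg(A, b, c, x0=None, y0=None, tol=1e-6, max_iter=10000):
    """Plain PDHG for  min c.x  s.t.  A x = b, x >= 0  (dense, pure Python toy)."""
    m, n = len(A), len(c)
    x = list(x0) if x0 is not None else [0.0] * n
    y = list(y0) if y0 is not None else [0.0] * m
    # Step sizes with tau * sigma * ||A||^2 < 1; the Frobenius norm is a safe bound.
    fro = sum(a * a for row in A for a in row) ** 0.5
    tau = sigma = 0.7 / fro
    for k in range(max_iter):
        # KKT residuals: primal infeasibility, dual infeasibility, duality gap.
        Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(m)]
        rc = [c[j] - sum(A[i][j] * y[i] for i in range(m)) for j in range(n)]
        primal = max(abs(Ax[i] - b[i]) for i in range(m))
        dual = max(max(0.0, -r) for r in rc)
        gap = abs(sum(c[j] * x[j] for j in range(n)) - sum(b[i] * y[i] for i in range(m)))
        if max(primal, dual, gap) < tol:
            return x, y, k
        # Primal step with projection onto x >= 0, then dual step on the extrapolated point.
        x_new = [max(0.0, x[j] - tau * rc[j]) for j in range(n)]
        x_bar = [2.0 * x_new[j] - x[j] for j in range(n)]
        Ax_bar = [sum(A[i][j] * x_bar[j] for j in range(n)) for i in range(m)]
        y = [y[i] + sigma * (b[i] - Ax_bar[i]) for i in range(m)]
        x = x_new
    return x, y, max_iter


A, c = [[1.0, 1.0]], [1.0, 2.0]
x_base, y_base, k_base = pdhg(A, [1.0], c)                        # original problem
x_cold, y_cold, k_cold = pdhg(A, [1.02], c)                       # perturbed, cold start
x_warm, y_warm, k_warm = pdhg(A, [1.02], c, x0=x_base, y0=y_base)  # perturbed, warm start
print(f"cold start: {k_cold} iterations, warm start: {k_warm} iterations")
```

The warm-started run begins from the primal-dual pair of the neighbouring problem, so it starts much closer to the new optimum and needs fewer iterations in this toy case. How much of that benefit survives restarts and scaling in a real PDLP implementation is something the sketch does not show.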
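I also tried integrating HiGHS ISM, but it was about twice as slow as the current Schedule Nurse, and I judged that even if it is multithreaded in the future, it will not deliver more than the improvement I am hoping for. FirstOrder, on the other hand, cannot be expected to give high accuracy, but WarmStart is attractive, it may be able to exploit GPU scalability, and it has future potential.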
From the above, I concluded that cuPDLP needs to be reimplemented in order to solve the unresolved instances: 2 instances of INRC2 8weeks and 2 instances of Scheduling Benchmarks.
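I would like the HiGHS team to reimplement it so that it meets these requirements, but I cannot wait for that, and I do not think COPT will take the lead in implementing it either, so I decided to do it myself.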