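I compared a Quadro P1000 with an RTX 4060 Ti 16GB.
Going from the P1000 (640 CUDA cores, 1.3 GHz) to the 4060 Ti (4352 CUDA cores, 2.5 GHz), the raw GPU specifications suggest
4352/640 × 2.5/1.3 ≈ 13
that is, a speedup of roughly 13x. In practice, however, the speedup is below 2x in some cases. As things stand, this is not usable. Even with a GPU that has far more CUDA cores, the scaling seen so far gives little reason to expect much.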
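| Instance | cPDLP (CPU) | cuPDLP (Quadro P1000) | CLP | cuPDLP (RTX 4060 Ti 16GB) |
|---|---|---|---|---|
| n080w8_2_0-4-0-9-1-9-6-2 | 47.7 sec | 14.3 sec | 76 sec | 9 sec |
| instance19 | 46.7 sec | 21.6 sec | 2.6 sec | 5 sec |
| instance20 | 14.8 sec | 6.7 sec | 9.6 sec | 1 sec |
| instance21 | 41.5 sec | 16.2 sec | 78 sec | 1 sec |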
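To make the gap between the estimate and the measurements concrete, the following minimal sketch recomputes both from the numbers above. The variable names are only illustrative; the core counts, clocks, and timings are copied from the text and the table.

```python
# Spec figures quoted in the text (CUDA cores, clock in GHz).
p1000 = {"cuda_cores": 640, "clock_ghz": 1.3}
rtx4060ti = {"cuda_cores": 4352, "clock_ghz": 2.5}

# Naive estimate: throughput scales with (core count) x (clock).
theoretical = (rtx4060ti["cuda_cores"] / p1000["cuda_cores"]) * (
    rtx4060ti["clock_ghz"] / p1000["clock_ghz"]
)

# cuPDLP wall-clock times in seconds: (Quadro P1000, RTX 4060 Ti), from the table.
timings = {
    "n080w8_2_0-4-0-9-1-9-6-2": (14.3, 9.0),
    "instance19": (21.6, 5.0),
    "instance20": (6.7, 1.0),
    "instance21": (16.2, 1.0),
}

# Measured speedup per instance = old time / new time.
speedups = {name: old / new for name, (old, new) in timings.items()}

print(f"theoretical speedup: {theoretical:.1f}x")
for name, s in speedups.items():
    print(f"{name}: {s:.1f}x ({s / theoretical:.0%} of the estimate)")
```

With these figures, only instance21 reaches the estimate; the first instance gains about 1.6x. Note that the RTX 4060 Ti times in the table are rounded to whole seconds, so the ratios for the short runs are coarse.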
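The trends are:
■ The iteration count varies, even from one run to the next.
■ The improvement (degree of speedup) grows with problem size.
■ On the large problem (instance21), the result is as expected.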
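For a moment I suspected that this was because the GPU clock was not locked, but locking it did not change the results.
Reference: "NVIDIA のGPUのクロックを固定する方法" (How to lock NVIDIA GPU clocks), pyopyopyo - Linuxとかプログラミングの覚え書き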
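For readers who want to repeat the clock-locking check, here is a minimal sketch that builds the corresponding `nvidia-smi` commands. It relies on the `--lock-gpu-clocks` and `--reset-gpu-clocks` options of `nvidia-smi`; the helper names, the `dry_run` switch, and the 2805 MHz value (the graphics clock reported in the log below) are assumptions for illustration, not the exact procedure used in this test. Locking clocks normally needs administrator rights, and support varies by GPU and driver.

```python
import subprocess


def gpu_clock_command(mhz=None, gpu_index=0):
    """Build the nvidia-smi command that locks the GPU core clock to `mhz`.

    With mhz=None, build the command that resets the clocks to default instead.
    """
    cmd = ["nvidia-smi", "-i", str(gpu_index)]
    if mhz is None:
        cmd.append("--reset-gpu-clocks")
    else:
        # min and max are set to the same value to pin the clock
        cmd.append(f"--lock-gpu-clocks={mhz},{mhz}")
    return cmd


def run(cmd, dry_run=True):
    """Print the command (dry run) or execute it and return its stdout."""
    if dry_run:
        print(" ".join(cmd))
        return None
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


# Hypothetical usage: pin the clock, run the benchmark, then restore defaults.
run(gpu_clock_command(2805))
run(gpu_clock_command(None))
```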
D:\test\test_cudaLinear>test
D:\test\test_cudaLinear>nvidia-smi -q -d CLOCK
==============NVSMI LOG==============
Timestamp : Sat Feb 8 04:26:08 2025
Driver Version : 571.96
CUDA Version : 12.8
Attached GPUs : 1
GPU 00000000:01:00.0
Clocks
Graphics : 2805 MHz
SM : 2805 MHz
Memory : 9001 MHz
Video : 2190 MHz
Applications Clocks
Graphics : N/A
Memory : N/A
Default Applications Clocks
Graphics : N/A
Memory : N/A
Deferred Clocks
Memory : N/A
Max Clocks
Graphics : 3105 MHz
SM : 3105 MHz
Memory : 9001 MHz
Video : 2415 MHz
Max Customer Boost Clocks
Graphics : N/A
SM Clock Samples
Duration : Not Found
Number of Samples : Not Found
Max : Not Found
Min : Not Found
Avg : Not Found
Memory Clock Samples
Duration : Not Found
Number of Samples : Not Found
Max : Not Found
Min : Not Found
Avg : Not Found
Clock Policy
Auto Boost : N/A
Auto Boost Default : N/A
D:\test\test_cudaLinear>cudalinear -fname n080w8_2_0-4-0-9-1-9-6-2BG2.mps -nIterLim 500000 -ifPre 1 -dPrimalTol 1e-5
num threads= 1
--------------------------------------------------
reading file...
n080w8_2_0-4-0-9-1-9-6-2BG2.mps
--------------------------------------------------
Running HiGHS 1.9.0 (git hash: n/a): Copyright (c) 2025 HiGHS under MIT licence terms
--------------------------------------------------
running presolve
--------------------------------------------------
Presolving model
11734 rows, 20786 cols, 215656 nonzeros 0s
11734 rows, 20693 cols, 215563 nonzeros 0s
Presolve status: Reduced
Running HiGHS 1.9.0 (git hash: n/a): Copyright (c) 2025 HiGHS under MIT licence terms
Minimize
No obj offset
--------------------------------------------------
running scaling
- use Ruiz scaling
- use PC scaling
--------------------------------------------------
--------------------------------------------------
enter main solve loop
--------------------------------------------------
____ _ _ ____ ____ _ ____
/ ___| | | | _ \| _ \| | | _ \
| | | | | | |_) | | | | | | |_) |
| |___| |_| | __/| |_| | |___| __/
\____|\___/|_| |____/|_____|_|
Cuda runtime 12060
Cuda driver 12080
cuSparse 12504
Cuda device 0: NVIDIA GeForce RTX 4060 Ti
--------------------------------------------------
CUPDHG Parameters:
--------------------------------------------------
nIterLim: 500000
dTimeLim (sec): 3600.00
ifScaling: 1
ifRuizScaling: 1
ifL2Scaling: 0
ifPcScaling: 1
eLineSearchMethod: 2
dPrimalTol: 1.0000e-05
dDualTol: 1.0000e-04
dGapTol: 1.0000e-04
dFeasTol: 1.0000e-08
eRestartMethod: 1
--------------------------------------------------
Iter Primal.Obj Dual.Obj Gap Primal.Inf Dual.Inf Time
0 +0.00000000e+00 +0.00000000e+00 +0.00e+00 1.03e+02 0.00e+00 0.01s [L]
0 +0.00000000e+00 +0.00000000e+00 +0.00e+00 1.03e+02 0.00e+00 0.01s [A]
Termination check: 1.032666e+02|1.042666e-03 0.000000e+00|2.920578e+00 0.000000e+00|1.000000e-04
Termination check: 1.032666e+02|1.042666e-03 0.000000e+00|2.920578e+00 0.000000e+00|1.000000e-04
Last restart was iter 0: average
Last restart was iter 1: average
Last restart was iter 2: average
Last restart was iter 4: average
Last restart was iter 7: current
Last restart was iter 40: average
Last restart was iter 80: average
Last restart was iter 160: average
Last restart was iter 280: average
Last restart was iter 400: current
Last restart was iter 480: average
Last restart was iter 760: average
Last restart was iter 1120: average
Last restart was iter 1600: average
Last restart was iter 1880: current
Last restart was iter 2160: average
Last restart was iter 2600: average
Last restart was iter 2880: average
Last restart was iter 3160: average
Last restart was iter 3280: current
Last restart was iter 3640: current
Last restart was iter 3880: average
Iter Primal.Obj Dual.Obj Gap Primal.Inf Dual.Inf Time
4000 +3.38816992e+03 +3.38118002e+03 +6.99e+00 3.30e-01 1.65e-02 0.73s [L]
4000 +3.38779443e+03 +3.38934372e+03 -1.55e+00 9.24e-02 7.53e-04 0.73s [A]
Termination check: 3.298630e-01|1.042666e-03 1.650873e-02|2.920578e+00 1.032428e-03|1.000000e-04
Termination check: 9.244807e-02|1.042666e-03 7.527463e-04|2.920578e+00 2.285719e-04|1.000000e-04
Last restart was iter 3960: current
Last restart was iter 6200: current
Last restart was iter 6440: current
Last restart was iter 6560: average
Iter Primal.Obj Dual.Obj Gap Primal.Inf Dual.Inf Time
8000 +3.41615033e+03 +3.41508172e+03 +1.07e+00 1.75e-02 1.02e-04 1.41s [L]
8000 +3.41329596e+03 +3.41716699e+03 -3.87e+00 1.18e-02 9.91e-05 1.41s [A]
Termination check: 1.747635e-02|1.042666e-03 1.018131e-04|2.920578e+00 1.564065e-04|1.000000e-04
Termination check: 1.178435e-02|1.042666e-03 9.914102e-05|2.920578e+00 5.666482e-04|1.000000e-04
Last restart was iter 6800: average
Last restart was iter 10640: average
Iter Primal.Obj Dual.Obj Gap Primal.Inf Dual.Inf Time
12000 +3.42440093e+03 +3.42183563e+03 +2.57e+00 6.01e-03 7.29e-05 2.09s [L]
12000 +3.42179821e+03 +3.42244885e+03 -6.51e-01 4.98e-03 3.66e-05 2.09s [A]
Termination check: 6.008872e-03|1.042666e-03 7.291813e-05|2.920578e+00 3.746476e-04|1.000000e-04
Termination check: 4.978544e-03|1.042666e-03 3.658292e-05|2.920578e+00 9.504995e-05|1.000000e-04
Last restart was iter 10840: average
Iter Primal.Obj Dual.Obj Gap Primal.Inf Dual.Inf Time
16000 +3.42948919e+03 +3.42399978e+03 +5.49e+00 2.48e-03 5.15e-05 2.75s [L]
16000 +3.42739936e+03 +3.42399246e+03 +3.41e+00 1.35e-03 3.87e-05 2.75s [A]
Termination check: 2.482076e-03|1.042666e-03 5.146503e-05|2.920578e+00 8.008492e-04|1.000000e-04
Termination check: 1.350319e-03|1.042666e-03 3.872877e-05|2.920578e+00 4.971842e-04|1.000000e-04
Last restart was iter 12520: current
Iter Primal.Obj Dual.Obj Gap Primal.Inf Dual.Inf Time
20000 +3.42816896e+03 +3.42516766e+03 +3.00e+00 2.66e-03 3.41e-05 3.43s [L]
20000 +3.42828181e+03 +3.42525346e+03 +3.03e+00 1.99e-03 2.41e-05 3.43s [A]
Termination check: 2.655098e-03|1.042666e-03 3.410679e-05|2.920578e+00 4.378692e-04|1.000000e-04
Termination check: 1.987844e-03|1.042666e-03 2.405062e-05|2.920578e+00 4.418021e-04|1.000000e-04
Last restart was iter 19600: current
Last restart was iter 21160: current
Last restart was iter 23240: current
Iter Primal.Obj Dual.Obj Gap Primal.Inf Dual.Inf Time
24000 +3.42643584e+03 +3.42605686e+03 +3.79e-01 4.21e-02 2.07e-05 4.10s [L]
24000 +3.42637363e+03 +3.42612232e+03 +2.51e-01 5.40e-03 4.17e-06 4.10s [A]
Termination check: 4.210707e-02|1.042666e-03 2.069015e-05|2.920578e+00 5.529809e-05|1.000000e-04
Termination check: 5.401254e-03|1.042666e-03 4.166442e-06|2.920578e+00 3.666811e-05|1.000000e-04
Last restart was iter 23720: average
Last restart was iter 24240: current
Last restart was iter 24440: average
Iter Primal.Obj Dual.Obj Gap Primal.Inf Dual.Inf Time
28000 +3.42565192e+03 +3.42648533e+03 -8.33e-01 8.06e-03 1.51e-06 4.79s [L]
28000 +3.42622676e+03 +3.42650881e+03 -2.82e-01 7.13e-03 1.47e-06 4.79s [A]
Termination check: 8.056111e-03|1.042666e-03 1.512344e-06|2.920578e+00 1.216096e-04|1.000000e-04
Termination check: 7.130953e-03|1.042666e-03 1.469720e-06|2.920578e+00 4.115273e-05|1.000000e-04
Iter Primal.Obj Dual.Obj Gap Primal.Inf Dual.Inf Time
32000 +3.42568942e+03 +3.42657068e+03 -8.81e-01 7.60e-03 2.20e-06 5.46s [L]
32000 +3.42592677e+03 +3.42661118e+03 -6.84e-01 6.52e-03 1.21e-06 5.46s [A]
Termination check: 7.600765e-03|1.042666e-03 2.195129e-06|2.920578e+00 1.285898e-04|1.000000e-04
Termination check: 6.523763e-03|1.042666e-03 1.213275e-06|2.920578e+00 9.986266e-05|1.000000e-04
Iter Primal.Obj Dual.Obj Gap Primal.Inf Dual.Inf Time
36000 +3.42595291e+03 +3.42657121e+03 -6.18e-01 4.57e-03 3.24e-06 6.13s [L]
36000 +3.42538203e+03 +3.42667066e+03 -1.29e+00 5.51e-03 6.04e-07 6.13s [A]
Termination check: 4.571100e-03|1.042666e-03 3.235627e-06|2.920578e+00 9.021589e-05|1.000000e-04
Termination check: 5.511004e-03|1.042666e-03 6.042666e-07|2.920578e+00 1.880368e-04|1.000000e-04
Iter Primal.Obj Dual.Obj Gap Primal.Inf Dual.Inf Time
40000 +3.42722460e+03 +3.42661804e+03 +6.07e-01 2.70e-03 1.28e-06 6.80s [L]
40000 +3.42583599e+03 +3.42669722e+03 -8.61e-01 3.91e-03 4.80e-07 6.80s [A]
Termination check: 2.699920e-03|1.042666e-03 1.278393e-06|2.920578e+00 8.848638e-05|1.000000e-04
Termination check: 3.905445e-03|1.042666e-03 4.796320e-07|2.920578e+00 1.256628e-04|1.000000e-04
Last restart was iter 26240: average
Last restart was iter 41000: current
Last restart was iter 41080: current
Last restart was iter 41280: current
Iter Primal.Obj Dual.Obj Gap Primal.Inf Dual.Inf Time
44000 +3.42547714e+03 +3.42643168e+03 -9.55e-01 3.96e-03 6.86e-06 7.48s [L]
44000 +3.42564450e+03 +3.42657412e+03 -9.30e-01 4.58e-03 3.11e-06 7.48s [A]
Termination check: 3.963274e-03|1.042666e-03 6.856812e-06|2.920578e+00 1.392908e-04|1.000000e-04
Termination check: 4.578061e-03|1.042666e-03 3.113830e-06|2.920578e+00 1.356470e-04|1.000000e-04
Last restart was iter 42760: current
Last restart was iter 44960: current
Iter Primal.Obj Dual.Obj Gap Primal.Inf Dual.Inf Time
48000 +3.42722966e+03 +3.42653605e+03 +6.94e-01 1.96e-03 2.07e-06 8.16s [L]
48000 +3.42690306e+03 +3.42656473e+03 +3.38e-01 1.63e-03 1.62e-06 8.16s [A]
Termination check: 1.956777e-03|1.042666e-03 2.065282e-06|2.920578e+00 1.011859e-04|1.000000e-04
Termination check: 1.627761e-03|1.042666e-03 1.617831e-06|2.920578e+00 4.935833e-05|1.000000e-04
Iter Primal.Obj Dual.Obj Gap Primal.Inf Dual.Inf Time
52000 +3.42717924e+03 +3.42656103e+03 +6.18e-01 1.57e-03 2.70e-06 8.83s [L]
52000 +3.42732935e+03 +3.42665984e+03 +6.70e-01 1.07e-03 3.31e-07 8.83s [A]
Termination check: 1.567878e-03|1.042666e-03 2.698537e-06|2.920578e+00 9.018737e-05|1.000000e-04
Termination check: 1.069825e-03|1.042666e-03 3.313637e-07|2.920578e+00 9.766843e-05|1.000000e-04
Iter Primal.Obj Dual.Obj Gap Primal.Inf Dual.Inf Time
52400 +3.42691858e+03 +3.42656166e+03 +3.57e-01 1.59e-03 2.21e-06 8.92s [L]
52400 +3.42731388e+03 +3.42666552e+03 +6.48e-01 1.04e-03 3.29e-07 8.92s [A]
Solving information: Optimal average solution.
Primal objective: +3.42731388e+03
Dual objective: +3.42666552e+03
Primal infeas (abs/rel): 1.04e-03 / 1.00e-05
Dual infeas (abs/rel): 3.29e-07 / 1.13e-11
Duality gap (abs/rel): 6.48e-01 / 9.46e-05
Number of iterations: 52400
Timing information:
Total solver time 9.013000e+00 in 52400 iterations
Solve time 8.916000e+00 in 52400 iterations
Iters per sec 5.877075e+03
Scaling time 1.200000e-02
Presolve time 8.500000e-02
Ax 1.045000e+00 in 53734 calls
Aty 1.156000e+00 in 53734 calls
ComputeResiduals 0.000000e+00 in 0 calls
UpdateIterates 6.884000e+00 in 52400 calls
GPU Timing information:
CudaPrepare 1.040000e-01
Alloc&CopyMatToDevice 4.000000e-03
CopyVecToDevice 0.000000e+00
DeviceMatVecProd 2.190000e+00
CopyVecToHost 0.000000e+00
--------------------------------
--- saving to ./solution-sum.json
--------------------------------
Free Device memory 1.000000e-03
D:\test\test_cudaLinear>cudalinear -fname instance19.mps -nIterLim 500000 -ifPre 1 -dPrimalTol 1e-5
num threads= 1
--------------------------------------------------
reading file...
instance19.mps
--------------------------------------------------
Running HiGHS 1.9.0 (git hash: n/a): Copyright (c) 2025 HiGHS under MIT licence terms
--------------------------------------------------
running presolve
--------------------------------------------------
Presolving model
459 rows, 6083 cols, 254040 nonzeros 0s
459 rows, 6083 cols, 254040 nonzeros 0s
Presolve status: Reduced
Running HiGHS 1.9.0 (git hash: n/a): Copyright (c) 2025 HiGHS under MIT licence terms
Minimize
Has obj offset 300.000000
--------------------------------------------------
running scaling
- use Ruiz scaling
- use PC scaling
--------------------------------------------------
--------------------------------------------------
enter main solve loop
--------------------------------------------------
____ _ _ ____ ____ _ ____
/ ___| | | | _ \| _ \| | | _ \
| | | | | | |_) | | | | | | |_) |
| |___| |_| | __/| |_| | |___| __/
\____|\___/|_| |____/|_____|_|
Cuda runtime 12060
Cuda driver 12080
cuSparse 12504
Cuda device 0: NVIDIA GeForce RTX 4060 Ti
--------------------------------------------------
CUPDHG Parameters:
--------------------------------------------------
nIterLim: 500000
dTimeLim (sec): 3600.00
ifScaling: 1
ifRuizScaling: 1
ifL2Scaling: 0
ifPcScaling: 1
eLineSearchMethod: 2
dPrimalTol: 1.0000e-05
dDualTol: 1.0000e-04
dGapTol: 1.0000e-04
dFeasTol: 1.0000e-08
eRestartMethod: 1
--------------------------------------------------
Iter Primal.Obj Dual.Obj Gap Primal.Inf Dual.Inf Time
0 +3.00000000e+02 +1.20000000e+01 +2.88e+02 1.03e+02 0.00e+00 0.00s [L]
0 +3.00000000e+02 +1.20000000e+01 +2.88e+02 1.03e+02 0.00e+00 0.00s [A]
Termination check: 1.026158e+02|1.036158e-03 0.000000e+00|2.070235e-01 9.201278e-01|1.000000e-04
Termination check: 1.026158e+02|1.036158e-03 0.000000e+00|2.070235e-01 9.201278e-01|1.000000e-04
Last restart was iter 0: average
Last restart was iter 1: average
Last restart was iter 2: average
Last restart was iter 4: current
Last restart was iter 7: current
Last restart was iter 9: average
Last restart was iter 40: average
Last restart was iter 80: average
Last restart was iter 160: average
Last restart was iter 280: current
Last restart was iter 440: average
Last restart was iter 640: current
Last restart was iter 720: current
Last restart was iter 1040: current
Last restart was iter 1200: current
Last restart was iter 1720: average
Last restart was iter 1800: average
Last restart was iter 2120: current
Last restart was iter 2560: average
Last restart was iter 3000: current
Last restart was iter 3120: average
Last restart was iter 3240: current
Last restart was iter 3320: current
Last restart was iter 3360: current
Iter Primal.Obj Dual.Obj Gap Primal.Inf Dual.Inf Time
4000 +3.13156582e+03 +3.14485627e+03 -1.33e+01 1.92e-01 1.00e-03 0.79s [L]
4000 +3.13510013e+03 +3.14558144e+03 -1.05e+01 1.75e-01 2.23e-04 0.79s [A]
Termination check: 1.918230e-01|1.036158e-03 1.002731e-03|2.070235e-01 2.117183e-03|1.000000e-04
Termination check: 1.753267e-01|1.036158e-03 2.229571e-04|2.070235e-01 1.668550e-03|1.000000e-04
Last restart was iter 3440: average
Last restart was iter 4480: current
Iter Primal.Obj Dual.Obj Gap Primal.Inf Dual.Inf Time
8000 +3.15424115e+03 +3.14737552e+03 +6.87e+00 1.92e-02 1.30e-04 1.51s [L]
8000 +3.15309039e+03 +3.14731850e+03 +5.77e+00 1.60e-02 3.07e-05 1.51s [A]
Termination check: 1.924401e-02|1.036158e-03 1.297241e-04|2.070235e-01 1.089329e-03|1.000000e-04
Termination check: 1.595192e-02|1.036158e-03 3.073819e-05|2.070235e-01 9.159682e-04|1.000000e-04
Last restart was iter 7000: current
Iter Primal.Obj Dual.Obj Gap Primal.Inf Dual.Inf Time
12000 +3.14754575e+03 +3.14742372e+03 +1.22e-01 6.13e-03 1.18e-05 2.19s [L]
12000 +3.14850105e+03 +3.14743339e+03 +1.07e+00 2.89e-03 7.55e-06 2.19s [A]
Termination check: 6.131199e-03|1.036158e-03 1.181000e-05|2.070235e-01 1.938108e-05|1.000000e-04
Termination check: 2.888755e-03|1.036158e-03 7.552550e-06|2.070235e-01 1.695513e-04|1.000000e-04
Last restart was iter 8840: current
Last restart was iter 12040: current
Iter Primal.Obj Dual.Obj Gap Primal.Inf Dual.Inf Time
16000 +3.14770410e+03 +3.14746170e+03 +2.42e-01 4.13e-03 4.47e-06 2.82s [L]
16000 +3.14659655e+03 +3.14746318e+03 -8.67e-01 1.94e-03 4.05e-06 2.82s [A]
Termination check: 4.129900e-03|1.036158e-03 4.469477e-06|2.070235e-01 3.849848e-05|1.000000e-04
Termination check: 1.944635e-03|1.036158e-03 4.050609e-06|2.070235e-01 1.376682e-04|1.000000e-04
Last restart was iter 12240: average
Iter Primal.Obj Dual.Obj Gap Primal.Inf Dual.Inf Time
20000 +3.14823809e+03 +3.14746977e+03 +7.68e-01 7.36e-03 4.14e-06 3.50s [L]
20000 +3.14774868e+03 +3.14747355e+03 +2.75e-01 2.13e-03 2.26e-06 3.50s [A]
Termination check: 7.359316e-03|1.036158e-03 4.136498e-06|2.070235e-01 1.220189e-04|1.000000e-04
Termination check: 2.129193e-03|1.036158e-03 2.259238e-06|2.070235e-01 4.369881e-05|1.000000e-04
Iter Primal.Obj Dual.Obj Gap Primal.Inf Dual.Inf Time
24000 +3.14813795e+03 +3.14748888e+03 +6.49e-01 5.32e-03 3.67e-06 4.17s [L]
24000 +3.14844821e+03 +3.14748980e+03 +9.58e-01 3.85e-03 1.14e-06 4.17s [A]
Termination check: 5.322694e-03|1.036158e-03 3.667634e-06|2.070235e-01 1.030817e-04|1.000000e-04
Termination check: 3.854022e-03|1.036158e-03 1.136330e-06|2.070235e-01 1.522019e-04|1.000000e-04
Iter Primal.Obj Dual.Obj Gap Primal.Inf Dual.Inf Time
28000 +3.14744877e+03 +3.14749012e+03 -4.14e-02 2.09e-03 1.31e-06 4.82s [L]
28000 +3.14800194e+03 +3.14749394e+03 +5.08e-01 2.98e-03 5.80e-07 4.82s [A]
Termination check: 2.086090e-03|1.036158e-03 1.307150e-06|2.070235e-01 6.568466e-06|1.000000e-04
Termination check: 2.975282e-03|1.036158e-03 5.795401e-07|2.070235e-01 8.067989e-05|1.000000e-04
Last restart was iter 19160: current
Iter Primal.Obj Dual.Obj Gap Primal.Inf Dual.Inf Time
30240 +3.14769267e+03 +3.14749172e+03 +2.01e-01 1.01e-03 1.14e-06 5.17s [L]
30240 +3.14773743e+03 +3.14749297e+03 +2.44e-01 1.10e-03 8.47e-07 5.17s [A]
Solving information: Optimal current solution.
Primal objective: +3.14769267e+03
Dual objective: +3.14749172e+03
Primal infeas (abs/rel): 1.01e-03 / 9.74e-06
Dual infeas (abs/rel): 1.14e-06 / 5.50e-10
Duality gap (abs/rel): 2.01e-01 / 3.19e-05
Number of iterations: 30240
Timing information:
Total solver time 5.227000e+00 in 30240 iterations
Solve time 5.172000e+00 in 30240 iterations
Iters per sec 5.846868e+03
Scaling time 1.400000e-02
Presolve time 4.100000e-02
Ax 5.310000e-01 in 31698 calls
Aty 7.340000e-01 in 31698 calls
ComputeResiduals 0.000000e+00 in 0 calls
UpdateIterates 3.976000e+00 in 30240 calls
GPU Timing information:
CudaPrepare 8.600000e-02
Alloc&CopyMatToDevice 3.000000e-03
CopyVecToDevice 0.000000e+00
DeviceMatVecProd 1.261000e+00
CopyVecToHost 0.000000e+00
--------------------------------
--- saving to ./solution-sum.json
--------------------------------
Free Device memory 1.000000e-03
D:\test\test_cudaLinear>cudalinear -fname instance20.mps -nIterLim 500000 -ifPre 1 -dPrimalTol 1e-5
num threads= 1
--------------------------------------------------
reading file...
instance20.mps
--------------------------------------------------
Running HiGHS 1.9.0 (git hash: n/a): Copyright (c) 2025 HiGHS under MIT licence terms
--------------------------------------------------
running presolve
--------------------------------------------------
Presolving model
1142 rows, 6249 cols, 426542 nonzeros 0s
1142 rows, 6249 cols, 426542 nonzeros 0s
Presolve status: Not reduced
Running HiGHS 1.9.0 (git hash: n/a): Copyright (c) 2025 HiGHS under MIT licence terms
Minimize
No obj offset
--------------------------------------------------
running scaling
- use Ruiz scaling
- use PC scaling
--------------------------------------------------
--------------------------------------------------
enter main solve loop
--------------------------------------------------
____ _ _ ____ ____ _ ____
/ ___| | | | _ \| _ \| | | _ \
| | | | | | |_) | | | | | | |_) |
| |___| |_| | __/| |_| | |___| __/
\____|\___/|_| |____/|_____|_|
Cuda runtime 12060
Cuda driver 12080
cuSparse 12504
Cuda device 0: NVIDIA GeForce RTX 4060 Ti
--------------------------------------------------
CUPDHG Parameters:
--------------------------------------------------
nIterLim: 500000
dTimeLim (sec): 3600.00
ifScaling: 1
ifRuizScaling: 1
ifL2Scaling: 0
ifPcScaling: 1
eLineSearchMethod: 2
dPrimalTol: 1.0000e-05
dDualTol: 1.0000e-04
dGapTol: 1.0000e-04
dFeasTol: 1.0000e-08
eRestartMethod: 1
--------------------------------------------------
Iter Primal.Obj Dual.Obj Gap Primal.Inf Dual.Inf Time
0 +0.00000000e+00 +0.00000000e+00 +0.00e+00 1.59e+02 0.00e+00 0.01s [L]
0 +0.00000000e+00 +0.00000000e+00 +0.00e+00 1.59e+02 0.00e+00 0.01s [A]
Termination check: 1.587199e+02|1.597199e-03 0.000000e+00|3.319796e-01 0.000000e+00|1.000000e-04
Termination check: 1.587199e+02|1.597199e-03 0.000000e+00|3.319796e-01 0.000000e+00|1.000000e-04
Last restart was iter 0: average
Last restart was iter 1: average
Last restart was iter 2: average
Last restart was iter 4: current
Last restart was iter 7: current
Last restart was iter 9: average
Last restart was iter 40: average
Last restart was iter 80: average
Last restart was iter 160: average
Last restart was iter 280: current
Last restart was iter 360: current
Last restart was iter 600: current
Last restart was iter 920: current
Last restart was iter 1000: current
Last restart was iter 1160: average
Last restart was iter 1240: average
Last restart was iter 1760: current
Last restart was iter 2200: current
Last restart was iter 2760: average
Last restart was iter 3120: current
Iter Primal.Obj Dual.Obj Gap Primal.Inf Dual.Inf Time
4000 +4.79820895e+03 +4.75850224e+03 +3.97e+01 1.21e-01 4.64e-03 0.77s [L]
4000 +4.81498430e+03 +4.75654113e+03 +5.84e+01 6.00e-02 2.47e-03 0.77s [A]
Termination check: 1.207468e-01|1.597199e-03 4.636336e-03|3.319796e-01 4.154416e-03|1.000000e-04
Termination check: 6.000623e-02|1.597199e-03 2.468643e-03|3.319796e-01 6.105302e-03|1.000000e-04
Last restart was iter 3320: current
Last restart was iter 4280: current
Last restart was iter 4320: current
Iter Primal.Obj Dual.Obj Gap Primal.Inf Dual.Inf Time
7000 +4.76875256e+03 +4.76890483e+03 -1.52e-01 7.64e-03 5.89e-05 1.27s [L]
7000 +4.76922200e+03 +4.76896552e+03 +2.56e-01 9.15e-04 1.47e-05 1.27s [A]
Solving information: Optimal average solution.
Primal objective: +4.76922200e+03
Dual objective: +4.76896552e+03
Primal infeas (abs/rel): 9.15e-04 / 5.73e-06
Dual infeas (abs/rel): 1.47e-05 / 4.44e-09
Duality gap (abs/rel): 2.56e-01 / 2.69e-05
Number of iterations: 7000
Timing information:
Total solver time 1.367000e+00 in 7000 iterations
Solve time 1.275000e+00 in 7000 iterations
Iters per sec 5.490196e+03
Scaling time 1.800000e-02
Presolve time 7.400000e-02
Ax 1.210000e-01 in 7214 calls
Aty 1.380000e-01 in 7214 calls
ComputeResiduals 0.000000e+00 in 0 calls
UpdateIterates 9.700000e-01 in 7000 calls
GPU Timing information:
CudaPrepare 7.600000e-02
Alloc&CopyMatToDevice 5.000000e-03
CopyVecToDevice 0.000000e+00
DeviceMatVecProd 2.570000e-01
CopyVecToHost 0.000000e+00
--------------------------------
--- saving to ./solution-sum.json
--------------------------------
Free Device memory 0.000000e+00
D:\test\test_cudaLinear>cudalinear -fname instance21.mps -nIterLim 500000 -ifPre 1 -dPrimalTol 1e-5
num threads= 1
--------------------------------------------------
reading file...
instance21.mps
--------------------------------------------------
Running HiGHS 1.9.0 (git hash: n/a): Copyright (c) 2025 HiGHS under MIT licence terms
--------------------------------------------------
running presolve
--------------------------------------------------
Presolving model
1556 rows, 9235 cols, 603232 nonzeros 0s
1556 rows, 9235 cols, 603232 nonzeros 0s
Presolve status: Not reduced
Running HiGHS 1.9.0 (git hash: n/a): Copyright (c) 2025 HiGHS under MIT licence terms
Minimize
No obj offset
--------------------------------------------------
running scaling
- use Ruiz scaling
- use PC scaling
--------------------------------------------------
--------------------------------------------------
enter main solve loop
--------------------------------------------------
____ _ _ ____ ____ _ ____
/ ___| | | | _ \| _ \| | | _ \
| | | | | | |_) | | | | | | |_) |
| |___| |_| | __/| |_| | |___| __/
\____|\___/|_| |____/|_____|_|
Cuda runtime 12060
Cuda driver 12080
cuSparse 12504
Cuda device 0: NVIDIA GeForce RTX 4060 Ti
--------------------------------------------------
CUPDHG Parameters:
--------------------------------------------------
nIterLim: 500000
dTimeLim (sec): 3600.00
ifScaling: 1
ifRuizScaling: 1
ifL2Scaling: 0
ifPcScaling: 1
eLineSearchMethod: 2
dPrimalTol: 1.0000e-05
dDualTol: 1.0000e-04
dGapTol: 1.0000e-04
dFeasTol: 1.0000e-08
eRestartMethod: 1
--------------------------------------------------
Iter Primal.Obj Dual.Obj Gap Primal.Inf Dual.Inf Time
0 +0.00000000e+00 +0.00000000e+00 +0.00e+00 2.36e+02 0.00e+00 0.00s [L]
0 +0.00000000e+00 +0.00000000e+00 +0.00e+00 2.36e+02 0.00e+00 0.00s [A]
Termination check: 2.360339e+02|2.370339e-03 0.000000e+00|3.955477e-01 0.000000e+00|1.000000e-04
Termination check: 2.360339e+02|2.370339e-03 0.000000e+00|3.955477e-01 0.000000e+00|1.000000e-04
Last restart was iter 0: average
Last restart was iter 1: average
Last restart was iter 2: average
Last restart was iter 4: current
Last restart was iter 7: current
Last restart was iter 8: current
Last restart was iter 9: average
Last restart was iter 40: average
Last restart was iter 80: average
Last restart was iter 160: current
Last restart was iter 280: average
Last restart was iter 360: current
Last restart was iter 440: current
Last restart was iter 600: current
Last restart was iter 920: current
Last restart was iter 1120: current
Last restart was iter 1360: current
Last restart was iter 1520: average
Last restart was iter 1600: current
Last restart was iter 2520: current
Last restart was iter 2640: current
Iter Primal.Obj Dual.Obj Gap Primal.Inf Dual.Inf Time
4000 +2.11357849e+04 +2.11302185e+04 +5.57e+00 2.96e-02 2.64e-04 0.83s [L]
4000 +2.11404979e+04 +2.11299275e+04 +1.06e+01 5.08e-03 2.09e-04 0.83s [A]
Termination check: 2.963712e-02|2.370339e-03 2.640344e-04|3.955477e-01 1.316966e-04|1.000000e-04
Termination check: 5.084463e-03|2.370339e-03 2.088359e-04|3.955477e-01 2.500615e-04|1.000000e-04
Iter Primal.Obj Dual.Obj Gap Primal.Inf Dual.Inf Time
4440 +2.11156665e+04 +2.11305657e+04 -1.49e+01 2.21e-02 1.84e-04 0.91s [L]
4440 +2.11333909e+04 +2.11305333e+04 +2.86e+00 2.30e-03 1.61e-04 0.91s [A]
Solving information: Optimal average solution.
Primal objective: +2.11333909e+04
Dual objective: +2.11305333e+04
Primal infeas (abs/rel): 2.30e-03 / 9.70e-06
Dual infeas (abs/rel): 1.61e-04 / 4.06e-08
Duality gap (abs/rel): 2.86e+00 / 6.76e-05
Number of iterations: 4440
Timing information:
Total solver time 1.056000e+00 in 4440 iterations
Solve time 9.140000e-01 in 4440 iterations
Iters per sec 4.857768e+03
Scaling time 2.600000e-02
Presolve time 1.160000e-01
Ax 7.600000e-02 in 4668 calls
Aty 7.900000e-02 in 4668 calls
ComputeResiduals 0.000000e+00 in 0 calls
UpdateIterates 6.920000e-01 in 4440 calls
GPU Timing information:
CudaPrepare 8.600000e-02
Alloc&CopyMatToDevice 8.000000e-03
CopyVecToDevice 0.000000e+00
DeviceMatVecProd 1.550000e-01
CopyVecToHost 0.000000e+00
--------------------------------
--- saving to ./solution-sum.json
--------------------------------
Free Device memory 0.000000e+00
D:\test\test_cudaLinear>
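The likely cause is CPU-GPU transfer. I have no experience with GPU programming, so I cannot be sure, but I believe it stems from how the cuPDLP core is written. For medium-sized and smaller problems, I suspect that the current implementation's frequent communication with the CPU is the bottleneck. (I also looked at the Google version (the CPU version), but I think cuPDLP (the COPT version) has the edge.)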
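One way to probe this hypothesis with the data already at hand is to look at the cost per iteration in the four RTX 4060 Ti logs above. The sketch below only re-derives ratios from the logged "Timing information" values; the tuple layout and variable names are my own.

```python
# Values copied from the four RTX 4060 Ti logs:
# (instance, nonzeros after presolve, iterations,
#  Solve time [s], UpdateIterates [s], Ax [s], Aty [s])
runs = [
    ("n080w8_2_0-4-0-9-1-9-6-2", 215563, 52400, 8.916, 6.884, 1.045, 1.156),
    ("instance19", 254040, 30240, 5.172, 3.976, 0.531, 0.734),
    ("instance20", 426542, 7000, 1.275, 0.970, 0.121, 0.138),
    ("instance21", 603232, 4440, 0.914, 0.692, 0.076, 0.079),
]

results = []
for name, nnz, iters, solve, update, ax, aty in runs:
    us_per_iter = solve / iters * 1e6   # microseconds per iteration
    update_share = update / solve       # fraction spent in UpdateIterates
    matvec_share = (ax + aty) / solve   # fraction spent in Ax and Aty
    results.append((name, nnz, us_per_iter, update_share, matvec_share))
    print(f"{name}: nnz={nnz}, {us_per_iter:.0f} us/iter, "
          f"UpdateIterates {update_share:.0%}, Ax+Aty {matvec_share:.0%}")

# How much the matrix grows versus how much one iteration slows down.
nnz_growth = runs[-1][1] / runs[0][1]
time_growth = results[-1][2] / results[0][2]
print(f"nnz grows {nnz_growth:.1f}x, time per iteration grows {time_growth:.2f}x")
```

The time per iteration stays between roughly 170 and 206 microseconds while the number of nonzeros grows about 2.8x, and UpdateIterates takes about three quarters of the solve time in every run. That pattern is consistent with a fixed per-iteration overhead rather than arithmetic throughput, although it does not by itself show that the overhead is CPU-GPU transfer: the logged CopyVecToDevice and CopyVecToHost timers are zero, so the cost may equally sit in kernel launches or synchronization.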
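For very large problems, GPU computation presumably dominates the runtime, so CPU-GPU transfer time should not become a bottleneck. The current focus of PDLP work is very large problems, and NVIDIA is promoting it as well.
Reference: "NVIDIA cuOpt で大規模な線形計画問題を加速する" (Accelerating large-scale linear programming with NVIDIA cuOpt), NVIDIA 技術ブログ
Incidentally, the following article covers the delivery optimization problem in detail:
"運搬経路問題(配送最適化問題,Vehicle Routing Problem) をPuLPで解く" (Solving the vehicle routing problem with PuLP), #Python - Qiita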
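However, our main interest is medium-to-large problems, chiefly as a replacement for commercial ISM solvers. This is a solver for ordinary users who would like to use a commercial ISM solver but cannot.
Therefore, the conclusion is that the only way to get the most out of the GPU is to rewrite the cuPDLP core.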
The other issue is WarmStart support. Simplex benefits from WarmStart, and the same should be possible for FirstOrder methods. The current cuPDLP does not support it, so an implementation needs to be considered.
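To illustrate why WarmStart should also pay off for a first-order method, here is a minimal sketch using a plain PDHG iteration on a toy LP. This is not cuPDLP's algorithm (which adds scaling, restarts, and adaptive step sizes); the function `pdhg_lp`, its parameters, and the two-variable problem are all assumptions made for illustration.

```python
import numpy as np


def pdhg_lp(A, b, c, x0=None, y0=None, tol=1e-6, max_iter=100_000):
    """Vanilla PDHG for  min c@x  s.t.  A@x = b, x >= 0.

    Returns (x, y, iterations). x0/y0 allow a warm start.
    """
    m, n = A.shape
    x = np.zeros(n) if x0 is None else np.asarray(x0, dtype=float).copy()
    y = np.zeros(m) if y0 is None else np.asarray(y0, dtype=float).copy()
    # tau = sigma = 0.9/||A||_2, so tau*sigma*||A||^2 = 0.81 < 1
    step = 0.9 / np.linalg.norm(A, 2)
    for k in range(1, max_iter + 1):
        x_new = np.maximum(x - step * (c - A.T @ y), 0.0)  # projected primal step
        y = y + step * (b - A @ (2.0 * x_new - x))         # dual step with extrapolation
        x = x_new
        primal_res = np.linalg.norm(A @ x - b, np.inf)
        dual_res = np.linalg.norm(np.maximum(A.T @ y - c, 0.0), np.inf)
        gap = abs(c @ x - b @ y)
        if max(primal_res, dual_res, gap) <= tol:
            return x, y, k
    return x, y, max_iter


# Toy LP: min x1 + 2*x2  s.t.  x1 + x2 = b, x >= 0  (optimum: x = (b, 0)).
A = np.array([[1.0, 1.0]])
c = np.array([1.0, 2.0])

# Solve once, then slightly change the right-hand side and re-solve.
x_prev, y_prev, _ = pdhg_lp(A, np.array([1.0]), c)
b_new = np.array([1.01])
x_cold, _, cold_iters = pdhg_lp(A, b_new, c)
x_warm, _, warm_iters = pdhg_lp(A, b_new, c, x0=x_prev, y0=y_prev)

print(f"cold start: {cold_iters} iterations, warm start: {warm_iters} iterations")
```

Because the iteration only needs a starting primal-dual pair, passing the previous solution costs nothing extra, and on this toy problem the warm-started run needs fewer iterations than the cold one. How much this carries over to real instances would have to be measured.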
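These two items need to be implemented.
I also tried integrating the HiGHS ISM, but it is about twice as slow as the current Schedule Nurse, and I judged that even if it becomes multithreaded in the future, the gain would not exceed the improvement I am hoping for. FirstOrder, on the other hand, cannot be expected to deliver high accuracy, but WarmStart is attractive, it may be able to exploit the scalability of GPUs, and it has future potential.
In conclusion, to solve the unresolved instances (two INRC2 8-week instances and two Scheduling Benchmarks instances), cuPDLP needs to be reimplemented.
I would like the HiGHS team to reimplement it to meet these requirements, but I cannot wait, and I do not expect COPT to take the lead on it either, so I decided to do it myself.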