Saturday, February 8, 2025

Evaluating PDLP on GPUs

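I compared two GPUs, a Quadro P1000 and an RTX 4060 Ti 16GB.

P1000 (640 CUDA cores, 1.3 GHz) → 4060 Ti (4352 CUDA cores, 2.5 GHz)

Going by raw GPU specifications, this upgrade should give

4352/640 × 2.5/1.3 ≈ 13

that is, roughly a 13x speedup. In practice, the speedup is below 2x in some cases, which makes this unusable as it stands. Even with a GPU that has many more CUDA cores, the current scaling does not promise much.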

Instance                  cPDLP (CPU)  cuPDLP (Quadro P1000)  CLP      cuPDLP (RTX 4060 Ti 16GB)
n080w8_2_0-4-0-9-1-9-6-2  47.7 sec     14.3 sec               76 sec   9 sec
instance19                46.7 sec     21.6 sec               2.6 sec  5 sec
instance20                14.8 sec     6.7 sec                9.6 sec  1 sec
instance21                41.5 sec     16.2 sec               78 sec   1 sec

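The overall trends are:
■ The iteration count varies, even between runs of the same instance.
■ The speedup grows as the problem gets larger.
■ For the large problem (instance21), the result is as expected.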
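
To make the gap between expectation and measurement concrete, here is a minimal sketch that recomputes the naive estimate and the observed P1000 → 4060 Ti speedups. The numbers are copied from the comparison in this post (the 4060 Ti times there are rounded to whole seconds); the "cores × clock" model and the function names are my own simplification for illustration, not a real GPU performance model.

```python
# Figures copied from the post: CUDA core counts, clocks (GHz), and the
# cuPDLP wall-clock times (seconds) for the two GPUs.
P1000 = {"cores": 640, "clock_ghz": 1.3}
RTX4060TI = {"cores": 4352, "clock_ghz": 2.5}

times_sec = {
    # instance: (Quadro P1000, RTX 4060 Ti)
    "n080w8_2_0-4-0-9-1-9-6-2": (14.3, 9.0),
    "instance19": (21.6, 5.0),
    "instance20": (6.7, 1.0),
    "instance21": (16.2, 1.0),
}


def expected_speedup(old, new):
    """Naive estimate (assumption): throughput scales with cores x clock."""
    return (new["cores"] / old["cores"]) * (new["clock_ghz"] / old["clock_ghz"])


def observed_speedups(times):
    """Ratio of old-GPU time to new-GPU time for each instance."""
    return {name: old / new for name, (old, new) in times.items()}


expected = expected_speedup(P1000, RTX4060TI)
observed = observed_speedups(times_sec)

print(f"expected: {expected:.1f}x")
for name, speedup in observed.items():
    print(f"{name}: {speedup:.1f}x")
```

Only instance21 reaches the naive estimate; the first instance stays below 2x.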

For a moment I suspected that the GPU clock not being locked was the cause, but locking it did not change the results.

How to lock the clocks of an NVIDIA GPU (NVIDIA のGPUのクロックを固定する方法) - pyopyopyo - Linuxとかプログラミングの覚え書き -








D:\test\test_cudaLinear>test

D:\test\test_cudaLinear>nvidia-smi -q -d CLOCK

==============NVSMI LOG==============

Timestamp                                 : Sat Feb  8 04:26:08 2025
Driver Version                            : 571.96
CUDA Version                              : 12.8

Attached GPUs                             : 1
GPU 00000000:01:00.0
    Clocks
        Graphics                          : 2805 MHz
        SM                                : 2805 MHz
        Memory                            : 9001 MHz
        Video                             : 2190 MHz
    Applications Clocks
        Graphics                          : N/A
        Memory                            : N/A
    Default Applications Clocks
        Graphics                          : N/A
        Memory                            : N/A
    Deferred Clocks
        Memory                            : N/A
    Max Clocks
        Graphics                          : 3105 MHz
        SM                                : 3105 MHz
        Memory                            : 9001 MHz
        Video                             : 2415 MHz
    Max Customer Boost Clocks
        Graphics                          : N/A
    SM Clock Samples
        Duration                          : Not Found
        Number of Samples                 : Not Found
        Max                               : Not Found
        Min                               : Not Found
        Avg                               : Not Found
    Memory Clock Samples
        Duration                          : Not Found
        Number of Samples                 : Not Found
        Max                               : Not Found
        Min                               : Not Found
        Avg                               : Not Found
    Clock Policy
        Auto Boost                        : N/A
        Auto Boost Default                : N/A


D:\test\test_cudaLinear>cudalinear -fname n080w8_2_0-4-0-9-1-9-6-2BG2.mps -nIterLim 500000 -ifPre 1 -dPrimalTol 1e-5
num threads= 1
--------------------------------------------------
reading file...
        n080w8_2_0-4-0-9-1-9-6-2BG2.mps
--------------------------------------------------
Running HiGHS 1.9.0 (git hash: n/a): Copyright (c) 2025 HiGHS under MIT licence terms
--------------------------------------------------
running presolve
--------------------------------------------------
Presolving model
11734 rows, 20786 cols, 215656 nonzeros  0s
11734 rows, 20693 cols, 215563 nonzeros  0s
Presolve status: Reduced
Running HiGHS 1.9.0 (git hash: n/a): Copyright (c) 2025 HiGHS under MIT licence terms
Minimize
No obj offset
--------------------------------------------------
running scaling
- use Ruiz scaling
- use PC scaling
--------------------------------------------------
--------------------------------------------------
enter main solve loop
--------------------------------------------------

  ____ _   _ ____  ____  _     ____
 / ___| | | |  _ \|  _ \| |   |  _ \
| |   | | | | |_) | | | | |   | |_) |
| |___| |_| |  __/| |_| | |___|  __/
 \____|\___/|_|   |____/|_____|_|

Cuda runtime 12060
Cuda driver 12080
cuSparse 12504
Cuda device 0: NVIDIA GeForce RTX 4060 Ti


--------------------------------------------------
CUPDHG Parameters:
--------------------------------------------------

    nIterLim:          500000
    dTimeLim (sec):    3600.00
    ifScaling:         1
    ifRuizScaling:     1
    ifL2Scaling:       0
    ifPcScaling:       1
    eLineSearchMethod: 2
    dPrimalTol:        1.0000e-05
    dDualTol:          1.0000e-04
    dGapTol:           1.0000e-04
    dFeasTol:          1.0000e-08
    eRestartMethod:    1

--------------------------------------------------

     Iter       Primal.Obj         Dual.Obj        Gap  Primal.Inf  Dual.Inf    Time
        0  +0.00000000e+00  +0.00000000e+00  +0.00e+00    1.03e+02  0.00e+00   0.01s [L]
        0  +0.00000000e+00  +0.00000000e+00  +0.00e+00    1.03e+02  0.00e+00   0.01s [A]
Termination check: 1.032666e+02|1.042666e-03  0.000000e+00|2.920578e+00  0.000000e+00|1.000000e-04
Termination check: 1.032666e+02|1.042666e-03  0.000000e+00|2.920578e+00  0.000000e+00|1.000000e-04
Last restart was iter 0: average
Last restart was iter 1: average
Last restart was iter 2: average
Last restart was iter 4: average
Last restart was iter 7: current
Last restart was iter 40: average
Last restart was iter 80: average
Last restart was iter 160: average
Last restart was iter 280: average
Last restart was iter 400: current
Last restart was iter 480: average
Last restart was iter 760: average
Last restart was iter 1120: average
Last restart was iter 1600: average
Last restart was iter 1880: current
Last restart was iter 2160: average
Last restart was iter 2600: average
Last restart was iter 2880: average
Last restart was iter 3160: average
Last restart was iter 3280: current
Last restart was iter 3640: current
Last restart was iter 3880: average
     Iter       Primal.Obj         Dual.Obj        Gap  Primal.Inf  Dual.Inf    Time
     4000  +3.38816992e+03  +3.38118002e+03  +6.99e+00    3.30e-01  1.65e-02   0.73s [L]
     4000  +3.38779443e+03  +3.38934372e+03  -1.55e+00    9.24e-02  7.53e-04   0.73s [A]
Termination check: 3.298630e-01|1.042666e-03  1.650873e-02|2.920578e+00  1.032428e-03|1.000000e-04
Termination check: 9.244807e-02|1.042666e-03  7.527463e-04|2.920578e+00  2.285719e-04|1.000000e-04
Last restart was iter 3960: current
Last restart was iter 6200: current
Last restart was iter 6440: current
Last restart was iter 6560: average
     Iter       Primal.Obj         Dual.Obj        Gap  Primal.Inf  Dual.Inf    Time
     8000  +3.41615033e+03  +3.41508172e+03  +1.07e+00    1.75e-02  1.02e-04   1.41s [L]
     8000  +3.41329596e+03  +3.41716699e+03  -3.87e+00    1.18e-02  9.91e-05   1.41s [A]
Termination check: 1.747635e-02|1.042666e-03  1.018131e-04|2.920578e+00  1.564065e-04|1.000000e-04
Termination check: 1.178435e-02|1.042666e-03  9.914102e-05|2.920578e+00  5.666482e-04|1.000000e-04
Last restart was iter 6800: average
Last restart was iter 10640: average
     Iter       Primal.Obj         Dual.Obj        Gap  Primal.Inf  Dual.Inf    Time
    12000  +3.42440093e+03  +3.42183563e+03  +2.57e+00    6.01e-03  7.29e-05   2.09s [L]
    12000  +3.42179821e+03  +3.42244885e+03  -6.51e-01    4.98e-03  3.66e-05   2.09s [A]
Termination check: 6.008872e-03|1.042666e-03  7.291813e-05|2.920578e+00  3.746476e-04|1.000000e-04
Termination check: 4.978544e-03|1.042666e-03  3.658292e-05|2.920578e+00  9.504995e-05|1.000000e-04
Last restart was iter 10840: average
     Iter       Primal.Obj         Dual.Obj        Gap  Primal.Inf  Dual.Inf    Time
    16000  +3.42948919e+03  +3.42399978e+03  +5.49e+00    2.48e-03  5.15e-05   2.75s [L]
    16000  +3.42739936e+03  +3.42399246e+03  +3.41e+00    1.35e-03  3.87e-05   2.75s [A]
Termination check: 2.482076e-03|1.042666e-03  5.146503e-05|2.920578e+00  8.008492e-04|1.000000e-04
Termination check: 1.350319e-03|1.042666e-03  3.872877e-05|2.920578e+00  4.971842e-04|1.000000e-04
Last restart was iter 12520: current
     Iter       Primal.Obj         Dual.Obj        Gap  Primal.Inf  Dual.Inf    Time
    20000  +3.42816896e+03  +3.42516766e+03  +3.00e+00    2.66e-03  3.41e-05   3.43s [L]
    20000  +3.42828181e+03  +3.42525346e+03  +3.03e+00    1.99e-03  2.41e-05   3.43s [A]
Termination check: 2.655098e-03|1.042666e-03  3.410679e-05|2.920578e+00  4.378692e-04|1.000000e-04
Termination check: 1.987844e-03|1.042666e-03  2.405062e-05|2.920578e+00  4.418021e-04|1.000000e-04
Last restart was iter 19600: current
Last restart was iter 21160: current
Last restart was iter 23240: current
     Iter       Primal.Obj         Dual.Obj        Gap  Primal.Inf  Dual.Inf    Time
    24000  +3.42643584e+03  +3.42605686e+03  +3.79e-01    4.21e-02  2.07e-05   4.10s [L]
    24000  +3.42637363e+03  +3.42612232e+03  +2.51e-01    5.40e-03  4.17e-06   4.10s [A]
Termination check: 4.210707e-02|1.042666e-03  2.069015e-05|2.920578e+00  5.529809e-05|1.000000e-04
Termination check: 5.401254e-03|1.042666e-03  4.166442e-06|2.920578e+00  3.666811e-05|1.000000e-04
Last restart was iter 23720: average
Last restart was iter 24240: current
Last restart was iter 24440: average
     Iter       Primal.Obj         Dual.Obj        Gap  Primal.Inf  Dual.Inf    Time
    28000  +3.42565192e+03  +3.42648533e+03  -8.33e-01    8.06e-03  1.51e-06   4.79s [L]
    28000  +3.42622676e+03  +3.42650881e+03  -2.82e-01    7.13e-03  1.47e-06   4.79s [A]
Termination check: 8.056111e-03|1.042666e-03  1.512344e-06|2.920578e+00  1.216096e-04|1.000000e-04
Termination check: 7.130953e-03|1.042666e-03  1.469720e-06|2.920578e+00  4.115273e-05|1.000000e-04
     Iter       Primal.Obj         Dual.Obj        Gap  Primal.Inf  Dual.Inf    Time
    32000  +3.42568942e+03  +3.42657068e+03  -8.81e-01    7.60e-03  2.20e-06   5.46s [L]
    32000  +3.42592677e+03  +3.42661118e+03  -6.84e-01    6.52e-03  1.21e-06   5.46s [A]
Termination check: 7.600765e-03|1.042666e-03  2.195129e-06|2.920578e+00  1.285898e-04|1.000000e-04
Termination check: 6.523763e-03|1.042666e-03  1.213275e-06|2.920578e+00  9.986266e-05|1.000000e-04
     Iter       Primal.Obj         Dual.Obj        Gap  Primal.Inf  Dual.Inf    Time
    36000  +3.42595291e+03  +3.42657121e+03  -6.18e-01    4.57e-03  3.24e-06   6.13s [L]
    36000  +3.42538203e+03  +3.42667066e+03  -1.29e+00    5.51e-03  6.04e-07   6.13s [A]
Termination check: 4.571100e-03|1.042666e-03  3.235627e-06|2.920578e+00  9.021589e-05|1.000000e-04
Termination check: 5.511004e-03|1.042666e-03  6.042666e-07|2.920578e+00  1.880368e-04|1.000000e-04
     Iter       Primal.Obj         Dual.Obj        Gap  Primal.Inf  Dual.Inf    Time
    40000  +3.42722460e+03  +3.42661804e+03  +6.07e-01    2.70e-03  1.28e-06   6.80s [L]
    40000  +3.42583599e+03  +3.42669722e+03  -8.61e-01    3.91e-03  4.80e-07   6.80s [A]
Termination check: 2.699920e-03|1.042666e-03  1.278393e-06|2.920578e+00  8.848638e-05|1.000000e-04
Termination check: 3.905445e-03|1.042666e-03  4.796320e-07|2.920578e+00  1.256628e-04|1.000000e-04
Last restart was iter 26240: average
Last restart was iter 41000: current
Last restart was iter 41080: current
Last restart was iter 41280: current
     Iter       Primal.Obj         Dual.Obj        Gap  Primal.Inf  Dual.Inf    Time
    44000  +3.42547714e+03  +3.42643168e+03  -9.55e-01    3.96e-03  6.86e-06   7.48s [L]
    44000  +3.42564450e+03  +3.42657412e+03  -9.30e-01    4.58e-03  3.11e-06   7.48s [A]
Termination check: 3.963274e-03|1.042666e-03  6.856812e-06|2.920578e+00  1.392908e-04|1.000000e-04
Termination check: 4.578061e-03|1.042666e-03  3.113830e-06|2.920578e+00  1.356470e-04|1.000000e-04
Last restart was iter 42760: current
Last restart was iter 44960: current
     Iter       Primal.Obj         Dual.Obj        Gap  Primal.Inf  Dual.Inf    Time
    48000  +3.42722966e+03  +3.42653605e+03  +6.94e-01    1.96e-03  2.07e-06   8.16s [L]
    48000  +3.42690306e+03  +3.42656473e+03  +3.38e-01    1.63e-03  1.62e-06   8.16s [A]
Termination check: 1.956777e-03|1.042666e-03  2.065282e-06|2.920578e+00  1.011859e-04|1.000000e-04
Termination check: 1.627761e-03|1.042666e-03  1.617831e-06|2.920578e+00  4.935833e-05|1.000000e-04
     Iter       Primal.Obj         Dual.Obj        Gap  Primal.Inf  Dual.Inf    Time
    52000  +3.42717924e+03  +3.42656103e+03  +6.18e-01    1.57e-03  2.70e-06   8.83s [L]
    52000  +3.42732935e+03  +3.42665984e+03  +6.70e-01    1.07e-03  3.31e-07   8.83s [A]
Termination check: 1.567878e-03|1.042666e-03  2.698537e-06|2.920578e+00  9.018737e-05|1.000000e-04
Termination check: 1.069825e-03|1.042666e-03  3.313637e-07|2.920578e+00  9.766843e-05|1.000000e-04
     Iter       Primal.Obj         Dual.Obj        Gap  Primal.Inf  Dual.Inf    Time
    52400  +3.42691858e+03  +3.42656166e+03  +3.57e-01    1.59e-03  2.21e-06   8.92s [L]
    52400  +3.42731388e+03  +3.42666552e+03  +6.48e-01    1.04e-03  3.29e-07   8.92s [A]

Solving information:        Optimal average solution.
          Primal objective: +3.42731388e+03
            Dual objective: +3.42666552e+03
   Primal infeas (abs/rel): 1.04e-03 / 1.00e-05
     Dual infeas (abs/rel): 3.29e-07 / 1.13e-11
     Duality gap (abs/rel): 6.48e-01 / 9.46e-05
      Number of iterations: 52400

Timing information:
    Total solver time 9.013000e+00 in 52400 iterations
           Solve time 8.916000e+00 in 52400 iterations
        Iters per sec 5.877075e+03
         Scaling time 1.200000e-02
        Presolve time 8.500000e-02
                   Ax 1.045000e+00 in 53734 calls
                  Aty 1.156000e+00 in 53734 calls
     ComputeResiduals 0.000000e+00 in 0 calls
       UpdateIterates 6.884000e+00 in 52400 calls

GPU Timing information:
          CudaPrepare 1.040000e-01
Alloc&CopyMatToDevice 4.000000e-03
      CopyVecToDevice 0.000000e+00
     DeviceMatVecProd 2.190000e+00
        CopyVecToHost 0.000000e+00
--------------------------------
--- saving to ./solution-sum.json
--------------------------------
  Free Device memory 1.000000e-03

D:\test\test_cudaLinear>cudalinear -fname instance19.mps -nIterLim 500000 -ifPre 1 -dPrimalTol 1e-5
num threads= 1
--------------------------------------------------
reading file...
        instance19.mps
--------------------------------------------------
Running HiGHS 1.9.0 (git hash: n/a): Copyright (c) 2025 HiGHS under MIT licence terms
--------------------------------------------------
running presolve
--------------------------------------------------
Presolving model
459 rows, 6083 cols, 254040 nonzeros  0s
459 rows, 6083 cols, 254040 nonzeros  0s
Presolve status: Reduced
Running HiGHS 1.9.0 (git hash: n/a): Copyright (c) 2025 HiGHS under MIT licence terms
Minimize
Has obj offset 300.000000
--------------------------------------------------
running scaling
- use Ruiz scaling
- use PC scaling
--------------------------------------------------
--------------------------------------------------
enter main solve loop
--------------------------------------------------

  ____ _   _ ____  ____  _     ____
 / ___| | | |  _ \|  _ \| |   |  _ \
| |   | | | | |_) | | | | |   | |_) |
| |___| |_| |  __/| |_| | |___|  __/
 \____|\___/|_|   |____/|_____|_|

Cuda runtime 12060
Cuda driver 12080
cuSparse 12504
Cuda device 0: NVIDIA GeForce RTX 4060 Ti


--------------------------------------------------
CUPDHG Parameters:
--------------------------------------------------

    nIterLim:          500000
    dTimeLim (sec):    3600.00
    ifScaling:         1
    ifRuizScaling:     1
    ifL2Scaling:       0
    ifPcScaling:       1
    eLineSearchMethod: 2
    dPrimalTol:        1.0000e-05
    dDualTol:          1.0000e-04
    dGapTol:           1.0000e-04
    dFeasTol:          1.0000e-08
    eRestartMethod:    1

--------------------------------------------------

     Iter       Primal.Obj         Dual.Obj        Gap  Primal.Inf  Dual.Inf    Time
        0  +3.00000000e+02  +1.20000000e+01  +2.88e+02    1.03e+02  0.00e+00   0.00s [L]
        0  +3.00000000e+02  +1.20000000e+01  +2.88e+02    1.03e+02  0.00e+00   0.00s [A]
Termination check: 1.026158e+02|1.036158e-03  0.000000e+00|2.070235e-01  9.201278e-01|1.000000e-04
Termination check: 1.026158e+02|1.036158e-03  0.000000e+00|2.070235e-01  9.201278e-01|1.000000e-04
Last restart was iter 0: average
Last restart was iter 1: average
Last restart was iter 2: average
Last restart was iter 4: current
Last restart was iter 7: current
Last restart was iter 9: average
Last restart was iter 40: average
Last restart was iter 80: average
Last restart was iter 160: average
Last restart was iter 280: current
Last restart was iter 440: average
Last restart was iter 640: current
Last restart was iter 720: current
Last restart was iter 1040: current
Last restart was iter 1200: current
Last restart was iter 1720: average
Last restart was iter 1800: average
Last restart was iter 2120: current
Last restart was iter 2560: average
Last restart was iter 3000: current
Last restart was iter 3120: average
Last restart was iter 3240: current
Last restart was iter 3320: current
Last restart was iter 3360: current
     Iter       Primal.Obj         Dual.Obj        Gap  Primal.Inf  Dual.Inf    Time
     4000  +3.13156582e+03  +3.14485627e+03  -1.33e+01    1.92e-01  1.00e-03   0.79s [L]
     4000  +3.13510013e+03  +3.14558144e+03  -1.05e+01    1.75e-01  2.23e-04   0.79s [A]
Termination check: 1.918230e-01|1.036158e-03  1.002731e-03|2.070235e-01  2.117183e-03|1.000000e-04
Termination check: 1.753267e-01|1.036158e-03  2.229571e-04|2.070235e-01  1.668550e-03|1.000000e-04
Last restart was iter 3440: average
Last restart was iter 4480: current
     Iter       Primal.Obj         Dual.Obj        Gap  Primal.Inf  Dual.Inf    Time
     8000  +3.15424115e+03  +3.14737552e+03  +6.87e+00    1.92e-02  1.30e-04   1.51s [L]
     8000  +3.15309039e+03  +3.14731850e+03  +5.77e+00    1.60e-02  3.07e-05   1.51s [A]
Termination check: 1.924401e-02|1.036158e-03  1.297241e-04|2.070235e-01  1.089329e-03|1.000000e-04
Termination check: 1.595192e-02|1.036158e-03  3.073819e-05|2.070235e-01  9.159682e-04|1.000000e-04
Last restart was iter 7000: current
     Iter       Primal.Obj         Dual.Obj        Gap  Primal.Inf  Dual.Inf    Time
    12000  +3.14754575e+03  +3.14742372e+03  +1.22e-01    6.13e-03  1.18e-05   2.19s [L]
    12000  +3.14850105e+03  +3.14743339e+03  +1.07e+00    2.89e-03  7.55e-06   2.19s [A]
Termination check: 6.131199e-03|1.036158e-03  1.181000e-05|2.070235e-01  1.938108e-05|1.000000e-04
Termination check: 2.888755e-03|1.036158e-03  7.552550e-06|2.070235e-01  1.695513e-04|1.000000e-04
Last restart was iter 8840: current
Last restart was iter 12040: current
     Iter       Primal.Obj         Dual.Obj        Gap  Primal.Inf  Dual.Inf    Time
    16000  +3.14770410e+03  +3.14746170e+03  +2.42e-01    4.13e-03  4.47e-06   2.82s [L]
    16000  +3.14659655e+03  +3.14746318e+03  -8.67e-01    1.94e-03  4.05e-06   2.82s [A]
Termination check: 4.129900e-03|1.036158e-03  4.469477e-06|2.070235e-01  3.849848e-05|1.000000e-04
Termination check: 1.944635e-03|1.036158e-03  4.050609e-06|2.070235e-01  1.376682e-04|1.000000e-04
Last restart was iter 12240: average
     Iter       Primal.Obj         Dual.Obj        Gap  Primal.Inf  Dual.Inf    Time
    20000  +3.14823809e+03  +3.14746977e+03  +7.68e-01    7.36e-03  4.14e-06   3.50s [L]
    20000  +3.14774868e+03  +3.14747355e+03  +2.75e-01    2.13e-03  2.26e-06   3.50s [A]
Termination check: 7.359316e-03|1.036158e-03  4.136498e-06|2.070235e-01  1.220189e-04|1.000000e-04
Termination check: 2.129193e-03|1.036158e-03  2.259238e-06|2.070235e-01  4.369881e-05|1.000000e-04
     Iter       Primal.Obj         Dual.Obj        Gap  Primal.Inf  Dual.Inf    Time
    24000  +3.14813795e+03  +3.14748888e+03  +6.49e-01    5.32e-03  3.67e-06   4.17s [L]
    24000  +3.14844821e+03  +3.14748980e+03  +9.58e-01    3.85e-03  1.14e-06   4.17s [A]
Termination check: 5.322694e-03|1.036158e-03  3.667634e-06|2.070235e-01  1.030817e-04|1.000000e-04
Termination check: 3.854022e-03|1.036158e-03  1.136330e-06|2.070235e-01  1.522019e-04|1.000000e-04
     Iter       Primal.Obj         Dual.Obj        Gap  Primal.Inf  Dual.Inf    Time
    28000  +3.14744877e+03  +3.14749012e+03  -4.14e-02    2.09e-03  1.31e-06   4.82s [L]
    28000  +3.14800194e+03  +3.14749394e+03  +5.08e-01    2.98e-03  5.80e-07   4.82s [A]
Termination check: 2.086090e-03|1.036158e-03  1.307150e-06|2.070235e-01  6.568466e-06|1.000000e-04
Termination check: 2.975282e-03|1.036158e-03  5.795401e-07|2.070235e-01  8.067989e-05|1.000000e-04
Last restart was iter 19160: current
     Iter       Primal.Obj         Dual.Obj        Gap  Primal.Inf  Dual.Inf    Time
    30240  +3.14769267e+03  +3.14749172e+03  +2.01e-01    1.01e-03  1.14e-06   5.17s [L]
    30240  +3.14773743e+03  +3.14749297e+03  +2.44e-01    1.10e-03  8.47e-07   5.17s [A]

Solving information:        Optimal current solution.
          Primal objective: +3.14769267e+03
            Dual objective: +3.14749172e+03
   Primal infeas (abs/rel): 1.01e-03 / 9.74e-06
     Dual infeas (abs/rel): 1.14e-06 / 5.50e-10
     Duality gap (abs/rel): 2.01e-01 / 3.19e-05
      Number of iterations: 30240

Timing information:
    Total solver time 5.227000e+00 in 30240 iterations
           Solve time 5.172000e+00 in 30240 iterations
        Iters per sec 5.846868e+03
         Scaling time 1.400000e-02
        Presolve time 4.100000e-02
                   Ax 5.310000e-01 in 31698 calls
                  Aty 7.340000e-01 in 31698 calls
     ComputeResiduals 0.000000e+00 in 0 calls
       UpdateIterates 3.976000e+00 in 30240 calls

GPU Timing information:
          CudaPrepare 8.600000e-02
Alloc&CopyMatToDevice 3.000000e-03
      CopyVecToDevice 0.000000e+00
     DeviceMatVecProd 1.261000e+00
        CopyVecToHost 0.000000e+00
--------------------------------
--- saving to ./solution-sum.json
--------------------------------
  Free Device memory 1.000000e-03

D:\test\test_cudaLinear>cudalinear -fname instance20.mps -nIterLim 500000 -ifPre 1 -dPrimalTol 1e-5
num threads= 1
--------------------------------------------------
reading file...
        instance20.mps
--------------------------------------------------
Running HiGHS 1.9.0 (git hash: n/a): Copyright (c) 2025 HiGHS under MIT licence terms
--------------------------------------------------
running presolve
--------------------------------------------------
Presolving model
1142 rows, 6249 cols, 426542 nonzeros  0s
1142 rows, 6249 cols, 426542 nonzeros  0s
Presolve status: Not reduced
Running HiGHS 1.9.0 (git hash: n/a): Copyright (c) 2025 HiGHS under MIT licence terms
Minimize
No obj offset
--------------------------------------------------
running scaling
- use Ruiz scaling
- use PC scaling
--------------------------------------------------
--------------------------------------------------
enter main solve loop
--------------------------------------------------

  ____ _   _ ____  ____  _     ____
 / ___| | | |  _ \|  _ \| |   |  _ \
| |   | | | | |_) | | | | |   | |_) |
| |___| |_| |  __/| |_| | |___|  __/
 \____|\___/|_|   |____/|_____|_|

Cuda runtime 12060
Cuda driver 12080
cuSparse 12504
Cuda device 0: NVIDIA GeForce RTX 4060 Ti


--------------------------------------------------
CUPDHG Parameters:
--------------------------------------------------

    nIterLim:          500000
    dTimeLim (sec):    3600.00
    ifScaling:         1
    ifRuizScaling:     1
    ifL2Scaling:       0
    ifPcScaling:       1
    eLineSearchMethod: 2
    dPrimalTol:        1.0000e-05
    dDualTol:          1.0000e-04
    dGapTol:           1.0000e-04
    dFeasTol:          1.0000e-08
    eRestartMethod:    1

--------------------------------------------------

     Iter       Primal.Obj         Dual.Obj        Gap  Primal.Inf  Dual.Inf    Time
        0  +0.00000000e+00  +0.00000000e+00  +0.00e+00    1.59e+02  0.00e+00   0.01s [L]
        0  +0.00000000e+00  +0.00000000e+00  +0.00e+00    1.59e+02  0.00e+00   0.01s [A]
Termination check: 1.587199e+02|1.597199e-03  0.000000e+00|3.319796e-01  0.000000e+00|1.000000e-04
Termination check: 1.587199e+02|1.597199e-03  0.000000e+00|3.319796e-01  0.000000e+00|1.000000e-04
Last restart was iter 0: average
Last restart was iter 1: average
Last restart was iter 2: average
Last restart was iter 4: current
Last restart was iter 7: current
Last restart was iter 9: average
Last restart was iter 40: average
Last restart was iter 80: average
Last restart was iter 160: average
Last restart was iter 280: current
Last restart was iter 360: current
Last restart was iter 600: current
Last restart was iter 920: current
Last restart was iter 1000: current
Last restart was iter 1160: average
Last restart was iter 1240: average
Last restart was iter 1760: current
Last restart was iter 2200: current
Last restart was iter 2760: average
Last restart was iter 3120: current
     Iter       Primal.Obj         Dual.Obj        Gap  Primal.Inf  Dual.Inf    Time
     4000  +4.79820895e+03  +4.75850224e+03  +3.97e+01    1.21e-01  4.64e-03   0.77s [L]
     4000  +4.81498430e+03  +4.75654113e+03  +5.84e+01    6.00e-02  2.47e-03   0.77s [A]
Termination check: 1.207468e-01|1.597199e-03  4.636336e-03|3.319796e-01  4.154416e-03|1.000000e-04
Termination check: 6.000623e-02|1.597199e-03  2.468643e-03|3.319796e-01  6.105302e-03|1.000000e-04
Last restart was iter 3320: current
Last restart was iter 4280: current
Last restart was iter 4320: current
     Iter       Primal.Obj         Dual.Obj        Gap  Primal.Inf  Dual.Inf    Time
     7000  +4.76875256e+03  +4.76890483e+03  -1.52e-01    7.64e-03  5.89e-05   1.27s [L]
     7000  +4.76922200e+03  +4.76896552e+03  +2.56e-01    9.15e-04  1.47e-05   1.27s [A]

Solving information:        Optimal average solution.
          Primal objective: +4.76922200e+03
            Dual objective: +4.76896552e+03
   Primal infeas (abs/rel): 9.15e-04 / 5.73e-06
     Dual infeas (abs/rel): 1.47e-05 / 4.44e-09
     Duality gap (abs/rel): 2.56e-01 / 2.69e-05
      Number of iterations: 7000

Timing information:
    Total solver time 1.367000e+00 in 7000 iterations
           Solve time 1.275000e+00 in 7000 iterations
        Iters per sec 5.490196e+03
         Scaling time 1.800000e-02
        Presolve time 7.400000e-02
                   Ax 1.210000e-01 in 7214 calls
                  Aty 1.380000e-01 in 7214 calls
     ComputeResiduals 0.000000e+00 in 0 calls
       UpdateIterates 9.700000e-01 in 7000 calls

GPU Timing information:
          CudaPrepare 7.600000e-02
Alloc&CopyMatToDevice 5.000000e-03
      CopyVecToDevice 0.000000e+00
     DeviceMatVecProd 2.570000e-01
        CopyVecToHost 0.000000e+00
--------------------------------
--- saving to ./solution-sum.json
--------------------------------
  Free Device memory 0.000000e+00

D:\test\test_cudaLinear>cudalinear -fname instance21.mps -nIterLim 500000 -ifPre 1 -dPrimalTol 1e-5
num threads= 1
--------------------------------------------------
reading file...
        instance21.mps
--------------------------------------------------
Running HiGHS 1.9.0 (git hash: n/a): Copyright (c) 2025 HiGHS under MIT licence terms
--------------------------------------------------
running presolve
--------------------------------------------------
Presolving model
1556 rows, 9235 cols, 603232 nonzeros  0s
1556 rows, 9235 cols, 603232 nonzeros  0s
Presolve status: Not reduced
Running HiGHS 1.9.0 (git hash: n/a): Copyright (c) 2025 HiGHS under MIT licence terms
Minimize
No obj offset
--------------------------------------------------
running scaling
- use Ruiz scaling
- use PC scaling
--------------------------------------------------
--------------------------------------------------
enter main solve loop
--------------------------------------------------

  ____ _   _ ____  ____  _     ____
 / ___| | | |  _ \|  _ \| |   |  _ \
| |   | | | | |_) | | | | |   | |_) |
| |___| |_| |  __/| |_| | |___|  __/
 \____|\___/|_|   |____/|_____|_|

Cuda runtime 12060
Cuda driver 12080
cuSparse 12504
Cuda device 0: NVIDIA GeForce RTX 4060 Ti


--------------------------------------------------
CUPDHG Parameters:
--------------------------------------------------

    nIterLim:          500000
    dTimeLim (sec):    3600.00
    ifScaling:         1
    ifRuizScaling:     1
    ifL2Scaling:       0
    ifPcScaling:       1
    eLineSearchMethod: 2
    dPrimalTol:        1.0000e-05
    dDualTol:          1.0000e-04
    dGapTol:           1.0000e-04
    dFeasTol:          1.0000e-08
    eRestartMethod:    1

--------------------------------------------------

     Iter       Primal.Obj         Dual.Obj        Gap  Primal.Inf  Dual.Inf    Time
        0  +0.00000000e+00  +0.00000000e+00  +0.00e+00    2.36e+02  0.00e+00   0.00s [L]
        0  +0.00000000e+00  +0.00000000e+00  +0.00e+00    2.36e+02  0.00e+00   0.00s [A]
Termination check: 2.360339e+02|2.370339e-03  0.000000e+00|3.955477e-01  0.000000e+00|1.000000e-04
Termination check: 2.360339e+02|2.370339e-03  0.000000e+00|3.955477e-01  0.000000e+00|1.000000e-04
Last restart was iter 0: average
Last restart was iter 1: average
Last restart was iter 2: average
Last restart was iter 4: current
Last restart was iter 7: current
Last restart was iter 8: current
Last restart was iter 9: average
Last restart was iter 40: average
Last restart was iter 80: average
Last restart was iter 160: current
Last restart was iter 280: average
Last restart was iter 360: current
Last restart was iter 440: current
Last restart was iter 600: current
Last restart was iter 920: current
Last restart was iter 1120: current
Last restart was iter 1360: current
Last restart was iter 1520: average
Last restart was iter 1600: current
Last restart was iter 2520: current
Last restart was iter 2640: current
     Iter       Primal.Obj         Dual.Obj        Gap  Primal.Inf  Dual.Inf    Time
     4000  +2.11357849e+04  +2.11302185e+04  +5.57e+00    2.96e-02  2.64e-04   0.83s [L]
     4000  +2.11404979e+04  +2.11299275e+04  +1.06e+01    5.08e-03  2.09e-04   0.83s [A]
Termination check: 2.963712e-02|2.370339e-03  2.640344e-04|3.955477e-01  1.316966e-04|1.000000e-04
Termination check: 5.084463e-03|2.370339e-03  2.088359e-04|3.955477e-01  2.500615e-04|1.000000e-04
     Iter       Primal.Obj         Dual.Obj        Gap  Primal.Inf  Dual.Inf    Time
     4440  +2.11156665e+04  +2.11305657e+04  -1.49e+01    2.21e-02  1.84e-04   0.91s [L]
     4440  +2.11333909e+04  +2.11305333e+04  +2.86e+00    2.30e-03  1.61e-04   0.91s [A]

Solving information:        Optimal average solution.
          Primal objective: +2.11333909e+04
            Dual objective: +2.11305333e+04
   Primal infeas (abs/rel): 2.30e-03 / 9.70e-06
     Dual infeas (abs/rel): 1.61e-04 / 4.06e-08
     Duality gap (abs/rel): 2.86e+00 / 6.76e-05
      Number of iterations: 4440

Timing information:
    Total solver time 1.056000e+00 in 4440 iterations
           Solve time 9.140000e-01 in 4440 iterations
        Iters per sec 4.857768e+03
         Scaling time 2.600000e-02
        Presolve time 1.160000e-01
                   Ax 7.600000e-02 in 4668 calls
                  Aty 7.900000e-02 in 4668 calls
     ComputeResiduals 0.000000e+00 in 0 calls
       UpdateIterates 6.920000e-01 in 4440 calls

GPU Timing information:
          CudaPrepare 8.600000e-02
Alloc&CopyMatToDevice 8.000000e-03
      CopyVecToDevice 0.000000e+00
     DeviceMatVecProd 1.550000e-01
        CopyVecToHost 0.000000e+00
--------------------------------
--- saving to ./solution-sum.json
--------------------------------
  Free Device memory 0.000000e+00

D:\test\test_cudaLinear>

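A likely cause is the CPU-GPU transfer path. I have no experience with GPU programming, so I cannot be sure, but I believe it comes from how the cuPDLP core is written. For medium-sized and smaller problems, I suspect that frequent communication with the CPU is the bottleneck in the current implementation. (I also looked at the Google version (CPU version), but I think cuPDLP (the COPT version) has the edge.)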
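
As a rough check, the timing blocks printed in the logs above can be tabulated. The sketch below copies the Solve time, iteration count, DeviceMatVecProd, CopyVecToDevice and CopyVecToHost values from the four RTX 4060 Ti runs; the dictionary keys and the `summarize` helper are my own labels, not cuPDLP names.

```python
# Values (seconds) copied from the "Timing information" and
# "GPU Timing information" blocks of the four runs in the log.
runs = {
    "n080w8_2_0-4-0-9-1-9-6-2": {
        "solve": 8.916, "iters": 52400, "matvec": 2.190,
        "to_device": 0.0, "to_host": 0.0},
    "instance19": {
        "solve": 5.172, "iters": 30240, "matvec": 1.261,
        "to_device": 0.0, "to_host": 0.0},
    "instance20": {
        "solve": 1.275, "iters": 7000, "matvec": 0.257,
        "to_device": 0.0, "to_host": 0.0},
    "instance21": {
        "solve": 0.914, "iters": 4440, "matvec": 0.155,
        "to_device": 0.0, "to_host": 0.0},
}


def summarize(run):
    """Per-iteration time and the share of solve time in each timer."""
    return {
        "ms_per_iter": 1000.0 * run["solve"] / run["iters"],
        "matvec_share": run["matvec"] / run["solve"],
        "copy_share": (run["to_device"] + run["to_host"]) / run["solve"],
    }


summary = {name: summarize(run) for name, run in runs.items()}

for name, s in summary.items():
    print(f"{name}: {s['ms_per_iter']:.3f} ms/iter, "
          f"matvec {100 * s['matvec_share']:.0f}%, "
          f"vector copies {100 * s['copy_share']:.0f}%")
```

In these logs the device matrix-vector products account for only about a fifth to a quarter of the solve time, and the explicit vector-copy timers read zero. If CPU-GPU interaction is the bottleneck, it would therefore have to sit in something those timers do not cover, such as per-iteration synchronization or small scalar readbacks. That is a guess on my part; the log alone does not show it.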

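For very large problems, GPU computation probably dominates the run time, so CPU-GPU transfer time would not be a bottleneck there. Very large problems are the current focus of PDLP work, and NVIDIA is promoting it as well.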
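
One way to illustrate this argument is a two-term cost model: time per iteration = fixed overhead + (cost per nonzero) × (number of nonzeros). The sketch below fits that line by ordinary least squares to the four runs logged above (nonzeros after presolve, Solve time, iterations). The model itself, the helper names, and the 10^8-nonzero size are assumptions for illustration. With only four points and an extrapolation far outside the measured range, the output is an indication, not a measurement.

```python
# (nonzeros after presolve, solve time in seconds, iterations),
# copied from the four RTX 4060 Ti runs in the log.
samples = [
    (215563, 8.916, 52400),  # n080w8_2_0-4-0-9-1-9-6-2
    (254040, 5.172, 30240),  # instance19
    (426542, 1.275, 7000),   # instance20
    (603232, 0.914, 4440),   # instance21
]


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def overhead_share(nnz, fixed, per_nnz):
    """Fraction of one iteration spent in the fixed term of the model."""
    return fixed / (fixed + per_nnz * nnz)


nnz_values = [nnz for nnz, _, _ in samples]
sec_per_iter = [solve / iters for _, solve, iters in samples]
fixed, per_nnz = fit_line(nnz_values, sec_per_iter)

print(f"fixed overhead per iteration: {fixed * 1e3:.3f} ms")
print(f"cost per nonzero: {per_nnz:.3e} s")
# 10**8 nonzeros is a hypothetical "very large" size, not a measured run.
for nnz in (215563, 603232, 10**8):
    share = overhead_share(nnz, fixed, per_nnz)
    print(f"nnz={nnz}: fixed overhead share {100 * share:.0f}%")
```

Under this model the fixed term dominates at the sizes tested here and becomes a small fraction once the nonzero count is a few orders of magnitude larger, which is consistent with the expectation stated above.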

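Accelerating large-scale linear programming problems with NVIDIA cuOpt (NVIDIA cuOpt で大規模な線形計画問題を加速する) - NVIDIA Technical Blog

Incidentally, for the delivery optimization problem, this article goes into detail:

Solving the Vehicle Routing Problem (delivery optimization problem) with PuLP #Python - Qiita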


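However, our main interest is medium-to-large problems, chiefly as a replacement for commercial ISM solvers. This is a solver for ordinary users who would like to use a commercial ISM solver but cannot.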

The conclusion, then, is that the only way to get the full power of the GPU is to rewrite the cuPDLP core.

The other issue is warm start support. Simplex benefits from warm starts, and the same should be possible for first-order methods. The current cuPDLP does not support it, so an implementation needs to be considered.
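
To illustrate why a warm start is natural for a first-order method, here is a minimal sketch of plain PDHG (the iteration PDLP builds on) for a tiny LP of the form min c^T x subject to A x = b, x >= 0. It omits everything that makes cuPDLP practical (scaling, restarts, adaptive step sizes), and the function name, the fixed step sizes, the stopping rule and the toy problem are all assumptions for illustration. A warm start here just means passing the previous primal and dual iterates as the starting point.

```python
def pdhg(A, b, c, x0=None, y0=None, tau=0.5, sigma=0.5,
         tol=1e-8, max_iter=100000):
    """Plain PDHG for  min c^T x  s.t.  A x = b, x >= 0  (dense lists).

    Step sizes must satisfy tau * sigma * ||A||^2 < 1.
    Returns (x, y, iterations used).
    """
    m, n = len(A), len(c)
    x = list(x0) if x0 is not None else [0.0] * n
    y = list(y0) if y0 is not None else [0.0] * m
    for k in range(1, max_iter + 1):
        # Primal step: move along -(c - A^T y), then project onto x >= 0.
        aty = [sum(A[i][j] * y[i] for i in range(m)) for j in range(n)]
        x_new = [max(0.0, x[j] - tau * (c[j] - aty[j])) for j in range(n)]
        # Dual step uses the extrapolated point 2 * x_new - x.
        x_bar = [2.0 * x_new[j] - x[j] for j in range(n)]
        ax = [sum(A[i][j] * x_bar[j] for j in range(n)) for i in range(m)]
        y_new = [y[i] + sigma * (b[i] - ax[i]) for i in range(m)]
        # Stop when neither the primal nor the dual iterate moves any more.
        change = max(
            max(abs(p - q) for p, q in zip(x_new, x)),
            max(abs(p - q) for p, q in zip(y_new, y)),
        )
        x, y = x_new, y_new
        if change < tol:
            return x, y, k
    return x, y, max_iter


# Toy LP: min x1 + 2*x2  s.t.  x1 + x2 = b, x >= 0  (optimum x = (b, 0)).
A = [[1.0, 1.0]]
c = [1.0, 2.0]

# Solve the original problem, then a slightly perturbed right-hand side.
x_prev, y_prev, _ = pdhg(A, [1.0], c)
x_cold, _, cold_iters = pdhg(A, [1.01], c)
x_warm, _, warm_iters = pdhg(A, [1.01], c, x0=x_prev, y0=y_prev)

print("cold start iterations:", cold_iters)
print("warm start iterations:", warm_iters)
```

On this toy problem the warm-started run needs fewer iterations because it begins close to the new optimum. How much that carries over to real scheduling instances, where restarts and step-size adaptation interact with the starting point, is not something this sketch can show.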

Both of these points need to be implemented.

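I also tried integrating HiGHS ISM, but it was about twice as slow as the current Schedule Nurse, and I judged that even if it is multithreaded in the future, it will not exceed the improvement I am hoping for. First-order methods, on the other hand, cannot be expected to deliver high accuracy, but warm starting is attractive, there is a chance of exploiting GPU scalability, and the approach has a future.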

From the above, I concluded that solving the unresolved instances (two INRC2 8-week instances and two Scheduling Benchmarks instances) requires reimplementing cuPDLP.

I would like the HiGHS team to reimplement it to meet these requirements, but I cannot wait, and I do not think COPT will take the lead on this either, so I decided to do it myself.
