IMAX2/3/4 Applications

crypto/sha256, fft/fft, filter/filter (general filters, super-resolution, frame interpolation, depth-image generation, etc.), llama/llama (llama-v2), mm_cnn_lf/cnn, mm_cnn_lf/cnn3d, mm_cnn_lf/gather (discrete stencil: light-field rendering), mm_cnn_lf/gdepth (discrete stencil: light-field depth image), mm_cnn_lf/inv (matrix inversion), mm_cnn_lf/mm (dense matrix multiplication), rsim/rsim (normal MNIST/CIFAR10/CNN), sort/sort (pipelined sort), spgemm/test022 (SpGEMM), spgemm/test024 (sparse matrix compression), ssim/ssim (stochastic MNIST/CIFAR10/CNN), stencil/stencil (various stencil computations of degree 1, 2, 3), stringsearch/search (string search), tsim/tsim (multithread MNIST/CIFAR10/CNN), vsim/vsim (GGML), vbgmm, graph-cnn, graph-attention, U-net

IMAX2/3/4 Docs/Tutorials

Download IMAX2/3/4

Introduction to IMAX3: Amazing Dataflow-Centric Gen4-CGLA(non-CGRA) (CGLA:Coarse Grained Linear Array)

Introductory slides with synthesizable notes

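0.Understanding Computers Unconventionally(0.Trailer) / 0.IMAX3 begins(0.Trailer)
1.Understanding Computers Unconventionally(1.Where should collected data be kept?) / 1.IMAX3 begins(1.Where is the best location to save data?)
2.Understanding Computers Unconventionally(2.Is there a right way to lay out data?) / 2.IMAX3 begins(2.Is there a manner to put data?)
3.Understanding Computers Unconventionally(3.What does calculation mean?) / 3.IMAX3 begins(3.What do you mean by calculation?)
4.Understanding Computers Unconventionally(4.Is it better to push in, or to wait?) / 4.IMAX3 begins(4.Should I push? Should I wait?)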

Expert-level slides with synthesizable notes

0.Let's start Gen3-CGLA(non-CGRA)
2.Image filters basic
3.Image filters advanced
4.Image filters professional
5.Machine Learning
6.High-degree stencil computation
7.Inverse matrix
8.Sparse matrix and Sorting
9.Hash, FFT and String search
10.High-speed compiler
11.Three level sophisticated loop
12.Scalability volume / 12.Scalability
13.HW/SW co-design volume / 13.HW/SW codesign
0-13.Short digest(#1-#13) / 0-13.Short summary(#1-#13)
0-13.Long digest(#1-#13) / 0-13.Long summary(#1-#13)
14.Differences from CPU/Vector volume / 14.Difference from CPU/Vector
15.How the software-controlled cache works / 15.Software-controlled cache memory
20.CGLA Amidakuji (ladder lottery) / 20.Decision Tree

Petalinux 2024.1 IMAX2 Kit for basic CGLA

ZU19EG (16 units) ... Vivado project is included.

  1. linux# zcat ZU19EG-step4000-20241111.img.gz | dd bs=64k of=/dev/mmcblk0 (16GB SDcard)
  2. linux# mount /dev/mmcblk0p2 /mnt
  3. linux# replace root-password in /mnt/etc/shadow
  4. linux# umount /mnt
  5. zu19eg# insert SDcard
  6. zu19eg# boot from SDcard (dhcp)
  7. linux% ssh -Y (Xwindow)
  8. zu19eg% zcat proj-arm64.tgz|tar xpf -
  9. zu19eg% cd proj-arm64/sample/mm_cnn_lf
  10. zu19eg% make -f Makefile-zynq.emax6+dma mm-zynq.emax6+dma-16st (how to make)
  11. zu19eg% sudo proj-arm64/sample/mm_cnn_lf/mm-zynq.emax6+dma-16st (matrix-mult)
  12. passwd: temppwd (sudo password)
  13. localhost:11.0: Cannot open display (if this error appears, copy the X authority file to root as in steps 14-15)
  14. zu19eg% cp ~/.Xauthority /tmp/111
  15. zu19eg% sudo cp /tmp/111 /root/.Xauthority
  16. zu19eg% sudo proj-arm64/sample/mm_cnn_lf/mm-zynq.emax6+dma-16st (retry)
  17. <<<ORIG>>>
  18. usec: ARM:2098589 DRAIN:0 CONF:0 REGV:0 RANGE:0 LOAD:0 EXEC:0 total:2098589 (usec)
  19. <<<IMAX>>>
  20. usec: ARM:426 DRAIN:1224 CONF:105 REGV:1041 RANGE:663 LOAD:14861 EXEC:24324 total:42647 (usec)
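
The two `usec:` lines above can be compared directly. The following is a minimal sketch in Python, not part of the IMAX kit: the helper names `parse_usec_line` and `speedup` are hypothetical, and the reading of `<<<ORIG>>>` as the ARM-only run and `<<<IMAX>>>` as the accelerated run is an assumption based only on the labels and the zero-valued fields. It parses each line into its named fields and divides the two totals.

```python
import re

# Sample lines copied from the transcript above.
ORIG_LINE = "usec: ARM:2098589 DRAIN:0 CONF:0 REGV:0 RANGE:0 LOAD:0 EXEC:0 total:2098589 (usec)"
IMAX_LINE = "usec: ARM:426 DRAIN:1224 CONF:105 REGV:1041 RANGE:663 LOAD:14861 EXEC:24324 total:42647 (usec)"


def parse_usec_line(line):
    """Return {field: microseconds} for every NAME:NUMBER pair on the line."""
    return {name: int(value) for name, value in re.findall(r"(\w+):(\d+)", line)}


def speedup(orig, imax):
    """Ratio of the ORIG total to the IMAX total (assumed ARM-only vs. accelerated)."""
    return orig["total"] / imax["total"]


if __name__ == "__main__":
    orig = parse_usec_line(ORIG_LINE)
    imax = parse_usec_line(IMAX_LINE)
    print(f"speedup: {speedup(orig, imax):.1f}x")
    # Share of each phase in the IMAX run, largest first.
    phases = {k: v for k, v in imax.items() if k != "total"}
    for name, usec in sorted(phases.items(), key=lambda kv: -kv[1]):
        print(f"{name:>6}: {100.0 * usec / imax['total']:5.1f}%")
```

For the sample output above the ratio is roughly 49x, and EXEC and LOAD together account for most of the IMAX-side time.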

ZCU102+VU440 (64/128/192/256/512 units /single lane) ... Vivado project is included.

  1. vu440# connect with zcu102 (see figure)
  2. vu440# write VU440-step4000-20221020-V24.1-78.125+78.125+48+260+130+48-CRYPTO-SPU.bin to SDcard
  3. vu440# insert SDcard
  4. linux# zcat ZCU102-step4000-20201010.img.gz | dd bs=64k of=/dev/mmcblk0 (16GB SDcard)
  5. linux# mount /dev/mmcblk0p2 /mnt
  6. linux# replace root-password in /mnt/etc/shadow
  7. linux# umount /mnt
  8. zcu102# insert SDcard
  9. zcu102# boot from SDcard (dhcp)
  10. linux% ssh -Y (Xwindow)
  11. zcu102% zcat proj-arm64.tgz|tar xpf -
  12. zcu102% cd proj-arm64/sample/mm_cnn_lf
  13. zcu102% make -f Makefile-zynq.emax6+dma mm-zynq.emax6+dma (how to make)
  14. zcu102% sudo proj-arm64/sample/mm_cnn_lf/mm-zynq.emax6+dma (matrix-mult)
  15. passwd: temppwd

Petalinux 2024.1 IMAX3 Kit for professional CGLA

VMK180 (32 units) ... Vivado project is included.

  1. linux# zcat alice139-step4000.img.gz | dd bs=64k of=/dev/mmcblk0 (32GB SDcard)
  2. linux# mount /dev/mmcblk0p2 /mnt
  3. linux# replace root-password in /mnt/etc/shadow
  4. linux# umount /mnt
  5. vmk180# insert SDcard
  6. vmk180# boot from SDcard (dhcp)
  7. linux% ssh -Y (Xwindow)
  8. vmk180% zcat proj-arm64.tgz|tar xpf -
  9. vmk180% cd proj-arm64/sample/mm_cnn_lf
  10. vmk180% make -f Makefile-acap.emax7+dma mm-acap.emax7+dma-32st (how to make)
  11. vmk180% sudo proj-arm64/sample/mm_cnn_lf/mm-acap.emax7+dma-32st (matrix-mult)
  12. passwd: temppwd

VMK180 (32 units x2 lanes) ... Vivado project is included.

  1. linux# zcat alice135-step4200-master.img.gz | dd bs=64k of=/dev/mmcblk0 (32GB SDcard for the master board)
  2. linux# zcat alice137-step4200-slave-img.gz | dd bs=64k of=/dev/mmcblk0 (32GB SDcard for the slave board)
  3. linux# mount /dev/mmcblk0p2 /mnt
  4. linux# replace root-password in /mnt/etc/shadow
  5. linux# umount /mnt
  6. vmk180# connect two boards with a QSFP28-AOC cable
  7. vmk180# insert SDcard
  8. vmk180# boot from SDcard (dhcp)
  9. linux% ssh -Y (Xwindow)
  10. vmk180% zcat proj-arm64.tgz|tar xpf -
  11. vmk180% cd proj-arm64/sample/mm_cnn_lf
  12. vmk180% make -f Makefile-acap.emax7+dma mm-acap.emax7+dma-32st (how to make)
  13. vmk180% sudo proj-arm64/sample/mm_cnn_lf/mm-acap.emax7+dma-32st (matrix-mult)
  14. vmk180% sudo proj-arm64/sample/test/test025-acap.emax7+dma-32st (dual matrix-mult)
  15. vmk180% cd proj-arm64/sample/tsim (MNIST/CIFAR10)
  16. vmk180% sudo ./tsim-acap.emax7+dma-32st -x -i -r -I0 -C1 -F1 (MNIST conv1+fc inference)
  17. vmk180% sudo ./tsim-acap.emax7+dma-32st -x -t -I0 -C1 -F1 (MNIST conv1+fc training)
  18. vmk180% sudo ./tsim-acap.emax7+dma-32st -x -i -r -I0 -C3 -F1 (MNIST conv3+fc inference)
  19. vmk180% sudo ./tsim-acap.emax7+dma-32st -x -t -I0 -C3 -F1 (MNIST conv3+fc training)
  20. vmk180% sudo ./tsim-acap.emax7+dma-32st -x -i -r -I1 -C6 -F2 (CIFAR10 conv6+fc2 inference)
  21. vmk180% sudo ./tsim-acap.emax7+dma-32st -x -t -I1 -C6 -F2 (CIFAR10 conv6+fc2 training)
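
The six tsim runs above follow a regular pattern. As a rough sketch (Python; `tsim_commands` and `CONFIGS` are hypothetical names, and the flag meanings are inferred only from the parenthetical annotations in the list, not from tsim documentation), the command lines can be generated from a small table: `-I` appears to select the dataset, `-C` and `-F` the number of conv and fc layers, and `-i -r` versus `-t` inference versus training.

```python
# (dataset label, -I value, -C value, -F value), in the order used in the list above.
CONFIGS = [
    ("MNIST", 0, 1, 1),
    ("MNIST", 0, 3, 1),
    ("CIFAR10", 1, 6, 2),
]

# Mode flags as they appear in the transcript; their meaning is an assumption.
MODES = [
    ("inference", ["-i", "-r"]),
    ("training", ["-t"]),
]


def tsim_commands(binary="./tsim-acap.emax7+dma-32st"):
    """Return (description, command line) pairs for every config and mode."""
    commands = []
    for label, dataset_id, conv, fc in CONFIGS:
        for mode, mode_flags in MODES:
            argv = ["sudo", binary, "-x", *mode_flags,
                    f"-I{dataset_id}", f"-C{conv}", f"-F{fc}"]
            commands.append((f"{label} conv{conv}+fc{fc} {mode}", " ".join(argv)))
    return commands


if __name__ == "__main__":
    for description, command in tsim_commands():
        print(f"{command}  ({description})")
```

Passing a different `binary` (for example `./tsim-acap.emax7+dma`) yields the same set of runs for the other kits.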

VPK180 (64 units x2 lanes)

  1. linux# zcat alice120-step4800-master.img.gz | dd bs=64k of=/dev/mmcblk0 (32GB SDcard)
  2. linux# mount /dev/mmcblk0p2 /mnt
  3. linux# replace root-password in /mnt/etc/shadow
  4. linux# umount /mnt
  5. vpk180# insert SDcard
  6. vpk180# boot from SDcard (dhcp)
  7. linux% ssh -Y (Xwindow)
  8. vpk180% zcat proj-arm64.tgz|tar xpf -
  9. vpk180% cd proj-arm64/sample/mm_cnn_lf
  10. vpk180% make -f Makefile-acap.emax7+dma mm-acap.emax7+dma (how to make)
  11. vpk180% sudo proj-arm64/sample/mm_cnn_lf/mm-acap.emax7+dma (matrix-mult)
  12. vpk180% cd proj-arm64/sample/tsim (MNIST/CIFAR10)
  13. vpk180% sudo ./tsim-acap.emax7+dma -x -i -r -I0 -C1 -F1 (MNIST conv*1+fc inference)
  14. vpk180% sudo ./tsim-acap.emax7+dma -x -t -I0 -C1 -F1 (MNIST conv*1+fc training)
  15. vpk180% sudo ./tsim-acap.emax7+dma -x -i -r -I0 -C3 -F1 (MNIST conv*3+fc inference)
  16. vpk180% sudo ./tsim-acap.emax7+dma -x -t -I0 -C3 -F1 (MNIST conv*3+fc training)
  17. vpk180% sudo ./tsim-acap.emax7+dma -x -i -r -I1 -C6 -F2 (CIFAR10 conv6+fc2 inference)
  18. vpk180% sudo ./tsim-acap.emax7+dma -x -t -I1 -C6 -F2 (CIFAR10 conv6+fc2 training)
  19. vpk180% sudo ./vsim-acap.emax7+dma gptneox -m /home/nakashim/.cformers/models/OpenAssistant/oasst-sft-1-pythia-12b/int4_fixed_zero --prompt "50278 12092 2 0 50281" --seed 42 --threads 1 --n_predict 100 --top_k 20 --top_p 0.95 --temp 0.85 --repeat_last_n 64 --repeat_penalty 1.3 (GGML)
  20. vpk180% sudo ./llama-cli-acap.emax7+dma -t 4 -s 1 -fa -m ~/.llama/model/rinna-youri-7b-instruction-gguf/rinna-youri-7b-instruction-q2_K.gguf -p "Prime numbers smaller than ten" -n 32 (LLAMA-v2)

VPK180 (64 units x8 lanes)

  1. linux# zcat alice120-step4800-master.img.gz | dd bs=64k of=/dev/mmcblk0 (32GB SDcard for the master board)
  2. linux# zcat alice122-step4800-slave1.img.gz | dd bs=64k of=/dev/mmcblk0 (32GB SDcard for slave board 1)
  3. linux# zcat alice124-step4800-slave2.img.gz | dd bs=64k of=/dev/mmcblk0 (32GB SDcard for slave board 2)
  4. linux# zcat alice126-step4800-slave3.img.gz | dd bs=64k of=/dev/mmcblk0 (32GB SDcard for slave board 3)
  5. linux# mount /dev/mmcblk0p2 /mnt
  6. linux# replace root-password in /mnt/etc/shadow
  7. linux# umount /mnt
  8. vpk180# connect four boards with QSFPDD-DAC cables
  9. vpk180# insert SDcard
  10. vpk180# boot from SDcard (dhcp)
  11. linux% ssh -Y (Xwindow)
  12. vpk180% zcat proj-arm64.tgz|tar xpf -
  13. vpk180% cd proj-arm64/sample/mm_cnn_lf
  14. vpk180% make -f Makefile-acap.emax7+dma mm-acap.emax7+dma (how to make)
  15. vpk180% sudo proj-arm64/sample/mm_cnn_lf/mm-acap.emax7+dma (matrix-mult)
  16. vpk180% sudo proj-arm64/sample/test/test025-acap.emax7+dma (dual matrix-mult)
  17. vpk180% cd proj-arm64/sample/tsim (MNIST/CIFAR10)
  18. vpk180% sudo ./tsim-acap.emax7+dma -x -i -r -I0 -C1 -F1 (MNIST conv*1+fc inference)
  19. vpk180% sudo ./tsim-acap.emax7+dma -x -t -I0 -C1 -F1 (MNIST conv*1+fc training)
  20. vpk180% sudo ./tsim-acap.emax7+dma -x -i -r -I0 -C3 -F1 (MNIST conv*3+fc inference)
  21. vpk180% sudo ./tsim-acap.emax7+dma -x -t -I0 -C3 -F1 (MNIST conv*3+fc training)
  22. vpk180% sudo ./tsim-acap.emax7+dma -x -i -r -I1 -C6 -F2 (CIFAR10 conv6+fc2 inference)
  23. vpk180% sudo ./tsim-acap.emax7+dma -x -t -I1 -C6 -F2 (CIFAR10 conv6+fc2 training)
  24. vpk180% sudo ./tsim-acap.emax7+dma -x -i -r -I1 -C6 -F2 -M16 (CIFAR10 multi-lane)
  25. vpk180% sudo ./vsim-acap.emax7+dma gptneox -m /home/nakashim/.cformers/models/OpenAssistant/oasst-sft-1-pythia-12b/int4_fixed_zero --prompt "50278 12092 2 0 50281" --seed 42 --threads 2 --n_predict 100 --top_k 20 --top_p 0.95 --temp 0.85 --repeat_last_n 64 --repeat_penalty 1.3 (GGML)
  26. vpk180% sudo ./llama-cli-acap.emax7+dma -t 4 -s 8 -fa -m ~/.llama/model/rinna-youri-7b-instruction-gguf/rinna-youri-7b-instruction-q2_K.gguf -p "Prime numbers smaller than ten" -n 32 (LLAMA-v2)

Petalinux 2024.1 IMAX4 Kit for Intel servers

PCI-e(VPK120)+VPM180 (64 units x8/x16 lanes) ... Vivado project is included.

  • IMAX4: 170MHz, 512 units, 20480 operations / 4 cycles, 512KB cache per unit
  • Each unit has: 32-load/8-store, quad-sparse-load, 3-cascaded octa-int/media, octa-single-float FMA, 32-stochastic FMA, dual addr-synchronizer
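
The headline figures above imply a peak rate that is easy to work out. The sketch below (Python; all names are hypothetical) assumes "20480 operations / 4 cycles" means 20480 operations complete every 4 clock cycles across all 512 units, and that every unit is busy on every cycle, so the result is an upper bound and not a measured number.

```python
CLOCK_HZ = 170_000_000     # 170MHz
UNITS = 512                # e.g. 64 units x 8 lanes
OPS_PER_WINDOW = 20480     # operations per window (as counted in the spec line)
WINDOW_CYCLES = 4          # window length in clock cycles
CACHE_KB_PER_UNIT = 512    # 512KB cache per unit


def peak_ops_per_second():
    """Peak operations per second, assuming every window is fully used."""
    return OPS_PER_WINDOW * CLOCK_HZ // WINDOW_CYCLES


def ops_per_unit_per_window():
    """Operations attributed to a single unit in one 4-cycle window."""
    return OPS_PER_WINDOW // UNITS


def total_cache_mb():
    """Aggregate cache over all units, in MB (1MB = 1024KB)."""
    return UNITS * CACHE_KB_PER_UNIT // 1024


if __name__ == "__main__":
    print(f"peak: {peak_ops_per_second() / 1e9:.1f} Gops/s")
    print(f"per unit: {ops_per_unit_per_window()} operations / {WINDOW_CYCLES} cycles")
    print(f"total cache: {total_cache_mb()} MB")
```

Under these assumptions the peak is 870.4 Gops/s, with 40 operations per unit per 4-cycle window and 256MB of cache in total.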