IMAX2/3/4 Applications
crypto/sha256,
fft/fft,
filter/filter (general filters, super-resolution, frame interpolation, depth image generation, etc.),
llama/llama (llama-v2),
mm_cnn_lf/cnn,
mm_cnn_lf/cnn3d,
mm_cnn_lf/gather (discrete stencil: Lightfield rendering),
mm_cnn_lf/gdepth (discrete stencil: Lightfield depth image),
mm_cnn_lf/inv (matrix inversion),
mm_cnn_lf/mm (dense matrix multiplication),
rsim/rsim (normal MNIST/CIFAR10/CNN),
sort/sort (pipelined sort),
spgemm/test022 (SpGEMM),
spgemm/test024 (sparse matrix compression),
ssim/ssim (stochastic MNIST/CIFAR10/CNN),
stencil/stencil (various stencil computations with degree=1,2,3),
stringsearch/search (string search),
tsim/tsim (multithread MNIST/CIFAR10/CNN),
vsim/vsim (GGML),
vbgmm,
graph-cnn,
graph-attention,
U-net,
Vector-DB
IMAX2/3/4 Docs/Tutorials
Download IMAX2/3/4
Introduction to IMAX3: Amazing Dataflow-Centric Gen4-CGLA (non-CGRA) (CGLA: Coarse Grained Linear Array)
Petalinux 2024.1 IMAX2 Kit for basic CGLA
ZCU102+VU440 (64/128/192/256/512 units / single lane) ... Vivado project is included.
- vu440# connect with zcu102 (see figure)
- vu440# write VU440-step4000-20250423-V24.1-78.125+78.125+48+260+130+48-SPU.bin to SDcard
- vu440# insert SDcard
- linux# zcat arch28-step4000-master.img.gz | dd bs=64k of=/dev/mmcblk0 (16GB SDcard)
- linux# mount /dev/mmcblk0p2 /mnt
- linux# replace root-password in /mnt/etc/shadow
- linux# umount /mnt
- zcu102# insert SDcard
- zcu102# boot from SDcard (dhcp)
- linux% ssh -Y debian@163.221.xxx.yyy (Xwindow)
- zcu102% zcat proj-arm64.tgz|tar xpf -
- zcu102% cd proj-arm64/sample/mm_cnn_lf
- zcu102% make -f Makefile-zynq.emax6+dma mm-zynq.emax6+dma (how to make)
- zcu102% sudo proj-arm64/sample/mm_cnn_lf/mm-zynq.emax6+dma (matrix-mult)
- passwd: temppwd
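The "replace root-password in /mnt/etc/shadow" step above does not say how the entry is edited. One possible way is sketched below, assuming the new hash has already been generated elsewhere (for example with `openssl passwd -6`). The helper name `set_root_hash` is hypothetical and is not part of the IMAX kit; editing the file by hand in a text editor works just as well.

```shell
#!/bin/sh
# Hypothetical helper (not part of the IMAX kit): replace the password hash,
# i.e. the 2nd colon-separated field, of the root entry in a shadow-format file.
set_root_hash() {
    shadow_file=$1   # e.g. /mnt/etc/shadow after mounting /dev/mmcblk0p2 on /mnt
    new_hash=$2      # e.g. the string printed by: openssl passwd -6
    tmp=$(mktemp) || return 1
    # Rewrite only the root line; all other lines are printed unchanged.
    awk -F: -v OFS=: -v h="$new_hash" \
        '$1 == "root" { $2 = h } { print }' "$shadow_file" > "$tmp" \
        || { rm -f "$tmp"; return 1; }
    # Write back in place so the file keeps its owner and permissions.
    cat "$tmp" > "$shadow_file"
    rm -f "$tmp"
}
```

Assumed usage, run as root between the mount and umount steps: `set_root_hash /mnt/etc/shadow "$(openssl passwd -6)"`.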
Petalinux 2024.1 IMAX3 Kit for professional CGLA
VPK180 (64 units x8 lanes) ... Vivado project is included.
- linux# zcat alice110-step4800-master.img.gz | dd bs=64k of=/dev/mmcblk0 (32GB SDcard)
- linux# zcat alice112-step4800-slave1.img.gz | dd bs=64k of=/dev/mmcblk0 (32GB SDcard)
- linux# zcat alice114-step4800-slave2.img.gz | dd bs=64k of=/dev/mmcblk0 (32GB SDcard)
- linux# zcat alice116-step4800-slave3.img.gz | dd bs=64k of=/dev/mmcblk0 (32GB SDcard)
- linux# mount /dev/mmcblk0p2 /mnt
- linux# replace root-password in /mnt/etc/shadow
- linux# umount /mnt
- vpk180# connect four boards w/ QSFPDD-DAC cable
- vpk180# insert SDcard
- vpk180# boot from SDcard (dhcp)
- linux% ssh -Y debian@163.221.xxx.yyy (Xwindow)
- vpk180% zcat proj-arm64.tgz|tar xpf -
- vpk180% cd proj-arm64/sample/mm_cnn_lf
- vpk180% make -f Makefile-acap.emax7+dma mm-acap.emax7+dma (how to make)
- vpk180% sudo proj-arm64/sample/mm_cnn_lf/mm-acap.emax7+dma (matrix-mult)
- vpk180% sudo proj-arm64/sample/test/test025-acap.emax7+dma (dual matrix-mult)
- vpk180% cd proj-arm64/sample/tsim (MNIST/CIFAR10)
- vpk180% sudo ./tsim-acap.emax7+dma -x -i -r -I0 -C1 -F1 (MNIST conv*1+fc inference)
- vpk180% sudo ./tsim-acap.emax7+dma -x -t -I0 -C1 -F1 (MNIST conv*1+fc training)
- vpk180% sudo ./tsim-acap.emax7+dma -x -i -r -I0 -C3 -F1 (MNIST conv*3+fc inference)
- vpk180% sudo ./tsim-acap.emax7+dma -x -t -I0 -C3 -F1 (MNIST conv*3+fc training)
- vpk180% sudo ./tsim-acap.emax7+dma -x -i -r -I1 -C6 -F2 (CIFAR10 conv6+fc2 inference)
- vpk180% sudo ./tsim-acap.emax7+dma -x -t -I1 -C6 -F2 (CIFAR10 conv6+fc2 training)
- vpk180% sudo ./tsim-acap.emax7+dma -x -i -r -I1 -C6 -F2 -M16 (CIFAR10 multi-lane)
- vpk180% sudo ./vsim-acap.emax7+dma gptneox -m /home/nakashim/.cformers/models/OpenAssistant/oasst-sft-1-pythia-12b/int4_fixed_zero --prompt "50278 12092 2 0 50281" --seed 42 --threads 2 --n_predict 100 --top_k 20 --top_p 0.95 --temp 0.85 --repeat_last_n 64 --repeat_penalty 1.3 (GGML)
- vpk180% sudo ./llama-cli-acap.emax7+dma -t 4 -s 8 -fa -m ~/.llama/model/rinna-youri-7b-instruction-gguf/rinna-youri-7b-instruction-q2_K.gguf -p "Prime numbers smaller than ten" -n 32 (LLAMA-v2)
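The tsim invocations above follow a regular pattern, which the dry-run sketch below makes explicit. The flag meanings are inferred only from the annotations in the list (`-I0` MNIST, `-I1` CIFAR10, `-C` number of conv layers, `-F` number of fc layers, `-i -r` inference, `-t` training), not from tsim's own documentation, and the function name `tsim_cmd` is hypothetical. It only prints a command line and does not run anything.

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints a tsim command line without executing it.
# Flag meanings are assumptions inferred from the annotations in the list above.
tsim_cmd() {
    dataset=$1   # mnist | cifar10
    mode=$2      # inference | training
    conv=$3      # number of conv layers (-C)
    fc=$4        # number of fc layers (-F)
    case $dataset in
        mnist)   input=0 ;;
        cifar10) input=1 ;;
        *) echo "unknown dataset: $dataset" >&2; return 1 ;;
    esac
    case $mode in
        inference) mode_flags="-i -r" ;;
        training)  mode_flags="-t" ;;
        *) echo "unknown mode: $mode" >&2; return 1 ;;
    esac
    echo "sudo ./tsim-acap.emax7+dma -x $mode_flags -I$input -C$conv -F$fc"
}
```

For example, `tsim_cmd cifar10 training 6 2` prints the same command as the "CIFAR10 conv6+fc2 training" entry in the list. The `-M16` multi-lane option is left out because the list shows it in only one configuration.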
Petalinux 2024.1 IMAX4 Kit for Intel servers
PCI-e(VPK120)+VPK180 (64 units x8/x16 lanes) ... Vivado project is included.