diff --git a/README.md b/README.md
index 2ba56c7..b5a7ab6 100644
--- a/README.md
+++ b/README.md
@@ -200,6 +200,19 @@ Contributors: D'Arnese Eleonora, Conficconi Davide, Del Sozzo Emanuele, Fusco Lu
 
 If you find this repository useful, please use the following citation(s):
 
+```
+@article{faber2022,
+title={Faber: a Hardware/Software Toolchain for Image Registration},
+author={D'Arnese, Eleonora and Conficconi, Davide and Del Sozzo, Emanuele and Fusco, Luigi and Sciuto, Donatella and Santambrogio, Marco D},
+journal={IEEE Transactions on Parallel and Distributed Systems},
+year="2022",
+publisher = "IEEE Computer Society",
+address = "Los Alamitos, CA, USA",
+pages = "To Appear"
+}
+
+```
+
 ```
 @inproceedings{iron2021,
 author = {Conficconi, Davide and D'Arnese, Eleonora and Del Sozzo, Emanuele and Sciuto, Donatella and Santambrogio, Marco D},
diff --git a/artifacts_tpds22_scripts/README.md b/artifacts_tpds22_scripts/README.md
new file mode 100644
index 0000000..6620905
--- /dev/null
+++ b/artifacts_tpds22_scripts/README.md
@@ -0,0 +1,121 @@
+# Artifacts for IEEE Transactions on Parallel and Distributed Systems (TPDS) Open Initiative
+
+Paper Title: Faber: a Hardware/Software Toolchain for Image Registration
+Authors: Eleonora D'Arnese, Davide Conficconi, Emanuele Del Sozzo, Luigi Fusco, Donatella Sciuto, Marco D. Santambrogio.
+Affiliation: Politecnico di Milano
+
+Paper Main Contributions:
+* The first open-source HW/SW toolchain to automatically create custom Image Registration (IRG) pipelines exploiting FPGA-based accelerators.
+* Three levels of customization hyperparameters to support users in building IRG pipelines
+* A design automation methodology for non-FPGA experts to exploit default HW configurations as off-the-shelf SW
+* A latency and resource model to guide HW expert users during the customization of the HW accelerators
+
+Faber achieves up to 54x in speedup and 177x in energy efficiency improvements over the State of the Art
+
+## Artifacts' Objectives
+
+With this repo, all the results of the Faber manuscript can be reproduced:
+* HW generation
+* Single accelerator testing
+* Resource prediction and actual usage extraction
+* IRG application execution
+* Accuracy Extraction
+* Latency prediction
+* State of the Art comparison
+
+We exclude the biomedical dataset, since it is openly available, as well as the Matlab and SimpleITK applications, to respect their intellectual property.
+Artifact Evaluators, please contact us for more details or for a ready-to-use setup.
+
+## Testing Environment
+1. We tested the hardware code generation on two different machines based on Ubuntu 18.04/20.04 and CentOS 7.6, respectively.
+2. We used Xilinx Vitis Unified Platform and Vivado HLx toolchains 2019.2.
+3. We used Python 3 with `argparse` `numpy` `math` packages on the generation machine.
+4. a) On the host machines, or hardware design machines, we used Pynq 2.5 on the Zynq-based platforms (Pynq-Z2, Ultra96, Zcu104), where we employ `cv2`, `numpy`, `pandas`, `multiprocessing`, `statistics`, `argparse`, `pydicom`, and `scipy` packages. For pure SW deployment the user will also need `torch` and `kornia` packages.
+4. b) We tested the Alveo u200 on a machine with CentOS 7.6, i7-4770 CPU @ 3.40GHz, and 16 GB of RAM, and we installed Pynq 2.5.1 following the [instructions by the Pynq team](https://pynq.readthedocs.io/en/v2.5.1/getting_started/alveo_getting_started.html) with the same packages as point 4a.
+5. [Optional] Possible issues with locale: export LANG="en_US.utf8".
+
+## Artifact Installation and Deployment Process
+1. Make sure to have installed Vitis and Vivado 2019.2, the Alveo u200 and the U96 devices, as well as Python 3 and its packages.
+2. Follow the [PYNQ team's instructions to set up your devices](https://pynq.readthedocs.io/en/v2.5.1/) (Note: the FPGAs can be in a completely different place than the building system)
+3. Clone the repo `git clone https://github.com/necst/faber_fpga.git -b artifacts_tpds22`
+4. Prepare the environment (e.g., `source -rt -l -rp `).
+
+
+### Testing a HW Design
+
+1. Complete at least one design in the previous section, and prepare the HW design for deployment (i.e., `make resyn_extr_zynq_ultra96_v2 ` or `make resyn_extr_vts_alveo_u200`, done by all the bash scripts for the artifacts).
+2. `make pysw` creates a deploy folder for the Python code.
+3. `make deploybitstr` or `make deployxclbin` `BRD_IP= BRD_USR= BRD_DIR=` copies the needed files onto the deploy folders.
+4. Connect to the remote device, i.e., via ssh `ssh @`.
+5. [Optional] Install all needed Python packages as above, or the pynq package on the Alveo host machine.
+6. Navigate to the `/sw_py`.
+7.
+    * 7a) Launch the script `python_tester_launcher.sh /bitstream_ultra96` (or where you transferred the folder of the .bit) for the Ultra96 testing.
+    * 7b) Modify the script with PLATFORM=Alveo, and launch the script `python_tester_launcher.sh /xclbn_alveo_u200` (or where you transferred the folder of the .xclbin) for the Alveo testing.
+    * 7c) The script will automatically detect the accelerator configuration (based on the folder name) and set up the testing of the single accelerator, Powell's, and 1+1 registrations, with a dataset structured as [described here](#dataset_description).
+8. Run `python_tester_extractor.sh /bitstream_ultra96 ` to automatically derive a .csv with most of the useful results of the experimental campaign.
+
+If you wish to run a **single test of the accelerator** for a single bitstream, please follow these steps after step 6 above.
+7. Set `BITSTREAM=`, `CLK=200`, `CORE_NR=`, `PLATFORM=Alveo|Zynq`, `RES_PATH=path_results`, and source xrt on the Alveo host machine, e.g., `source /opt/xilinx/xrt/setup.sh`.
+8. [Optional] `python3 test-single-mi.py --help` for a complete view of input parameters.
+9. Execute the test `python3 test-single-mi.py -ol $BITSTREAM -clk $CLK -t $CORE_NR -p $PLATFORM -im 512 -rp $RES_PATH` (if on Zynq you will need `sudo`).
+
+To execute a **single registration**, follow steps similar to the previous single accelerator test, or [have a look here](#deploymentexample).
+`python3 faber-powell-blocked.py --help` will show a complete view of the input parameters for the Powell HW-based registration procedure.
+
+
+### Deployment Example on Ultra96/Alveo u200/Pure SW
+An example of deployment is `make deploybitstr TRGT_PLATFORM=ultra96_v2 BRD_USR=xilinx BRD_IP= BRD_DIR=/home/xilinx/faber` on the Ultra96.
+For an Alveo u200 device `make deployxclbin TRGT_PLATFORM=alveo_u200 BRD_USR= BRD_IP= BRD_DIR=`.
+
+Connect to the host machine and go to the target folder.
\ No newline at end of file
diff --git a/artifacts_tpds22_scripts/accuracy_eval.bash b/artifacts_tpds22_scripts/accuracy_eval.bash
new file mode 100644
index 0000000..9b78bf9
--- /dev/null
+++ b/artifacts_tpds22_scripts/accuracy_eval.bash
@@ -0,0 +1,9 @@
+#!/bin/bash
+cd ..
+make hw_gen METRIC='mi' PE=1 CORE_NR=1 HT='float' FREQ_MHZ=200 TRGT_PLATFORM=alveo_u200;
+make hw_gen METRIC='cc' PE=1 CORE_NR=1 FREQ_MHZ=200 TRGT_PLATFORM=alveo_u200;
+make hw_gen METRIC='mse' PE=1 CORE_NR=1 FREQ_MHZ=200 TRGT_PLATFORM=alveo_u200;
+make hw_gen METRIC='prz' PE=1 CORE_NR=1 HT='float' FREQ_MHZ=200 TRGT_PLATFORM=alveo_u200;
+make resyn_extr_vts_alveo_u200
+
+cd -
\ No newline at end of file
diff --git a/artifacts_tpds22_scripts/build_soa.bash b/artifacts_tpds22_scripts/build_soa.bash
new file mode 100644
index 0000000..40262a0
--- /dev/null
+++ b/artifacts_tpds22_scripts/build_soa.bash
@@ -0,0 +1,7 @@
+#!/bin/bash
+bash build_top_bits.bash
+cd ..
+make hw_gen PE=1 CORE_NR=3 TARGET=hw CLK_FRQ=200 TRGT_PLATFORM=zcu104 METRIC="mse" TRANSFORM="wax"
+make resyn_extr_zynq_zcu104
+
+cd -
\ No newline at end of file
diff --git a/artifacts_tpds22_scripts/build_top_bits.bash b/artifacts_tpds22_scripts/build_top_bits.bash
new file mode 100644
index 0000000..3fba76b
--- /dev/null
+++ b/artifacts_tpds22_scripts/build_top_bits.bash
@@ -0,0 +1,20 @@
+#!/bin/bash
+
+cd ..
+make default_ultra96
+make default_alveo_u200
+
+make hw_gen PE=1 CORE_NR=2 TARGET=hw CLK_FRQ=200 TRGT_PLATFORM=ultra96_v2 METRIC="mse" TRANSFORM="wax";
+make hw_gen PE=1 CORE_NR=2 TARGET=hw CLK_FRQ=200 TRGT_PLATFORM=ultra96_v2 METRIC="mi" TRANSFORM="wax";
+make hw_gen PE=1 CORE_NR=2 TARGET=hw CLK_FRQ=200 TRGT_PLATFORM=ultra96_v2 METRIC="nmi" TRANSFORM="wax";
+
+echo "Hello moto"
+
+make hw_gen METRIC='cc' PE=32 CORE_NR=1 FREQ_MHZ=300 TRGT_PLATFORM=alveo_u200;
+make hw_gen METRIC='mse' PE=32 CORE_NR=1 FREQ_MHZ=300 TRGT_PLATFORM=alveo_u200;
+make hw_gen METRIC='prz' PE=16 CORE_NR=1 HT='float' FREQ_MHZ=300 TRGT_PLATFORM=alveo_u200;
+
+make resyn_extr_zynq_ultra96_v2
+make resyn_extr_vts_alveo_u200
+
+cd -
\ No newline at end of file
diff --git a/artifacts_tpds22_scripts/evaluate_model.bash b/artifacts_tpds22_scripts/evaluate_model.bash
new file mode 100644
index 0000000..9cf1a8c
--- /dev/null
+++ b/artifacts_tpds22_scripts/evaluate_model.bash
@@ -0,0 +1,15 @@
+#!/bin/bash
+cd ../src/model
+
+python3 model.py -m cc -n 2 -pe 1 -b 8 -d 512 -p -t -i nn ultra96 > default_ultra96.log
+python3 model.py -m mi -n 1 -pe 16 -b 8 -d 512 -p default_alveo_u200 > default_alveo_u200.log
+
+python3 model.py -m cc -n 2 -pe 1 -b 8 -d 512 -p -t -i nn ultra96 > waxmse_u96.log
+python3 model.py -m cc -n 2 -pe 1 -b 8 -d 512 -p -t -i nn ultra96 > waxmi_u96.log
+python3 model.py -m cc -n 2 -pe 1 -b 8 -d 512 -p -t -i nn ultra96 > waxnmi_u96.log
+
+python3 model.py -m cc -n 1 -pe 32 -b 8 -d 512 -p default_alveo_u200 > cc_alveo.log
+python3 model.py -m mse -n 1 -pe 32 -b 8 -d 512 -p default_alveo_u200 > mse_alveo.log
+python3 model.py -m nmi -n 1 -pe 16 -b 8 -d 512 -p default_alveo_u200 > nmi_alveo.log
+
+cd -
\ No newline at end of file
diff --git a/artifacts_tpds22_scripts/fps.bash b/artifacts_tpds22_scripts/fps.bash
new file mode 100644
index 0000000..2ebd9a8
--- /dev/null
+++ b/artifacts_tpds22_scripts/fps.bash
@@ -0,0 +1,4 @@
+#!/bin/bash
+cd ../; make default_ultra96; make default_ultra96 D=1024; make default_ultra96 D=2048; cd -
+cd ../; make default_alveo_u200; make default_alveo_u200 D=1024; make default_alveo_u200 D=2048; cd -
+cd ../; make resyn_extr_zynq_ultra96_v2; make resyn_extr_vts_alveo_u200; cd -
\ No newline at end of file
diff --git a/artifacts_tpds22_scripts/metric_w-wo_transform.bash b/artifacts_tpds22_scripts/metric_w-wo_transform.bash
new file mode 100644
index 0000000..e673a28
--- /dev/null
+++ b/artifacts_tpds22_scripts/metric_w-wo_transform.bash
@@ -0,0 +1,60 @@
+#!/bin/bash
+cd ..
+############## Zynq builds
+make hw_gen PE=1 CORE_NR=2 TARGET=hw CLK_FRQ=200 TRGT_PLATFORM=ultra96_v2 METRIC="mi" TRANSFORM="wax";
+make hw_gen PE=2 CORE_NR=2 TARGET=hw CLK_FRQ=200 TRGT_PLATFORM=ultra96_v2 METRIC="mi" ;
+
+make hw_gen PE=1 CORE_NR=2 TARGET=hw CLK_FRQ=200 TRGT_PLATFORM=zcu104 METRIC="mi" TRANSFORM="wax";
+make hw_gen PE=2 CORE_NR=2 TARGET=hw CLK_FRQ=200 TRGT_PLATFORM=zcu104 METRIC="mi" ;
+
+make resyn_extr_zynq_ultra96_v2
+
+############## Alveo builds
+PES=(32 16 8 4 2 1)
+CORE_NRS=(1)
+HTYPS=('float')
+PE_ENTROP=(1)
+CACHING=(false)
+URAM=(false)
+FREQZ=200
+INTERPOLATIONS=('nearestn')
+
+for p in ${PES[@]}; do
+    for cn in ${CORE_NRS[@]}; do
+        for h in ${HTYPS[@]}; do
+            for pe in ${PE_ENTROP[@]}; do
+                for it in ${INTERPOLATIONS[@]}; do
+                    for c in ${CACHING[@]}; do
+                        for u in ${URAM[@]}; do
+                            for wu in ${CACHING[@]}; do
+                                make hw_gen PE=$p CORE_NR=$cn HT=$h TARGET=hw OPT_LVL=3 \
+                                FREQ_MHZ=$FREQZ PE_ENTROP=$pe CACHING=$c URAM=$u TRGT_PLATFORM=alveo_u200 \
+                                WAX_URAM=$wu METRIC="mi" INTERP_TYPE=$it TRANSFORM="wax";
+                            done;
+                        done;
+                    done;
+                done;
+            done;
+        done;
+    done;
+done;
+
+
+for p in ${PES[@]}; do
+    for cn in ${CORE_NRS[@]}; do
+        for h in ${HTYPS[@]}; do
+            for pe in ${PE_ENTROP[@]}; do
+                for c in ${CACHING[@]}; do
+                    for u in ${URAM[@]}; do
+                        make hw_gen PE=$p CORE_NR=$cn HT=$h TARGET=hw OPT_LVL=3 \
+                        FREQ_MHZ=$FREQZ PE_ENTROP=$pe CACHING=$c URAM=$u TRGT_PLATFORM=alveo_u200 METRIC="mi";
+                    done;
+                done;
+            done;
+        done;
+    done;
+done;
+
+make resyn_extr_vts_alveo_u200
+
+cd -
\ No newline at end of file
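
Review note, not part of the patch: the first Alveo sweep in `metric_w-wo_transform.bash` nests eight loops, but only `PES` holds more than one value, so it should expand to six `make hw_gen` invocations. The sketch below is a dry run that prints those commands instead of executing them, which may help check the configuration list before starting long synthesis jobs. The helper name `sweep_cmds` is hypothetical and does not exist in the repository; the array values and variable names are copied from the script above, and the shell quotes around `mi` and `wax` are dropped because `make` would receive the same arguments either way.

```shell
#!/bin/bash
# Dry run of the first Alveo sweep: print the hw_gen commands, do not run make.
set -euo pipefail

# Values copied from metric_w-wo_transform.bash.
PES=(32 16 8 4 2 1)
CORE_NRS=(1)
HTYPS=('float')
PE_ENTROP=(1)
CACHING=(false)
URAM=(false)
FREQZ=200
INTERPOLATIONS=('nearestn')

# sweep_cmds is a hypothetical helper (an assumption of this sketch).
# It walks the same loop nest as the script and echoes one command per point.
sweep_cmds() {
    local p cn h pe it c u wu
    for p in "${PES[@]}"; do
     for cn in "${CORE_NRS[@]}"; do
      for h in "${HTYPS[@]}"; do
       for pe in "${PE_ENTROP[@]}"; do
        for it in "${INTERPOLATIONS[@]}"; do
         for c in "${CACHING[@]}"; do
          for u in "${URAM[@]}"; do
           # The original script reuses CACHING for the WAX_URAM values.
           for wu in "${CACHING[@]}"; do
            echo "make hw_gen PE=$p CORE_NR=$cn HT=$h TARGET=hw OPT_LVL=3" \
                 "FREQ_MHZ=$FREQZ PE_ENTROP=$pe CACHING=$c URAM=$u" \
                 "TRGT_PLATFORM=alveo_u200 WAX_URAM=$wu METRIC=mi" \
                 "INTERP_TYPE=$it TRANSFORM=wax"
           done
          done
         done
        done
       done
      done
     done
    done
}

sweep_cmds
```

Piping the output through `wc -l` gives the number of builds the sweep would launch; adding a value to any of the arrays multiplies that count accordingly.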