diff --git a/README.md b/README.md
index 95bcf7c85368..948c4815c502 100644
--- a/README.md
+++ b/README.md
@@ -4,7 +4,7 @@ This repository extends the LLVM framework to generate code for use with AMD/Xil

 ## Architecture Overview

-Generally speaking, AI Engine processors are in-order, exposed-pipeline VLIW processors. These processors are implemented as part of an array of processors focused on application acceleration targeting AI, Machine Learning, and DSP applications. They have been integrated in a number of commercial devices including the [Versal AI Core Series](https://www.xilinx.com/products/silicon-devices/acap/versal-ai-core.html) and [Ryzen-AI SOCs](https://www.amd.com/en/products/processors/consumer/ryzen-ai.html).
+Generally speaking, AI Engine processors are in-order, exposed-pipeline VLIW processors. These processors are implemented as part of an array of processors focused on application acceleration targeting AI, Machine Learning, and DSP applications. They have been integrated in a number of commercial devices including the [Versal AI Core Series](https://www.amd.com/en/products/adaptive-socs-and-fpgas/versal/ai-core-series.html) and [Ryzen-AI SOCs](https://www.amd.com/en/products/processors/consumer/ryzen-ai.html).

 Each VLIW instruction bundle specifies the behavior of one or more functional units, which begin executing a new instruction at the same time. The processor pipeline does not include stall logic, and instructions will continue executing in order regardless of other instructions in the pipeline. As a result, the compiler is able to schedule machine instructions which access the same register in ways that potentially overlap. e.g.

@@ -23,8 +23,8 @@ Each VLIW instruction bundle specifies the behavior of one or more functional un
 Other key architectural characteristics include varying width instruction slots between different instruction encodings and relatively small address spaces (20-bit pointer registers). The presence of varying-width instruction slots implies some code alignment restrictions for instructions which are branch or return targets.

 For more information, see:
-[AIE1 architecture manual](https://docs.xilinx.com/r/en-US/am009-versal-ai-engine) and
-[AIE2 architecture manual](https://docs.xilinx.com/r/en-US/am020-versal-aie-ml)
+[AIE1 architecture manual](https://docs.amd.com/r/en-US/am009-versal-ai-engine) and
+[AIE2 architecture manual](https://docs.amd.com/r/en-US/am020-versal-aie-ml)

 ## Implementation

@@ -33,7 +33,7 @@ The AIE target includes basic support for both AIE1 and AIE2 instruction sets.

 | Subtarget | Instruction Selection | VLIW scheduling | Software Pipelining | Hardware Loops | delay slot filling | Floating point | Vector Intrinsics | post-increment addressing |
 | --- | --- | --- | --- | --- | --- | --- | --- | --- |
-| AIE1 | SelectionDAG | PostRA Scheduler | Not Yet | Not Yet | Not Yet | Yes, vector fp32 | Yes, i8, i16, i32 | No |
+| AIE1 | SelectionDAG | PostRA Scheduler | No | No | No | Yes, vector fp32 | Yes, i8, i16, i32 | No |
 | AIE2 | GlobalISel | Machine Scheduler | Experimental | Experimental | Yes | Yes, vector bfloat | Yes, i8, i16 | Yes |

 Support for Clang, LLD, binutils (e.g. 'llvm-objdump'), Compiler-RT, and LLVM-LIBC is also included.
@@ -41,10 +41,11 @@ Support for Clang, LLD, binutils (e.g. 'llvm-objdump'), Compiler-RT, and LLVM-LI
 In order to support the unusual architecture features of AI Engine, this repository adds LLVM support for several specific features:
 - support for non-power of 2 pointers;
 - improved TableGen support for specifying operand latencies and resource conflicts of exposed pipeline instructions;
-- scheduler support for negative operand latencies;
+- scheduler support for negative operand latencies (i.e. an instruction writing to a register may be scheduled *after* a corresponding use);
+- scheduler support for slot assignment for instructions that can be issued in multiple VLIW slots;
 - support for selecting relocations for instructions with multiple encodings;
 - support for architectures with code alignment restrictions;
-- improved register allocation support for complex register hierarchies;
+- improved register allocation support for complex register hierarchies, specifically related to spills of sub-registers of large compound-registers;

 ## Disclaimer
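
Reviewer note on the "negative operand latencies" bullet: the idea can be illustrated with a toy model of an exposed pipeline. The sketch below is an assumption-laden illustration, not the AIE pipeline and not the LLVM scheduler's implementation. The names `raw_operand_latency` and `use_sees_new_value`, the stage numbers, and the rule that a value written in cycle `c` is visible to a read in cycle `c` are all hypothetical choices made for this example.

```python
def raw_operand_latency(write_stage: int, read_stage: int) -> int:
    """Minimum value of (use_issue - def_issue) for the use to see the new value.

    Toy assumption: the defining instruction writes the register
    `write_stage` cycles after it issues, the using instruction reads it
    `read_stage` cycles after it issues, and a write in cycle c is visible
    to a read in the same cycle c.
    """
    return write_stage - read_stage


def use_sees_new_value(def_issue: int, write_stage: int,
                       use_issue: int, read_stage: int) -> bool:
    """Check that the read happens no earlier than the write."""
    write_cycle = def_issue + write_stage
    read_cycle = use_issue + read_stage
    return read_cycle >= write_cycle


# Hypothetical stages: the producer writes early (stage 1),
# the consumer reads late (stage 3).
latency = raw_operand_latency(write_stage=1, read_stage=3)

# A negative latency means the use may issue *before* the def.
def_issue = 5
earliest_use_issue = def_issue + latency
```

With these made-up stages the latency is negative, so the instruction that uses the register can be issued two cycles ahead of the instruction that writes it and still observe the new value, because there is no stall logic and each access happens at a fixed offset from issue.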