[BACKEND][CPU] Convert tt.func and tt.return #4

minjang · 2024-05-12T09:36:38Z

Summary: This is still boilerplate and basic lowering for the first milestone (compiling vector addition). This PR starts by lowering tt.func and tt.return.

Test Plan: It can safely compile an empty kernel.

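import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, output_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    return  # empty body: only tt.func and tt.return need to be lowered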

> TRITON_ENABLE_LLVM_DEBUG=1 TRITON_CPU_BACKEND=1 python3 empty_kernel.py
...
//===-------------------------------------------===//
Legalizing operation : 'tt.func'(0x73be2a0) {
  * Fold {
  } -> FAILURE : unable to fold

  * Pattern : 'tt.func -> ()' {
Trying to match "(anonymous namespace)::FuncOpConversion"
    ** Insert  : 'llvm.func'(0x6c04c70)
    ** Insert Block into : 'llvm.func'(0x6c04c70)
    ** Insert Block into : 'llvm.func'(0x6c04c70)
    ** Erase   : 'tt.func'(0x73be2a0)
"(anonymous namespace)::FuncOpConversion" result 1

    //===-------------------------------------------===//
    Legalizing operation : 'llvm.func'(0x6c04c70) {
    } -> SUCCESS : operation marked legal by the target
    //===-------------------------------------------===//

...

//===-------------------------------------------===//
Legalizing operation : 'tt.return'(0x73efeb0) {
  "tt.return"() : () -> ()

  * Fold {
  } -> FAILURE : unable to fold

  * Pattern : 'tt.return -> ()' {
Trying to match "(anonymous namespace)::ReturnOpConversion"
    ** Insert  : 'llvm.return'(0x73c0f00)
    ** Replace : 'tt.return'(0x73efeb0)
"(anonymous namespace)::ReturnOpConversion" result 1

    //===-------------------------------------------===//
    Legalizing operation : 'llvm.return'(0x73c0f00) {
      "llvm.return"() : () -> ()

    } -> SUCCESS : operation marked legal by the target
    //===-------------------------------------------===//
  } -> SUCCESS : pattern applied successfully

...


INSTCOMBINE ITERATION #1 on add_kernel
ADD:   ret void, !dbg !6
IC: Visiting:   ret void, !dbg !6
tensor([1.3713, 1.3076, 0.4940,  ..., 0.6724, 1.2141, 0.9733], device='cuda:0')
tensor([0., 0., 0.,  ..., 0., 0., 0.], device='cuda:0')
The maximum difference between torch and triton is 1.9935195446014404
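
To make the log above easier to follow, here is a toy sketch in plain Python of the legalization walk it records: each illegal op is matched against a pattern, replaced by an op the target marks legal, and its nested ops are then visited in turn. This is only an illustration under stated assumptions, not the MLIR dialect-conversion API and not the code in this PR; `Op`, `PATTERNS`, `is_legal` and `legalize` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Op:
    """Hypothetical stand-in for an IR operation with nested ops."""
    name: str
    body: list = field(default_factory=list)


# Hypothetical pattern table; it plays the role that FuncOpConversion and
# ReturnOpConversion play in the log (illegal op name -> legal op name).
PATTERNS = {
    "tt.func": "llvm.func",
    "tt.return": "llvm.return",
}


def is_legal(op):
    # Assumption for this sketch: the target only accepts llvm.* ops.
    return op.name.startswith("llvm.")


def legalize(op, log):
    """Return a legalized copy of `op`, recording each rewrite in `log`."""
    if not is_legal(op):
        target = PATTERNS.get(op.name)
        if target is None:
            raise ValueError(f"failed to legalize operation '{op.name}'")
        log.append(f"{op.name} -> {target}")
        op = Op(target, op.body)
    # Visit nested ops after the enclosing op has been rewritten.
    return Op(op.name, [legalize(child, log) for child in op.body])


# The empty kernel: a function whose body is a single return.
kernel = Op("tt.func", [Op("tt.return")])
log = []
lowered = legalize(kernel, log)
print(lowered.name, [child.name for child in lowered.body])
print(log)
```

An op with no entry in the table raises an error in this sketch, which is why each remaining tt op needs its own pattern before a full vector-addition kernel can be lowered.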

facebook-github-bot added the CLA Signed label May 12, 2024

minjang changed the title ~~Convert tt.func and tt.return~~ [BACKEND][CPU] Convert tt.func and tt.return May 12, 2024

minjang marked this pull request as ready for review May 13, 2024 01:00

minjang requested review from htyu, bertmaher, plotfi and manman-ren May 13, 2024 01:00

bertmaher approved these changes May 13, 2024

bertmaher merged commit 25e6cfc into pytorch-labs:main May 13, 2024
3 of 5 checks passed
