Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DebugInfo][HGLDD] Consuming Tywaves annotations containing extra source language information from Chisel and improve HGLDD format #7246

Draft
wants to merge 18 commits into
base: main
Choose a base branch
from

Conversation

rameloni
Copy link

@rameloni rameloni commented Jun 27, 2024

This PR adds new functionality to CIRCT for emitting source language type information in HGLDD
#6983.

This new functionality support the new backend of my project. Tywaves: a type-based waveform viewer. Tywaves adds support for Chisel to Surfer through HGLDD, however it misses of Chisel-type information and a proper documentation.

So, in addition to the changes I made to CIRCT, I also created a library for reading and documenting HGLDD in rust. I noticed that HGLDD misses a formal specification, so the rust documentation is a first attempt to provide more formal description of the fields contained in the file.

Specific contribution to CIRCT

  • Extract source language type information (not FIRRTL) encoded as TywavesAnnotation by Chisel [DebugInfo] Emit source language scala type information and pass it to CIRCT chipsalliance/chisel#4224
  • Associate it to respective MLIR operations
  • Consume it during MaterializeDebugInfo and convert to debug dialect operations
  • Add two new debug dialect operations to support internal representation of extra debug info for subfields of aggregates and modules (dbg.subfield and dbg.moduleinfo)
    def SubFieldOp : DebugOp<"subfield"> {
    let summary = "A named value to be captured in debug info which is a subfield of an aggregate";
    let description = [{
    Marks a subfield of aggregate to be tracked in DI under the given name.
    It is similar to `dbg.variable`, both store a value and contain source language name, type,
    and constructor parameters, but `dbg.subfield` returns also a value. Unlike a `dbg.variable`,
    it is contained in other debug operations like `dbg.struct` or `dbg.array` (here the usage of
    the returned value). It is only used to represent a subfield of an aggregate and it cannot be
    used to represent a variable directly declared in a module.
    The addition of support for source language type and constructor parameters for top variables
    and subfields (also nested) required to build this additional operation. The `dbg.variable`
    explicitly represents the "top" variable instances in a module. For this reason, it wasn't used to mark
    and it cannot mark subfields of aggregates.
    The `dbg.subfield` doesn't have a `scope` operand, because it is a descendant of a `dbg.variable`.
    }];
    let arguments = (ins
    StrAttr:$name,
    AnyType:$value,
    OptionalAttr<StrAttr>:$typeName,
    OptionalAttr<ArrayAttr>:$params
    );
    let results = (outs SubFieldType:$result);
    let assemblyFormat = [{
    $name `,` $value attr-dict `:` type($value)
    }];
    }
    def ModuleInfoOp : DebugOp<"moduleinfo"> {
    let summary = "Define extra debug information for a module";
    let description = [{
    Creates debug information for a module. If present, this operations provides
    extra information about the module type in the source language, such as its
    type name and constructor parameters.
    }];
    let arguments = (ins
    StrAttr:$typeName,
    OptionalAttr<ArrayAttr>:$params
    );
    let assemblyFormat = [{ attr-dict }];
    }
  • Emit that information into HGLDD for modules, variables and subfields
    JObject
    FileEmitter::emitSourceLangTypeInfo(const DISourceLang &sourceLangType) {
    JObject obj;
    if (const auto &typeName = sourceLangType.typeName)
    obj["type_name"] = typeName.getValue();
    if (const auto &params = sourceLangType.params) {
    JArray paramsArray;
    for (const auto &param : params) {
    if (isa<DictionaryAttr>(param)) {
    auto paramObj = JObject{};
    for (const auto &entry : cast<DictionaryAttr>(param))
    paramObj[entry.getName().getValue()] =
    cast<StringAttr>(entry.getValue()).getValue();
    paramsArray.push_back(std::move(paramObj));
    }
    }
    obj["params"] = std::move(paramsArray);
    }
    return obj;
    }
    // Emit source language type information for a module
    auto sourceLangTypeInfo = emitSourceLangTypeInfo(module->sourceLangType);
    if (!sourceLangTypeInfo.empty())
    json.attribute("source_lang_type_info", std::move(sourceLangTypeInfo));
    // Emit source language type information for a variable (if any)
    auto sourceLangTypeInfo = emitSourceLangTypeInfo(variable->sourceLangType);
    if (!sourceLangTypeInfo.empty())
    json.attribute("source_lang_type_info", std::move(sourceLangTypeInfo));
  • Document HGLDD https://github.com/rameloni/tywaves-rs/blob/main/src/hgldd/spec.rs

…uctor parameters information)

The source language type information is now available to CIRCT (through the tywaves annotation mechanism):
   - Consume and transform tywaves annotations to mlir operations.
   - The annotations are added to the firrtl dialect (`applyTywaves`) if the debug information should be enabled (-g). Otherwise useless.
   - Add tests for `applyTywaves`.
- Modified the debug dialect to support source language type information (type and constructor parameters):
	- Added optional (depending on the user debug level chosen they may not be present) `type` and `params` attributes to `VariableOp`
	- Created `SubFieldOp` + `SubFieldType` return type.
		- A `SubFieldOp` contains the source language information and the reference to the actual operation which is preserved (it can be any type).
		- A `SubFieldOp` is contained in a `dbg.struct` or`dbg.array`.
	- Add the `SubFieldOp` to represent source language information for subfields of aggregates and keep the distinction between subfields and variables in the debug dialect
- Updated `MaterializeDebugInfo` to create proper debug operations:
	- Convert Tywaves annotations to the respective debug dialect operations
	- If the tywaves annotations are not present, the behaviour does not change
	- Checks for the number of tywaves annotations associated to a target/operation
		- A specific target should have only 1 tywaves annotation
		- The annotations must not have errors in the format
- Tests:
	- Conversion to dbg dialect operations
	- Error handling tests
@rameloni rameloni changed the title [DebugInfo][HGLDD] [DebugInfo][HGLDD] Consuming Tywaves annotations from Chisel Jun 27, 2024
@rameloni rameloni changed the title [DebugInfo][HGLDD] Consuming Tywaves annotations from Chisel [DebugInfo][HGLDD] Consuming Tywaves annotations containing extra source language information from Chisel and improve HGLDD Jun 27, 2024
@rameloni
Copy link
Author

The related PR in chisel: chipsalliance/chisel#4224

@rameloni rameloni marked this pull request as draft June 27, 2024 15:42
@rameloni rameloni changed the title [DebugInfo][HGLDD] Consuming Tywaves annotations containing extra source language information from Chisel and improve HGLDD [DebugInfo][HGLDD] Consuming Tywaves annotations containing extra source language information from Chisel and improve HGLDD format Jun 27, 2024
@darthscsi
Copy link
Contributor

Some general comments:
We are trying very hard to not add new annotations.
I thought the current debug efforts were grabbing chisel types and only falling back to firrtl types if asked. It might be this isn't done yet, in which case the efforts should be coordinated. @fabianschuiki or @prithayan might be able to shed more light.

@rameloni
Copy link
Author

rameloni commented Jul 2, 2024

Some general comments:
We are trying very hard to not add new annotations.
I thought the current debug efforts were grabbing chisel types and only falling back to firrtl types if asked. It might be this isn't done yet, in which case the efforts should be coordinated. @fabianschuiki or @prithayan might be able to shed more light.

@darthscsi thanks for the comment.
Yeah, I found that Intrinsics might be a better option to do that. However, I'm not sure if I can make it before the deadline of my thesis. That's why I'm using at the moment the annotations, since I've found they were slightly easier to manage on average between Chisel and CIRCT sides

@rameloni
Copy link
Author

rameloni commented Jul 3, 2024

I have also noticed another thing. HGLDD also missed reference to enum variants, so I'm trying to add that.
I've seen that the hw dialect has a definition of EnumType however it seems not to be generated?!? I'm not sure, am I wrong?
Chisel passes the enum variants through 3 annotations (EnumComponentAnnotation, EnumDefAnnotation, and EnumVecAnnotation) but they are dropped in the LowerAnnotations pass, is there a reason for that? Apart from the fact you are trying to replace the annotation mechanism.

@rameloni
Copy link
Author

To support debug info about enum variants I also added new changes to the debug dialect
next to the ones of #7246 (comment).

Other CIRCT contributions (supporting enums in HGLDD)

  • a new dbg.enumdef operation to store the mapping of enum variants.
    def EnumDefOp : DebugOp<"enumdef"> {
    let summary = "Define the value variants of an enumeration";
    let description = [{
    Creates a definition of an enumeration type.
    It is useful to reconstruct the named variants of an enum from a raw value.
    Variants can be internally represented as a map (Int -> String).
    It might be possible that the user select the order and single raw values
    of the variants. Therefore, it is not possible to use an array to map the
    named variants with the integer value.
    This operation is declared once per enum present in a scope.
    The result of this operation is used as operand in `dbg.variable` and
    `dbg.subfield` operations if they originate from an enum type.
    }];
    let arguments = (ins
    StrAttr:$enumTypeName,
    I16Attr:$id,
    DictionaryAttr:$variantsMap,
    Optional<ScopeType>:$scope
    );
    let results = (outs EnumDefType:$result);
    let assemblyFormat = [{
    $enumTypeName `,` `id` $id `,` $variantsMap (`scope` $scope^)? attr-dict
    }];
    }
  • added an optional operand to dbg.variable and dbg.subfield to contain the result of dbg.enumdef
  • Emit an enum_defs entry in HGLDD
    auto enumDefs = emitEnumDefs(module->enumDefinitions);
    if (!enumDefs.empty())
    json.attribute("enum_defs", std::move(enumDefs));

    JObject FileEmitter::emitEnumDefs(const DIEnumDefMap &enumDefinitions) {
    JObject obj;
    for (auto [id, mapValues] : enumDefinitions) {
    JObject map;
    for (auto [value, name] : mapValues)
    map[std::to_string(value)] = name.getValue();
    obj[std::to_string(id)] = std::move(map);
    }
    return obj;
    }

@fabianschuiki
Copy link
Contributor

Hey @rameloni, sorry for taking so long to respond. Your work is very exciting! Thanks for pushing on getting debug information communicated from Chisel to CIRCT 🥳.

I agree with @darthscsi that using intrinsics here would be very nice. They would allow you to directly annotate information onto Chisel and FIRRTL values directly in the IR, without going through the annotations mechanism which we are trying to phase out as much as possible. Intrinsics should be almost identical in handling as annotations, with the difference that you have more control over where they appear in the IR and you never have to magically materialize MLIR ops from a JSON data structure.

One concern regarding debug info output: the HGLDD format is used by Synopsys VCS/Verdi to ingest debug information. I'm pretty sure that changes to the format, especially things that add new constructs and fields and operators, will break the Synopsys-side implementation. The format is also tailored to work with VCS-internal code patterns and limitations. Since you are working on your own waveform viewer, would it be possible to simply create a new, more streamlined debug info format for your viewer? You could just copy-paste the HGLDD emission code with your changes, if you want to stick with those. But I'm pretty sure you've found a few things in HGLDD to be weird or cumbersome, and creating your own output format would be an opportunity to make things more streamlined for you 😃. But it would be great to keep the HGLDD format unchanged since we don't control the other half of its implementation.

@rameloni
Copy link
Author

rameloni commented Jul 15, 2024

Hey @fabianschuiki, thanks for the feedback. At the beginning I created a draft file format different from HGLDD but later I thought that extending HGLDD could've been also an opportunity to make the new features available to tools using that. I also though that two formats could create confusion 😅. My thesis deadline is soon and, at this point, I am not sure if creating a new format is a good idea, emitting the updated HGLDD as "another" format is super fast instead.
I also could revert this commit about the other format draft. If you'd like to check it this is the link: https://github.com/rameloni/circt/blob/source-lang-di-tywaves-file/lib/Target/DebugInfo/EmitTywavesHGLDD.cpp (I used HGLDD emit as an inspiration). But the implementation of the viewer now uses HGLDD although it is thought to be extended also to other formats.

Regarding intrinsics, I'd really like to replace my annotations with them, I'm trying to finish quickly my remaining tasks to do so by the end of July, but it is more likely that I'll make it after my defense.

Also, annotations are already used for the Chisel enums, which means that also those would need to be replaced.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants