Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for procedure annotations to the assembler #1434

Closed
Tracked by #304
bobbinth opened this issue Aug 7, 2024 · 8 comments
Closed
Tracked by #304

Add support for procedure annotations to the assembler #1434

bobbinth opened this issue Aug 7, 2024 · 8 comments
Assignees
Labels
assembly Related to Miden assembly
Milestone

Comments

@bobbinth
Copy link
Contributor

bobbinth commented Aug 7, 2024

There are several use cases for adding support for procedure annotations in MASM. For example:

  • Directives to the assembler on how to assembler a program.
    • One example of this is inline annotation which could inform the assembler that the annotated procedure should be inlined.
    • Another example is an annotation indicating whether a given procedure is expected to be call-ed, syscall-ed, or exec-ed.
  • Information about procedure signature which could be useful for the compiler (and maybe debugger).
  • User defined annotation. One example of these is the storage_offset annotation which we need in miden base to define storage offsets for account procedures.

I think @bitwalker had some ideas about how annotations could work. Let's discuss these in this issue.

@bobbinth bobbinth added the assembly Related to Miden assembly label Aug 7, 2024
@bobbinth bobbinth added this to the v0.11.0 milestone Aug 7, 2024
@bitwalker
Copy link
Contributor

In #1436, I proposed some syntax changes in Miden Assembly that I think would solve for our issues with kernel procedure visibilities, and defining/maintaining important procedure metadata like the type signature.

However, there are a number of other things, such as inlining for which annotations would be preferable to dedicated syntax:

  • Inlining
  • Interacting with external tooling, such as linter/formatter
  • Decorators, such as storage offsets, automatically emitting debug events when a given procedure is called, etc.

The following is my proposed syntax for annotations:

# Grammar Rules
#
# ANNO := "@" "[" <META> "]" => <>
#
# META := <name:ID> => Annotation::from(name)
#       | <name:ID> "(" <tokens:Token> ")" => Annotation::from((name, tokens))
#       | <name:ID> "=" <value:META_EXPR> => Annotation::from((name, value))
#
#  META_EXPR := <name:ID> => Meta::Path(name.into())
#             | <string:String> => Meta::String(string)
#             | <int:u64> => Meta::Int(int)

# Following is an example of how this would be used to solve various things from our list:

# Inline this procedure if we're looking for inline opportunities
@[inline]
proc.add_unchecked
  ...
end

# Force this procedure to be inlined
@[inline(always)]
proc.intrinsic
   ...
end

# Mark this procedure as deprecated, and tell the linter to ignore any warnings generated by its body
@[allow(warnings::all)]
@[deprecated(since = "0.10.0")]
proc.deprecated
   ...
end

# Specify a storage offset
@[storage_offset = 1024]
proc.get_item
   ...
end

I think this syntax provides us with enough flexibility to support essentially any type of attribute we might feasibly need, and supports arbitrary extension for user-defined attributes.

I think we should require registering callbacks with the parser for user-defined attributes, dispatched based on the attribute name. Then, if we parse a module which contains unhandled attributes, we can raise a diagnostic. The user-provided callback would be given the opportunity to parse the attribute value, and return that value or a diagnostic if the value is invalid. This makes the system very extensible, while still keeping the API quite simple.

It would be important to store attributes in the procedure metadata of compiled libraries as well, so that this information can be used downstream. For attributes which only apply during parsing/compilation, they can be stripped, but some of these cannot: inline, deprecated at least, maybe storage_offset?. In any case, user-defined attributes would certainly need to be preserved, as we cannot know how they are meant to be used (or at the least, we provide a mechanism to indicate whether to preserve them or not as part of "registering" an attribute).

I think that about sums it up for now. I'm using "attribute" and "annotation" interchangeably here, but they are the same thing IMO.

@Fumuran
Copy link
Contributor

Fumuran commented Sep 2, 2024

Probably we also need to add an annotation for a dead code, which should be compiled anyway.

We came across this use case in the 0xPolygonMiden/miden-base#803 PR: almost all procedures in the kernel will be called only by their hash, so it makes sense to change them from export to proc, but in that case we cannot get the hashes of these procedures, since they were not compiled and were not added to the MastForest.

As @bobbinth mentioned in his comment, we should add an annotation for such procedures, which should be compiled even if they are considered a dead code.

@bobbinth
Copy link
Contributor Author

bobbinth commented Sep 2, 2024

The above is essentially for procedures which could be invoked dynamically but we don't want to export them from a give module. So, maybe "dead code" is not the right term here. Maybe we could use something like dynamic attribute like so:

@[dynamic]
proc.foo
    ...
end

The assembler would treat such a procedure as if it was an export procedure but it would not add it to the list of exports for the module.

Another option is to use a visibility modifier as discussed in #1436 (comment) and the related thread. Maybe something like:

proc(dynamic).foo
    ...
end

bitwalker added a commit that referenced this issue Sep 23, 2024
This commit introduces the concept of attributes/annotations to Miden
Assembly procedures. These can be used to represent interesting/useful
bits of metadata associated with specific procedures in three different
forms:

* Marker attributes, e.g. `#[inline]`, a name with no associated data
* List attributes, e.g. `#[inline(always)]`, i.e. a parameterized
  marker; multiple values can be provided as comma-delimited values.
* Key-value attributes, e.g. `#[key = <value>]`, where `<value>` can be
  a "meta expression" of three possible types: identifier, string, or
  integer (in either decimal or hexadecimal format).

Attributes will provide the foundation for upcoming changes that will
rely on being able to attach metadata to procedures. For now, attributes
may _only_ be attached to procedure definitions, not re-exports or any
other syntactic construct.

NOTE: This does not yet act on any attributes, nor store them anywhere
when assembling to MAST. For now, they are simply parsed, made available
in the AST, and ignored. Future PRs will introduce these as needed.

Closes #1434
bitwalker added a commit that referenced this issue Sep 23, 2024
This commit introduces the concept of attributes/annotations to Miden
Assembly procedures. These can be used to represent interesting/useful
bits of metadata associated with specific procedures in three different
forms:

* Marker attributes, e.g. `#[inline]`, a name with no associated data
* List attributes, e.g. `#[inline(always)]`, i.e. a parameterized
  marker; multiple values can be provided as comma-delimited values.
* Key-value attributes, e.g. `#[key = <value>]`, where `<value>` can be
  a "meta expression" of three possible types: identifier, string, or
  integer (in either decimal or hexadecimal format).

Attributes will provide the foundation for upcoming changes that will
rely on being able to attach metadata to procedures. For now, attributes
may _only_ be attached to procedure definitions, not re-exports or any
other syntactic construct.

NOTE: This does not yet act on any attributes, nor store them anywhere
when assembling to MAST. For now, they are simply parsed, made available
in the AST, and ignored. Future PRs will introduce these as needed.

Closes #1434
bitwalker added a commit that referenced this issue Sep 25, 2024
This commit introduces the concept of attributes/annotations to Miden
Assembly procedures. These can be used to represent interesting/useful
bits of metadata associated with specific procedures in three different
forms:

* Marker attributes, e.g. `@inline`, a name with no associated data
* List attributes, e.g. `@inline(always)`, i.e. a parameterized
  marker; multiple values can be provided as comma-delimited metadata
  expressions.
* Key-value attributes, e.g. `@props(<key> = <value>, ...)`, where
  `<key>` must be a valid identifier, and `<value>` must be a valid
  metadata expression. Multiple key-value attributes can be set at once,
  or you can specify the same key-value attribute multiple times, so
  long as each instance does not have any keys that conflict with
  previous instances.

Metadata expressions come in three possible types:

* bare identifier, e.g. `foo`
* quoted string, e.g. `"some text"`
* integer value, either decimal or hexadecimal format, e.g. `1` or
  `0x01`

Attributes will provide the foundation for upcoming changes that will
rely on being able to attach metadata to procedures. For now, attributes
may _only_ be attached to procedure definitions, not re-exports or any
other syntactic construct.

NOTE: This does not yet act on any attributes, nor store them anywhere
when assembling to MAST. For now, they are simply parsed, made available
in the AST, and ignored. Future PRs will introduce these as needed.

Closes #1434
bitwalker added a commit that referenced this issue Sep 25, 2024
This commit introduces the concept of attributes/annotations to Miden
Assembly procedures. These can be used to represent interesting/useful
bits of metadata associated with specific procedures in three different
forms:

* Marker attributes, e.g. `@inline`, a name with no associated data
* List attributes, e.g. `@inline(always)`, i.e. a parameterized
  marker; multiple values can be provided as comma-delimited metadata
  expressions.
* Key-value attributes, e.g. `@props(<key> = <value>, ...)`, where
  `<key>` must be a valid identifier, and `<value>` must be a valid
  metadata expression. Multiple key-value attributes can be set at once,
  or you can specify the same key-value attribute multiple times, so
  long as each instance does not have any keys that conflict with
  previous instances.

Metadata expressions come in three possible types:

* bare identifier, e.g. `foo`
* quoted string, e.g. `"some text"`
* integer value, either decimal or hexadecimal format, e.g. `1` or
  `0x01`

Attributes will provide the foundation for upcoming changes that will
rely on being able to attach metadata to procedures. For now, attributes
may _only_ be attached to procedure definitions, not re-exports or any
other syntactic construct.

NOTE: This does not yet act on any attributes, nor store them anywhere
when assembling to MAST. For now, they are simply parsed, made available
in the AST, and ignored. Future PRs will introduce these as needed.

Closes #1434
@bobbinth
Copy link
Contributor Author

#1510 added parsing of annotations to the assembler. I think what remains propagating this information through to MastForest. Should we create a different issue for this?

@bitwalker
Copy link
Contributor

bitwalker commented Sep 27, 2024

Probably we also need to add an annotation for a dead code, which should be compiled anyway.

We came across this use case in the 0xPolygonMiden/miden-base#803 PR: almost all procedures in the kernel will be called only by their hash, so it makes sense to change them from export to proc, but in that case we cannot get the hashes of these procedures, since they were not compiled and were not added to the MastForest.

The assembler would treat such a procedure as if it was an export procedure but it would not add it to the list of exports for the module.

I would argue that any procedure that is callable from outside the object produced by the assembler, should be marked export, regardless of whether it is meant to be called indirectly or not. These visibility modifiers are the only meaningful way to convey information about whether something is semantically exported or not.

However, I do think we need to ensure that indirect calls to library-internal procedures is supported. In that case, the indirect call isn't to an "exported" procedure, and more like capturing the address of a function and calling it. To do so, we can track which procedures have had their "address taken" via procref, and ensure that those procedures are indirectly callable. We'll need to do the tracking part any way in order to support dead code elimination - which is sort of what you are getting at, except for I don't think we need an annotation for it: procedures can either be marked export (which will prevent them from being DCE'd), or have their address taken via procref (which will indicate that they are used, and thus not candidates for DCE).

IMO, the more interesting option for an annotation, would be something like @derive(dyncall), which would instruct the assembler to generate two variants of the procedure, one which is a dyncall-able wrapper around the original procedure that handles the inconsistency introduced by the hash argument remaining on the operand stack. Call sites would then be modified to call the appropriate procedure based on what invoke instruction is used (i.e. exec gets foo, while dynexec gets foo_dyn). Ideally the dyn* instructions could be modified so as to not require this bifurcation of callees, but as a workaround, generating the boilerplate for them would be a major convenience IMO.

@bitwalker
Copy link
Contributor

#1510 added parsing of annotations to the assembler. I think what remains propagating this information through to MastForest. Should we create a different issue for this?

Probably a good idea - there are a few open questions around how this information should be handled that I think we'll want to pin down:

  • The whole "can we support annotating re-exported procedures" question. I think an easy "yes" can be given for some annotations. For example, if we add support for lints/deprecations, and wanted to mark a re-export as @deprecated("0.2.0", "use bar instead"), that's fine, as the re-export being deprecated when the original definition is not has value; similarly, they could both be marked deprecated differently, and resolving that conflict is obvious, as the one closest to the call site takes precedence. It's a completely different situation for annotations that aren't just "soft" metadata though, and may imply something about the behavior of the callee, or worse, change the behavior of either the callee or the caller. Propagating attributes in such a way that we can "do the right thing" at a call site for a given procedure can be a non-trivial question, if every re-export can potentially add new annotations that may override/conflict with those of the actual definition. We don't necessarily need to answer these questions now, but the answer will affect how we propagate this information in the assembler.
  • We probably should reserve some annotations that we're likely to need, and raise diagnostics if they are used (until we actually implement handling for them). For example, I'd probably reserve right out of the gate: warn, allow, forbid, deprecated, error, warning, test, derive, inline, type, doc, storage, feature, cfg, global, const, constant, dynamic, and whatever else is vaguely keyword like, or that we already anticipate using in the future or think we might want to use in the future.
  • We might want to flag the annotations we need with the highest priority, ideally with different categories of semantics, and look at how we'd support those, and use that process to make sure whatever implementation we come up with for propagating annotation information is solid enough to support all of them without too many contortions.

@bobbinth
Copy link
Contributor Author

#1510 added parsing of annotations to the assembler. I think what remains propagating this information through to MastForest. Should we create a different issue for this?

Probably a good idea -

Created #1517 for this.

@bobbinth
Copy link
Contributor Author

I would argue that any procedure that is callable from outside the object produced by the assembler, should be marked export, regardless of whether it is meant to be called indirectly or not. These visibility modifiers are the only meaningful way to convey information about whether something is semantically exported or not.

Generally, I agree, though there are a couple of nuances in this specific case.

  1. These procedures are not meant to be called directly - they need a wrapper and without this wrapper they won't function properly. So, there is a publicly exposed wrapper and this wrapper then calls these procedures dynamically.
  2. These procedures are exported from the kernel module. So, since they are marked as export, they get included into the Kernel struct. This shouldn't happen and we either need to post-process the struct to remove them (which would be quite hacky). Though, maybe this will get resolved with procedure visibility modifiers discussed in #1436.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
assembly Related to Miden assembly
Projects
None yet
Development

No branches or pull requests

3 participants