
sem: re-sem all call arguments in macro output #1192

Draft
wants to merge 20 commits into base: devel

Conversation

@zerbina (Collaborator) commented Feb 16, 2024

Call operands are now re-analyzed when they're not marked as having been analyzed (via the nfSem flag) -- having a type or not is no longer relevant. This improves the stability of inserting typed AST verbatim into macro output, since expressions are now properly re-typed.

There is a lot of analysis that is destructive to the shape of the AST, meaning that it's often not possible to discard the types of a typed expression, re-analyze it, and expect to get the correct type back. For this reason, some AST requires special handling throughout sem.
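
For illustration, a minimal sketch of the kind of situation this targets (the macro, the distinct type, and all names below are invented): already-typed AST is spliced verbatim into a call in macro output, and whether the spliced operand gets re-analyzed now hinges on the `nfSem` flag instead of on it already having a type.

import std/macros

type MyInt = distinct int
proc `+`(a, b: MyInt): MyInt {.borrow.}

# hypothetical macro: wraps an already-analyzed (typed) expression in a new call
macro double(x: typed): untyped =
  result = newCall(bindSym"+", x, x)

let v = MyInt(3)
discard double(v)  # the typed `v` is inserted verbatim into the output `+`(v, v)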

Fixes #1183.
Fixes #1181.


To-Do

  • (partially) keep the shape of array construction like [1: x, ...]
  • fix the remaining problems and make all existing tests succeed again
  • split out changes where possible
  • add regression tests for the fixed issues
  • write a proper commit message

Notes for Reviewers

  • still in progress, but the general direction should be visible

Instead of looking for typed expressions, `prepareOperand` looks for
already-analyzed operands (as indicated by the `nfSem` flag) when
deciding whether to analyze the expression again or not.
With the `nfSem` flag becoming much more important, it makes sense to
display node flags in tracing output.
Borrow target searching runs overload resolution with typed `nkEmpty`
nodes as the parameter. This worked when types disabled re-semming, but
it no longer does. To disable analysis of the nodes, they're marked
with `nfSem`.
`asBracketExpr` now also detects `[](x, y)` as something to try as a
built-in subscript operation if `x` is a bound generic symbol or symbol
choice.

Not going the `semSubscript` route was previously not a problem: the
symbol or symbol choice had a type, and thus `semOperand` (which reports
an error for un-instantiated generic symbols) wasn't called again, so
`semOverloadedCall` successfully picked the `mArrGet` overload
(which then calls `semSubscript`).

Since the type no longer has a say on whether an operand is analyzed or
not, `asBracketExpr` *must* work properly.
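
For context, what the call form in question looks like at the source level (the array and values are just for illustration):

let a = [10, 20, 30]
# explicit call to the subscript operator; the compiler has to restore this
# to a bracket expression so that it resolves to the built-in mArrGet
# subscript, i.e., the same as a[1]
echo `[]`(a, 1)
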
A `[](x, y)` was not turned into a bracket call early enough if `x` is
the `mArrGet` magic *directly* (only symbol choices were considered).
This is a problem if `x` is an unresolved generic procedure.

* move the correct detection logic from `shouldBeBracketExpr` into a
  shared procedure
* use the shared procedure in `asBracketExpr`
* attempt bracket expression restoration prior to overload resolution
  when the `mArrGet` magic matches directly
Hidden nodes produced by sigmatch are now all marked with the `nfSem`
flag, in order to prevent them from being discarded again by re-typing
(triggered by `semFinishOperands`).
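
As a rough illustration of such hidden nodes (the proc and macro below are made up), an implicit int-to-float argument conversion typically shows up as a hidden conversion node in the typed AST:

import std/macros

macro showTyped(x: typed): untyped =
  echo x.treeRepr  # usually shows a HiddenStdConv wrapping the literal argument
  result = x

proc takesFloat(f: float) = discard

showTyped(takesFloat(1))  # sigmatch inserts the hidden conversion to float
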
Rework `semSetConstr` such that it:
* doesn't mutate the input AST
* doesn't use `localReport`
* analyzes `nkRange` nodes (see the example after this list)
* propagates errors
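
A small language-level example (the constants are made up) of the set constructions this covers, including a range element, which becomes an `nkRange` node in the typed AST:

const Digits = {'0'..'9'}          # range element
const Odd = {1'u8, 3'u8, 5'u8}     # plain elements
echo Digits.card, " ", Odd.card
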
The `nkBracket` array construction created by sigmatch cannot
be passed to `semArrayConstr`, as local converters might be involved,
which `semArrayConstr` has no knowledge about. Therefore, `semExpr`
does nothing for varargs array constructions and passes them back to
`sigmatch`, where the containers are retyped via `matchContainer`.
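
A hedged sketch (the type, converter, and proc are invented) of a varargs call where the implicit container construction involves a converter:

type Celsius = distinct float

converter toFloat(c: Celsius): float = float(c)

proc average(xs: varargs[float]): float =
  for x in xs: result += x
  result = result / xs.len.float

# the Celsius value goes through the converter inside the hidden nkBracket
echo average(1.0, Celsius(2.0), 3.0)
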
`semExpr` had no support for `nkNimNodeLit`erals, meaning that their
appearance in macro/template always led to an error.
Don't make a full copy of the left-hand operand when building an
`nkDotCall` -- the tree was analyzed immediately before. With the tree
copy, the AST was treated as requiring analysis again, which is
unnecessarily wasteful.
If the resolved `tyStatic` type is bound to a generic `tyStatic` and
later becomes the type of a parameter in a routine, without the
`tfHasStatic` flag, the argument to the parameter is not going to be
constant-evaluated again when re-matching the overload during
re-typing.
It doesn't work correctly for statement list expressions, and the
`copyTree` also loses the `nfSem` flag, leading to `semFinishOperands`
sem'ing the argument again and assigning it the wrong type.
The constant expressions produced by `paramTypesMatchAux` are marked as
`nfSem`, so that `semFinishOperands` doesn't re-type them (which would
lose the `tyStatic` type).
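
For illustration (the proc and constant are made up), a `static` parameter whose argument has to be constant-evaluated while the overload is matched; losing the resulting `tyStatic` type on re-typing would break this:

proc fixedArray[T](n: static int; x: T): array[n, T] =
  for i in 0 ..< n: result[i] = x

const Size = 2 + 2
echo fixedArray(Size, "ab")  # `Size` is constant-evaluated to 4 during matching
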
* `nkHiddenAddr`, `nkHiddenStdConv`, `nkHiddenSubConv`,
  `nkHiddenCallConv`, and `nkCheckedFieldExpr` are all replaced with
  their content
* an attempt is made to recreate `nkHiddenDeref`, but it is replaced
  with its sub-expression if the sub-expression is not of `var` or
  `lent` type
A call to `semArrayConstr` would type a constant sequence construction
(`nkBracket`) as an `array`, which is wrong. For now, a constant
sequence construction is kept as is during re-typing.
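
For context (the constant is made up), a constant sequence construction at the source level; the bracket part is the `nkBracket` that must keep its `seq` type during re-typing:

const Values: seq[int] = @[1, 2, 3]
echo Values
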
This prevents it from being re-typed (which could lose its type) when
substituting the `static` parameter with it.
Somehow the AST taken directly from the parameter symbol is not marked
with `nfSem`. This needs some further investigation.
@zerbina added the bug (Something isn't working) and compiler/sem (Related to semantic-analysis system of the compiler) labels on Feb 16, 2024
@zerbina marked this pull request as draft on February 16, 2024 02:23
Comment on lines 2616 to 2617
# XXX This is unsound! 'formal' can differ from overloaded routine to
# overloaded routine!
Collaborator

Soft suggestion, but you can remove this if you'd like.

I don't think this is valid anymore, because of untyped vs. non-untyped argument isolation, the soon-to-be-included nfSem flag guards, and the removal of the tyIterable hack a while back.

@Varriount (Contributor)

There is a lot of analysis that is destructive to the shape of AST, meaning that it's oftentimes not possible to discard the types of a typed expression, re-analyze it, and expect the correct type

Out of curiosity, why is this the case? Is it just how the implementation was originally written? I would normally expect analysis procedures to be fairly side-effect free (at least with regards to their inputs).

@zerbina (Collaborator, Author) commented Mar 6, 2024

Out of curiosity, why is this the case? Is it just how the implementation was originally written?

Most likely because it was simpler to implement. If typed AST doesn't have to be transformable back into exactly the AST it originated from, less thought has to be put into the design of the semantic analysis phase.

I would normally expect analysis procedures to be fairly side-effect free (at least with regards to their inputs).

Analysis here refers to the semX procedures of semantic analysis, which have a large amount of side effects, many of them irreversible. A lot of work has gone into removing input mutations from the semX procedures, so at least the input aspect has gotten a lot better.

With "destructive to the shape", I'm referring to things like the following:

type MyInt = distinct int
const C = MyInt(1)

discard C
# before analysis and typing, the untyped AST of the discard looks like
# this: `(DiscardStmt (Ident "C"))`
# the typed AST (i.e., the AST after analysis and typing) looks like this:
# `(DiscardStmt (IntLit 1))`
# Throwing away all types would mean that the information of the int literal
# being a `MyInt` is irreversibly lost 

@saem (Collaborator) commented Mar 6, 2024

There is a lot of analysis that is destructive to the shape of AST, meaning that it's oftentimes not possible to discard the types of a typed expression, re-analyze it, and expect the correct type

Out of curiosity, why is this the case? Is it just how the implementation was originally written? I would normally expect analysis procedures to be fairly side-effect free (at least with regards to their inputs).

Because of things like automatic conversions or dereferences, we need to indicate that somehow. There are also desugaring steps, like `let a, b, c, d = 10` getting converted into 4 individual `let`s; those are largely benign and we could leave them alone, but then processing macro pragmas would become seriously problematic. There are also various ambiguities that need to be resolved, like `foo.bar`: is that a field access or a call? Furthermore, there are additional challenges, such as abstractions that go beyond simple desugaring, like the conversion of `for` loops to `while`; although this isn't exposed to macros today, the inability to "pierce the veil" is actually problematic for things like the CPS library (they want to be able to see the `for` expansion). The same applies to lambda lifting and a variety of other transforms.
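
As a small sketch of the `foo.bar` ambiguity (the type and proc below are made up): both dotted expressions parse to the same shape, and only semantic analysis decides whether one stays a field access or is rewritten into a call.

type Point = object
  x: int

proc doubled(p: Point): int = p.x * 2

let p = Point(x: 1)
echo p.x        # stays a field access after analysis
echo p.doubled  # rewritten to the call doubled(p) during analysis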

Now, a lot of this is the current implementation's problem, but some of it isn't. The typed AST itself needs a grammar, because the fundamental way the compiler and macros should talk is through syntax; this leverages one of the big lessons of homoiconic languages (Lisp, Racket, etc.), and without altering the AST in key areas, this isn't possible.
