
sem: re-sem all call arguments in macro output #1192

Draft
wants to merge 20 commits into base: devel

Conversation

@zerbina (Collaborator) commented Feb 16, 2024

Call operands are now re-analyzed when they're not marked as having been analyzed (via the nfSem flag) -- having a type or not is no longer relevant. This improves the stability of inserting typed AST verbatim into macro output, since expressions are now properly re-typed.

There is a lot of analysis that is destructive to the shape of the AST, meaning that it's often not possible to discard the types of a typed expression, re-analyze it, and expect to get the correct type back. For this reason, some AST requires special handling throughout sem.
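
For illustration, a minimal sketch of the kind of situation this targets (the macro, the distinct type, and all names below are invented): already-typed AST is spliced verbatim into a call in macro output, and whether the spliced operand gets re-analyzed now hinges on the `nfSem` flag instead of on it already having a type.

import std/macros

type MyInt = distinct int
proc `+`(a, b: MyInt): MyInt {.borrow.}

# hypothetical macro: wraps an already-analyzed (typed) expression in a new call
macro double(x: typed): untyped =
  result = newCall(bindSym"+", x, x)

let v = MyInt(3)
discard double(v)  # the typed `v` is inserted verbatim into the output `+`(v, v)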

Fixes #1183.
Fixes #1181.


To-Do

  • (partially) keep the shape of array construction like [1: x, ...]
  • fix the remaining problems and make all existing tests succeed again
  • split out changes where possible
  • add regression tests for the fixed issues
  • write a proper commit message

Notes for Reviewers

  • still in progress, but the general direction should be visible

Instead of looking for typed expressions, `prepareOperand` looks for
already-analyzed operands (as indicated by the `nfSem` flag) when
deciding whether to analyze the expression again or not.
With the `nfSem` flag becoming much more important, it makes sense to
display node flags in tracing output.
Borrow target searching runs overload resolution with typed `nkEmpty`
nodes as the parameter. This worked when types disabled re-semming, but
it no longer does. To disable analysis of the nodes, they're marked
with `nfSem`.
`asBracketExpr` now also detects `[](x, y)` as something to try as a
built-in subscript operation if `x` is a bound generic symbol or symbol
choice.

Not going the `semSubscript` route was previously not a problem: the
symbol or symbol choice had a type, and thus `semOperand` (which reports
an error for un-instantiated generic symbols) wasn't called again, so
`semOverloadedCall` successfully picked the `mArrGet` overload
(which then calls `semSubscript`).

Since the type no longer has a say on whether an operand is analyzed or
not, `asBracketExpr` *must* work properly.
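
For context, what the call form in question looks like at the source level (the array and values are just for illustration):

let a = [10, 20, 30]
# explicit call to the subscript operator; the compiler has to restore this
# to a bracket expression so that it resolves to the built-in mArrGet
# subscript, i.e., the same as a[1]
echo `[]`(a, 1)
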
A `[](x, y)` was not turned into a bracket call early enough if `x` is
the `mArrGet` magic *directly* (only symbol choices were considered).
This is a problem if `x` is an unresolved generic procedure.

* move the correct detection logic from `shouldBeBracketExpr` into a
  shared procedure
* use the shared procedure in `asBracketExpr`
* attempt bracket expression restoration prior to overload resolution
  when the `mArrGet` magic matches directly
Hidden nodes produced by sigmatch are now all marked with the `nfSem`
flag, in order to prevent them from being discarded again by re-typing
(triggered by `semFinishOperands`).
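
As a rough illustration of such hidden nodes (the proc and macro below are made up), an implicit int-to-float argument conversion typically shows up as a hidden conversion node in the typed AST:

import std/macros

macro showTyped(x: typed): untyped =
  echo x.treeRepr  # usually shows a HiddenStdConv wrapping the literal argument
  result = x

proc takesFloat(f: float) = discard

showTyped(takesFloat(1))  # sigmatch inserts the hidden conversion to float
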
Rework `semSetConstr` such that it:
* doesn't mutate the input AST
* doesn't use `localReport`
* analyzes `nkRange` nodes (see the example after this list)
* propagates errors
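
A small language-level example (the constants are made up) of the set constructions this covers, including a range element, which becomes an `nkRange` node in the typed AST:

const Digits = {'0'..'9'}          # range element
const Odd = {1'u8, 3'u8, 5'u8}     # plain elements
echo Digits.card, " ", Odd.card
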
The `nkBracket` array construction created by sigmatch cannot
be passed to `semArrayConstr`, as local converters might be involved,
which `semArrayConstr` has no knowledge about. Therefore, `semExpr`
does nothing for varargs array constructions and passes them back to
`sigmatch`, where the containers are retyped via `matchContainer`.
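
A hedged sketch (the type, converter, and proc are invented) of a varargs call where the implicit container construction involves a converter:

type Celsius = distinct float

converter toFloat(c: Celsius): float = float(c)

proc average(xs: varargs[float]): float =
  for x in xs: result += x
  result = result / xs.len.float

# the Celsius value goes through the converter inside the hidden nkBracket
echo average(1.0, Celsius(2.0), 3.0)
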
`semExpr` had no support for `nkNimNodeLit`erals, meaning that their
appearance in macro/template always led to an error.
Don't make a full copy of the left-hand operand when building an
`nkDotCall` -- the tree was analyzed immediately before. With the tree
copy, the AST was treated as requiring analysis again, which is
unnecessarily wasteful.
If the resolved `tyStatic` type is bound to a generic `tyStatic` and
later becomes the type of a parameter in a routine, without the
`tfHasStatic` flag, the argument to the parameter is not going to be
constant-evaluated again when re-matching the overload during
re-typing.
It doesn't work correctly for statement list expressions, and the
`copyTree` also loses the `nfSem` flag, leading to `semFinishOperands`
sem'ing the argument again and assigning it the wrong type.
The constant expressions produced by `paramTypesMatchAux` are marked as
`nfSem`, so that `semFinishOperands` doesn't re-type them (which would
lose the `tyStatic` type).
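
For illustration (the proc and constant are made up), a `static` parameter whose argument has to be constant-evaluated while the overload is matched; losing the resulting `tyStatic` type on re-typing would break this:

proc fixedArray[T](n: static int; x: T): array[n, T] =
  for i in 0 ..< n: result[i] = x

const Size = 2 + 2
echo fixedArray(Size, "ab")  # `Size` is constant-evaluated to 4 during matching
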
* `nkHiddenAddr`, `nkHiddenStdConv`, `nkHiddenSubConv`,
  `nkHiddenCallConv`, and `nkCheckedFieldExpr` are all replaced with
  their content
* an attempt is made to recreate `nkHiddenDeref`, but it is replaced
  with its sub-expression if the sub-expression is not of `var` or
  `lent` type
A call to `semArrayConstr` would type a constant sequence construction
(`nkBracket`) as an `array`, which is wrong. For now, a constant
sequence construction is kept as is during re-typing.
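
For context (the constant is made up), a constant sequence construction at the source level; the bracket part is the `nkBracket` that must keep its `seq` type during re-typing:

const Values: seq[int] = @[1, 2, 3]
echo Values
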
This prevents it from being re-typed (which could lose its type) when
substituting the `static` parameter with it.
Somehow the AST taken directly from the parameter symbol is not marked
with `nfSem`. This needs some further investigation.
@zerbina added the bug (Something isn't working) and compiler/sem (Related to semantic-analysis system of the compiler) labels on Feb 16, 2024
@zerbina marked this pull request as draft on February 16, 2024 02:23
Comment on lines 2616 to 2617
# XXX This is unsound! 'formal' can differ from overloaded routine to
# overloaded routine!
Collaborator

Soft suggestion, but you can remove this if you'd like.

I don't think this is valid anymore, because of untyped vs. non-untyped argument isolation, the soon-to-be-included nfSem flag guards, and the removal of the tyIterable hack a while back.

@Varriount (Contributor)

There is a lot of analysis that is destructive to the shape of AST, meaning that it's oftentimes not possible to discard the types of a typed expression, re-analyze it, and expect the correct type

Out of curiosity, why is this the case? Is it just how the implementation was originally written? I would normally expect analysis procedures to be fairly side-effect free (at least with regards to their inputs).

@zerbina (Collaborator, Author) commented Mar 6, 2024

Out of curiosity, why is this the case? Is it just how the implementation was originally written?

Most likely because it was simpler to implement. If typed AST doesn't have to be transformable back into exactly the AST it originated from, less thought has to be put into the design of the semantic analysis phase.

I would normally expect analysis procedures to be fairly side-effect free (at least with regards to their inputs).

Analysis here refers to the semX procedures of semantic analysis, which have a large amount of side effects, many of them irreversible. A lot of work has gone into removing input mutations from the semX procedures, so at least the input aspect has gotten a lot better.

With "destructive to the shape", I'm referring to things like the following:

type MyInt = distinct int
const C = MyInt(1)

discard C
# before analysis and typing, the untyped AST of the discard looks like
# this: `(DiscardStmt (Ident "C"))`
# the typed AST (i.e., the AST after analysis and typing) looks like this:
# `(DiscardStmt (IntLit 1))`
# Throwing away all types would mean that the information of the int literal
# being a `MyInt` is irreversibly lost 

@saem (Collaborator) commented Mar 6, 2024

There is a lot of analysis that is destructive to the shape of AST, meaning that it's oftentimes not possible to discard the types of a typed expression, re-analyze it, and expect the correct type

Out of curiosity, why is this the case? Is it just how the implementation was originally written? I would normally expect analysis procedures to be fairly side-effect free (at least with regards to their inputs).

Because of things like automatic conversions or dereferences, we need to indicate that somehow. There are also desugaring steps, like `let a, b, c, d = 10` getting converted into 4 individual `let`s; those are largely benign and we could leave them alone, but then processing macro pragmas would become seriously problematic. There are also various ambiguities that need to be resolved, like `foo.bar`: is that a field access or a call? Furthermore, there are additional challenges, such as abstractions that go beyond simple desugaring, like the conversion of `for` loops to `while`; although this isn't exposed to macros today, the inability to "pierce the veil" is actually problematic for things like the CPS library (they want to be able to see the `for` expansion). The same applies to lambda lifting and a variety of other transforms.
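
As a small sketch of the `foo.bar` ambiguity (the type and proc below are made up): both dotted expressions parse to the same shape, and only semantic analysis decides whether one stays a field access or is rewritten into a call.

type Point = object
  x: int

proc doubled(p: Point): int = p.x * 2

let p = Point(x: 1)
echo p.x        # stays a field access after analysis
echo p.doubled  # rewritten to the call doubled(p) during analysis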

Now, a lot of this is the current implementation's problem, but some of it isn't. The typed AST itself needs a grammar, because the fundamental way the compiler and macros should talk is through syntax; this leverages one of the big lessons of homoiconic languages (Lisp, Racket, etc.), and without altering the AST in key areas, this isn't possible.
