diff --git a/docs/rfc/glossary.md b/docs/rfc/glossary.md
new file mode 100644
index 000000000..1fc27c9cd
--- /dev/null
+++ b/docs/rfc/glossary.md
@@ -0,0 +1,13 @@
+# Glossary
+
+## Mapping
+
+A mapping is an event handler. It is a TypeScript function triggered by the Hydra processor when the event is being processed.
+
+## Mapping Script
+
+A mapping script is the module from which all the mapping functions are exported. Typically, it is located in `mappings/index.ts` and contains only exports of the mappings.
+
+## Virtual event
+
+Hydra fetches runtime [events](https://substrate.dev/docs/en/knowledgebase/runtime/events) and [extrinsics](https://substrate.dev/docs/en/knowledgebase/runtime/execution#executing-extrinsics) from the chain and places them into a processing queue as emitted by the chain. The mappings (i.e. event/extrinsic handlers) may emit additional events internal to Hydra. Since these events don't come from the Substrate runtime, they are called virtual.
\ No newline at end of file
diff --git a/docs/rfc/manifest-file.md b/docs/rfc/manifest-file.md
new file mode 100644
index 000000000..ddd8698c0
--- /dev/null
+++ b/docs/rfc/manifest-file.md
@@ -0,0 +1,94 @@
+# Manifest file for Hydra processor
+
+## Summary
+
+The manifest file is a high-level `.yml` file which describes _how_ and _what_ the Hydra processor has to run. The manifest file + the mappings should be sufficient for running the processor from a clean state. It should replace the root `.env` presently used as a config file.
+
+## Goals and motivation
+
+1) We need a more structured and clear way to configure the processor
+
+2) As the complexity of the processor grows, a descriptive config with nested properties is required
+
+3) Smoother transition for The Graph users
+
+## Urgency
+
+As we add more features to the processor, we need a unified high-level definition of how the processor should apply the handlers.
+
+## Detailed Design
+
+### 1.1 Top-Level API
+
+Similar to TheGraph design
+
+| Field | Type | Description |
+| --- | --- | --- |
+| **specVersion** | *String* | A Semver version indicating which version of this API is being used.|
+| **schema** | [*Schema*](#12-schema) | The GraphQL schema of this subgraph.|
+| **description** | *String* | An optional description of the subgraph's purpose. |
+| **repository** | *String* | An optional link to where the subgraph lives. |
+| **dataSource**| [*Data Source Spec*](#13-data-source)| The data source spec defines the data that will be ingested as well as the transformation logic to derive the state of the subgraph's entities based on the source data.|
+
+### 1.2 Schema
+
+| Field | Type | Description |
+| --- | --- | --- |
+| **file**| [*Path*](#16-path) | The path of the GraphQL IDL file (no IPFS support for now) |
+
+### 1.3 Data Source
+
+| Field | Type | Description |
+| --- | --- | --- |
+| **kind** | *String* | The type of data source. Possible values: *substrate/runtime* (vs *ethereum/contract* for TheGraph).|
+| **name** | *String* | The name of the source data. Will be used to generate APIs in the mapping and also for self-documentation purposes. |
+| **network** | *String* | For blockchains, this describes which network the subgraph targets. For example, "kusama" or "joystream/babylon". |
+| **source** | [*HydraIndexerSource*](#131-hydraindexersource) | The source data for ingestion. |
+| **mapping** | [*Mapping*](#131-mapping) | The transformation logic applied to the data prior to being indexed. |
+
+#### 1.3.1 Mapping
+
+##### 1.3.1.1 Substrate Mapping
+
+| Field | Type | Description |
+| --- | --- | --- |
+| **kind** | *String* | Must be "substrate/events" for Substrate Events Mapping. |
+| **apiVersion** | *String* | Semver string of the version of the Mappings API that will be used by the mapping script. |
+| **language** | *String* | The language of the runtime for the Mapping API. For now only *typescript*. |
+| **types** | *String* path to a file with definitions | A TypeScript file exporting the types and interfaces satisfying the event and extrinsic signatures in the handler definitions. Typically, the file re-exports the standard polkadot types together with custom types. |
+| **virtualEventHandlers** | optional *EventHandler* | Handlers for specific virtual events, which must be exported in the mapping script. |
+| **eventHandlers** | optional *EventHandler* | Handlers for specific runtime events, which must be exported in the mapping script. |
+| **extrinsicHandlers** | optional *ExtrinsicHandler* | A list of handlers triggered on the `system.ExtrinsicSuccess` event with the extrinsic data. |
+| **blockHandlers** | optional *BlockHandler* | Defines block filters and handlers to process matching blocks. |
+| **file** | [*Path*] | The path of the mapping script exporting the mapping functions. |
+
+##### 1.3.1.2 EventHandler
+
+| Field | Type | Description |
+| --- | --- | --- |
+| **event** | *String* | An identifier for an event that will be handled in the mapping script. It must be in the form `.(type1,type2,...,)` as defined in the metadata file. For example, `balances.DustLost(AccountId,Balance)`. The declared types and interfaces should be importable from `./generated/types.ts`. |
+| **extrinsic** | optional *String* | The extrinsic that caused the event. If present, only events emitted by the specified extrinsic will be handled by the handler. Must have a fully qualified name in the form
+`.(type1,type2,...,)`|
+| **handler** | *String* | The name of an exported function in the mapping script that should handle the specified event. |
+
+##### 1.3.1.3 ExtrinsicHandler
+
+| Field | Type | Description |
+| --- | --- | --- |
+| **extrinsic** | *String* | An identifier for a function that will be handled in the mapping script in the form `.(name1: type1, name2: type2,...,)`. Example: `utility.batch(calls: Vec)`|
+| **filter** | optional *String* | An open_CRUD string specifying additional filtering to be applied, e.g. `calls[0].args.max_additional_gt > 10000`. The detailed syntax is to be documented elsewhere and is subject to change. |
+| **handler** | *String* | The name of an exported function in the mapping script that should handle the specified extrinsic. |
+| **emits** | list of *String* | A list of virtual events the handler _may_ emit |
+| **exports** | list of *String* | Extra types exported by the handler |
+
+##### 1.3.1.4 BlockHandler
+
+| Field | Type | Description |
+| --- | --- | --- |
+| **onInitialize** | optional *String* | The name of an exported function in the mapping script that is called before any events in the block are processed |
+| **onFinalize** | optional *String* | The name of an exported function that is called when all the events in the block are processed |
+| **filter** | optional *String* | The name of the filter that will be applied to decide which blocks will trigger the mapping. If none is supplied, the handler will be called on every block. The detailed syntax is to be documented elsewhere and is subject to change. |
+
+## Compatibility
+
+The changes are not compatible with Hydra v0.
diff --git a/docs/rfc/review.md b/docs/rfc/review.md
new file mode 100644
index 000000000..47bcbc5b9
--- /dev/null
+++ b/docs/rfc/review.md
@@ -0,0 +1,117 @@
+# Layout for the manifest file
+
+- What part of this has to be created by hand vs. created via tooling? In particular all the information about the handlers.
+
+The file layout is going to look like this:
+
+```
+generated
+|_graphql-server
+|_processor
+|_types
+|___balance.ts // generated type event and extrinsic classes named after each module
+|___transfer.ts
+|___types.ts // auto-generated interfaces for the events and extrinsic types. More on that below.
+mappings
+|_index.ts
+|_file-with-handlers1.ts
+|_file-with-handlers2.ts
+|_some-helper-functions.ts
+schema.graphql
+manifest.yaml
+runtime-metadata.json // this file should be manually extracted from the runtime definition
+```
+
+Everything under the `generated` folder is, well, generated. Everything else is written by hand.
+
+- Why does the signature information need to be specified in the handlers? As long as you have a fully specified name (module + module level name), then there can be only one event or extrinsic, as there is no overloading. The full signature should then be available from some external resource (the chain metadata file?) This part seems like it can get really rough to keep in synch with chain if 100% manual.
+
+That's indeed not strictly necessary but I thought it would be helpful to have as an additional sanity check. I don't think it's too much of a burden as it can be quickly checked against `runtime-metadata.json`. Provided the error is informative, it's easy to fix. At the same time, it ensures that the mapping developer is "in synch" with the chain.
+
+- The notions of a `mapping script`, `Mappings API` and `virtual events` are not really defined anywhere, despite being referenced quite a bit. I think some sort of glossary or definition section would be a benefit.
+
+See the [glossary](./glossary.md)
+
+- If `EventHandler.virtual` is true, then it appears the described format for `EventHandler.event`. Perhaps there needs to be a distinct `VirtualEventHandler`?
+
+I think it's a good idea to move it into a separate section in the manifest file, namely `virtualEventHandlers`. The manifest spec is updated.
+
+- If I care about what extrinsic is causing my event, by using the `EventHandler.extrinsic`, then don't I almost certainly need to look at its parameter values as well in my event mapper? In which case isn't this manifest level filtering going to be redundant pre-filtering?
+
+Not 100% sure I understand the question. The main use-case for filtering out extrinsics is when for some reason there is no convenient event to handle in the first place.
+
+One such example is RMRK, based on the `system.remark` extrinsics (like [this](https://kusama.subscan.io/extrinsic/0xcc86342331516adeea5bbe5c83c9d7689d992d10f598500e5bf66edea0c2dbcb)). There, the handler filter would take only `system.remark` with the `_remark` value starting with `0x726d72`. One can of course process all the remarks and do the filtering inside the mapping, but that's much less efficient.
+
+Another example would be the situation we had with `contentDirectory.transaction` not emitting `transactionFailed`, which would have allowed us to recover the processor database state. Similarly, if some post-processing for (partially) failed `batch` extrinsics is required, we'd need to filter out those that are relevant. Again this can be circumvented by moving the logic into the extrinsic mapping, at the cost of performance overheads.
+
+- My assumption is that if I want to get a Hydra instance going for a given runtime, where the mappings and manifest have been prepared by someone else, then I should not really edit that file. The manifest, as I have understood it, should be about coordinating the inherent components of a Subgraph, not about a particular deployment of it. However, this format does not follow that principle, as I need to edit it to change the `Hydra Indexer Source`. If my premise is correct, then not only should the endpoint not be in the manifest, but even the blocks possibly should not, as a particular manifest is really just locked to a runtime or runtime module, which itself could exist in different chains in different block intervals. Ultimately, it depends on what we see the role of the manifest being.
+
+You're probably right. Looking at how we actually deploy the processor, it's more convenient to provide DB settings, indexer URL, start/end blocks via env variables.
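A minimal sketch of what reading these deployment settings from the environment might look like. The variable names (`INDEXER_URL`, `DB_URL`, `START_BLOCK`, `END_BLOCK`) and the `DeploymentConfig` shape are hypothetical and are not defined anywhere in this RFC; they only illustrate keeping deployment-specific values out of the manifest.

```typescript
// Hypothetical deployment settings; the RFC does not define these names.
interface DeploymentConfig {
  indexerUrl: string
  dbUrl: string
  startBlock: number
  endBlock?: number
}

// Parse a block height and reject anything that is not a non-negative integer.
function parseBlock(name: string, raw: string): number {
  const value = Number(raw)
  if (!Number.isInteger(value) || value < 0) {
    throw new Error(`${name} must be a non-negative integer, got "${raw}"`)
  }
  return value
}

// Build the config from a plain key/value map so it can be tested without a real process.env.
function loadDeploymentConfig(env: Record<string, string | undefined>): DeploymentConfig {
  const indexerUrl = env.INDEXER_URL
  const dbUrl = env.DB_URL
  if (!indexerUrl) throw new Error('INDEXER_URL is required')
  if (!dbUrl) throw new Error('DB_URL is required')
  return {
    indexerUrl,
    dbUrl,
    // Assumption: indexing starts from block 0 when no start block is given.
    startBlock: parseBlock('START_BLOCK', env.START_BLOCK ?? '0'),
    endBlock: env.END_BLOCK === undefined ? undefined : parseBlock('END_BLOCK', env.END_BLOCK),
  }
}

// Example values only; in a Node.js process one would pass process.env instead.
const config = loadDeploymentConfig({
  INDEXER_URL: 'http://localhost:4000/graphql',
  DB_URL: 'postgres://localhost:5432/processor',
  START_BLOCK: '100',
})
```

With this split, the same manifest and mappings could be deployed against different indexers or block ranges without being edited.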
+
+- `dataSources`, along with its description, indicates multiple sources, but the type does not, should the type perhaps be `[DataSource]` or something, signalling a vector?
+
+This is copied from TheGraph manifest. After some thought I can't really imagine how a multiple-source processor may work, even in principle. The fundamental issue is that with multiple sources there is no canonical ordering of the events.
+
+- Why is `entities` actually needed in `Mapping`, is there some benefit to respecifying it here, as the dev will need to keep it in synch with the input file.
+
+Same as with the events signature. It is present in TheGraph manifest, so I initially retained it for compatibility. Here I don't think it really adds any value though, so I removed it from the manifest as you suggest.
+
+- The `filter` property seems to me to be a risky proposition with unclear upside. By splitting the rule for what extrinsic invocations actually impact processor state across the mapping code and the manifest file, it seems it's easy to lose track of what the effective rule is over time.
+
+As discussed above, the main use-case for having a filter is to process extrinsics that don't emit good events to handle. In the cases I can come up with, these extrinsics are generic or proxy calls (remark, batch, multisig, etc). If there is no filtering in the manifest file, I believe it will simply leak into the mapping itself, which indeed would make it hard to keep track of how the data is handled.
+
+In general, my reasoning about the manifest file is as follows. It describes _what_ should be passed to the handlers and _when_. The
+_what_ part is reflected in the implicit filtering by the event name and the explicit filtering by the extrinsics filter (if provided). The _what_ part also includes the data transformation part powered by the metadata and the generated types, but it is not relevant in the current discussion.
+
+The _when_ part is defined solely by the order in which the events are emitted by the runtime (including `system.ExtrinsicSuccess` which indicates that the extrinsic handler should be run). Indeed the expected behaviour for the extrinsic handler is to be a pure function which never updates the state but only emits virtual events, which are in turn processed by the (virtual) event handlers.
+
+- What is the purpose of `ExtrinsicHandler.exports`?
+
+This is probably needed for codegen purposes. The user-defined types and interfaces used by virtual events can then be imported and re-exported in `./generated/types.ts` similar to the "native" substrate types. If it turns out that there's a better approach, we may drop this property.
+
+- `BlockHandler` probably needs to have an option for whether it runs before or after all other mappings, as this would correspond to `on_initialize` and `on_finalize` in the runtime. The `handler` property description references an event, I assume this is a copy&paste error.
+
+Agree, fixed.
+
+# Proposal for generating type-safe mappings for events
+
+- I seem to recall that some feature called `typegen` needed to be part of this to make things properly automatic, this is what Leszek said, is that still the case here?
+
+I have read Leszek's [thoughts](https://github.com/Joystream/joystream/issues/1816) on how the query node should generate and validate the types, in particular in the context of EVM calls. It is not fully clear to me at the moment, so probably we should get in sync together first.
+
+- This appears to not allow extrinsic only type safe mappings? Why is that? I suspect this to be used 99% of the time, as our current event based approach degenerates into working back into the extrinsic payload. The only reason to process an event is if the side-effect is triggered by an inherent (on_finalized, on_initialized) or it's triggered by multiple extrinsics, and you would like to avoid dealing with them separately, AND (for both cases) the event has sufficient information to compute the full side-effect. This latter constraint is almost never satisfied in our runtime.
+
+- Once we have transactional handlers, I cannot think of a single use case off the top of my head where we want to lock in pairs of extrinsics and events. I also think the example would get really messy if you had lots of pairs like this, you would have `Mod1Extrinsc2BalancesBalancesSetEvent` or something like that to capture combos of different extrinsics causing balance setting. If we do not have this, the example just boils down to what The Graph would have, which would be a single type `Balances.BalanceSetEvent` which would have the layout of current `BalanceSet__Params`.
+
+- I did not understand when we would end up in a situation like `BalanceSet__Params`, where there are no names for params, doesn't the metadata file have names for all extrinsics and events? Seems so https://whisperd.tech/post/substrate_metadata/.
+
+- In the example, the module name for an event is prefixing the name of the type extending `SubstrateEvent`, e.g. `BalancesBalanceSetEvent`. Could we use proper typescript scoping/namespacing to increase readability here, so it would be `Balances.BalanceSetEvent`?
+
+# Virtual events
+
+- I suggest the "Goals and motivations" section only discuss the problem/limitations being addressed in an approach without virtual events. Right now it's simultaneously describing the problem & solution. A high level description introducing the virtual events high level "solution" to the stated problems deserves to live in its own section perhaps.
+
+- Isn't just having transaction handlers an equally suitable solution to the first point in the motivations?
+
+As I described above, my intuition here is that the transaction handlers should be pure functions responsible for either dispatching the virtual events (e.g. after a batch call or the EVM module's `Log` event) and/or filtering and/or transforming the incoming data. After some further thought, it may be reasonable to even move some of this logic to the indexer, so that the processor can source virtual events directly from the indexer.
+
+- In the context of EVM support, I sort of interpret these virtual events as our current pre-handlers, with the main difference being that pre-handlers directly call what method, while this is lifted out into the manifest file here. Is that correct? I presume the main value-add of this is that you can just grab someone else's virtual event generator off-the-shelf, without needing to mess with changing what they call. If this is correct, then it would seem that perhaps virtual event generators should be a distinct kind of extrinsic handler which is pure, i.e. it does not access the database. Because you really don't want to have to audit exactly what some off-the-shelf pre-handler does or does not do.
+
+Yes indeed! A few remarks: as I mentioned, it may make sense to actually make the _indexer_ expose virtual events in the same way the runtime events are exposed. The main use-case is of course the EVM module, and this would allow the processor clients to tap into the events directly without the need to have a handler on the processor side.
+
+- I don't actually entirely see a smooth way of getting virtual event generators from the outside world and integrating them into my setup. It would seem to require quite a lot of manual stitching, copying, editing of manifest files, etc.
+If the virtual event generators change, then you will probably need to do a bit of brittle housekeeping? I assume this can be made smoother.
+
+- If the prior point is correct, then the main new value unlocked is presumably the EVM handling, but for this specific problem I think it seems we are some ways off from the full experience. Let's say I want to write a query node for some new smart contracts. With the virtual events approach it seems I would need to sit down and dig into how to handle extrinsics to the EVM pallet, which is very complex, and I would have to hand-craft types that mirror the different parameters and extrinsics I care about. Once I had all this, I could write my actual handlers for these. At this point, if someone wanted to write another query node for the same smart contract, they could get my event emitters, and map events differently depending on what queries they wanted to support. Given that it's often going to be one canonical query set of interest to a given system, the value of the reusability unlocked is relatively low. What I really want to be able to do is what we are thinking I can do in the type safe Substrate case. I want to provide the ABI of my contracts to the codegen tool, and I want to tell it that I care about methods x,y,z, and it can generate all the required types I want so that I can just jump into writing handlers. In principle the codegen tool could generate virtual event generators, but that is only a technicality at that point.
+
+From the general pov, virtual events are designed for the situation when an extrinsic _should_ have emitted a runtime event but doesn't do it for some reason, or does but in an awkward format. When it comes to the EVM handling, there are two main issues:
+
+1) How to make the handlers type-safe (as opposed to consuming `SubstrateEvent` or `EthereumEvent`)
+
+2) How to unwrap the Substrate `Log` event into an EVM event and process it in the correct order
+
+The virtual events part is more on how to deal with 2). The idea is that by emitting virtual events in the right format one could make the existing Subgraphs work with Hydra out of the box. The processor will trigger the handlers in the right order.
+
+Again, decoding the EVM log events at the indexer side seems like an alternative/complementary solution, with a clear value for external users not necessarily willing to onboard the processing part of Hydra + performance boost due to filtering. There might be a more elegant way to do the "technicality", I am open for suggestions here.
+
+As for the former, the question is how to organize the generated code so that it is efficient and user-friendly on one side, and not overly complex on the other.
diff --git a/docs/rfc/type-safe-mappings.md b/docs/rfc/type-safe-mappings.md
new file mode 100644
index 000000000..b52613275
--- /dev/null
+++ b/docs/rfc/type-safe-mappings.md
@@ -0,0 +1,91 @@
+# Strongly typed mappings
+
+## Summary
+
+The mappings define how the processor should handle the events. This proposal describes how the event classes can be auto-generated to make the mappings strongly typed.
+
+## Goals and motivation
+
+1) Manual type conversion from `SubstrateEvent` currently passed to the mappings is error-prone
+
+2) Runtime changes and deserialization issues become easy to spot
+
+## Urgency
+
+Subjectively, strong typing is a must if a large collection of mappings is going to be maintained for a long time.
+
+## Detailed Design
+
+The codegen will perform the following steps:
+
+1) Define all (event, extrinsic) pairs that are going to be handled according to the manifest file. Extrinsic may be optional
+2) Look up the definitions and argument types in the metadata file, both for the event and the extrinsic. If the extrinsic was not specified, leave it as optional arbitrary `AnyJSON`.
+3) Generate classes extending `SubstrateEvent` with `get` methods performing the deserialization and type-casting. The classes are created using the [TypeRegistry](https://github.com/polkadot-js/api/blob/master/packages/types/src/types/registry.ts) interface from `@polkadot/api`. By default, it contains all the base substrate type definitions but may be extended with custom types. This assumes that the required types can be created by name via `typeRegistry.createType(type, value)`. `typeRegistry` will be declared in `@dzlzv/hydra-common` and must be defined and exported in the processor runner.
+4) The types and interfaces must be available for import from `./types.ts`. The standard polkadot types should be re-exported together with additional custom and manually defined types and interfaces.
+5) The property names will be derived using the `name` property (if present), otherwise by lowercasing the first letter of the type name. If there are several properties of the same type, append a numeric suffix (e.g. `balance0`, `balance1`).
+
+Example of the strongly typed event and the autogenerated properties:
+
+```typescript
+// typeRegistry is declared there but must be defined elsewhere
+import { typeRegistry, SubstrateEvent } from '@dzlzv/hydra-common'
+// types.ts must reexport these types from @polkadot/api/interfaces
+import { AccountId, Balance, Compact } from './types'
+
+namespace Balances {
+  export class BalanceSetEvent extends SubstrateEvent {
+
+    get params(): BalanceSet__Params {
+      return new BalanceSet__Params(this)
+    }
+
+    get callArgs(): BalanceSet__CallArgs {
+      return new BalanceSet__CallArgs(this)
+    }
+  }
+
+  class BalanceSet__Params {
+    _event: BalanceSetEvent;
+
+    constructor(event: BalanceSetEvent) {
+      this._event = event;
+    }
+
+    get accountId(): AccountId {
+      return typeRegistry.createType('AccountId', this._event.params[0])
+    }
+
+    get balance0(): Balance {
+      return typeRegistry.createType('Balance', this._event.params[1])
+    }
+
+    get balance1(): Balance {
+      return typeRegistry.createType('Balance', this._event.params[2])
+    }
+  }
+
+  class BalanceSet__CallArgs {
+    _event: BalanceSetEvent;
+
+    constructor(event: BalanceSetEvent) {
+      this._event = event;
+      if (this._event.extrinsic === undefined) {
+        throw new Error(`Extrinsic balances_set is expected`)
+      }
+    }
+
+    get who(): AccountId {
+      return typeRegistry.createType('LookupSource', this._event.extrinsic?.args[0])
+    }
+
+    get new_free(): Compact {
+      return typeRegistry.createType('Compact', this._event.extrinsic?.args[1])
+    }
+
+    get new_reserved(): Compact {
+      return typeRegistry.createType('Compact', this._event.extrinsic?.args[2])
+    }
+  }
+}
+
+```
diff --git a/docs/rfc/virtual-events.md b/docs/rfc/virtual-events.md
new file mode 100644
index 000000000..d53ba9873
--- /dev/null
+++ b/docs/rfc/virtual-events.md
@@ -0,0 +1,105 @@
+# Virtual events
+
+## Summary
+
+Virtual events are similar to chain events but emitted by the processor itself. This significantly extends the scope and potential use-cases for the mappings.
+
+## Goals and motivation
+
+There are multiple cases where relying only on the data provided by substrate events is not feasible.
+
+1) The extrinsic does not emit a relevant event to capture the data, e.g. during a `sudoAs` or `batchCall`.
+Example: The project RMRK relies on `system.remark` calls. There is no specific event that'd efficiently capture the required data.
+
+2) The event data requires additional transformation before it can be processed. For example, the EVM runtime module would only emit `Log` events with low-level data. An intermediary extrinsic handler would listen to such low-level extrinsic calls, decode the data and emit virtual events with strongly typed data. The goal here is to enable seamless migration of the graph mappings used for Ethereum to smart contracts deployed on Moonbeam or the EVM pallet.
+
+3) Allow reusable extrinsic handlers and events imported from external public libraries.
+
+4) At the first step, only extrinsicHandlers are allowed to emit virtual events. A logical extension would be to allow event handlers to emit
+virtual events as well, thus making it possible to have pipelines of arbitrary complexity.
+
+## Urgency
+
+Unless other alternatives are suggested, this is needed to go forward with mappings for EVM smart contracts.
+
+## Detailed design
+
+Virtual events are placed in the processing queue straight after the `ExtrinsicSuccess` event and before the next chain event.
+
+For now, virtual events can be emitted only by extrinsic handlers. The virtual events emitted by the extrinsic handlers should be explicitly defined in the manifest file:
+
+```yml
+...
+eventHandlers:
+  - event: remarkCreated(RemarkData)
+    handler: handleRemarkCreated
+    virtual: true
+extrinsicHandlers:
+  - extrinsic: 'system.remark(remark: Bytes)'
+    handler: handleRemarks
+    emits:
+      - remarkCreated(RemarkData)
+    exports:
+      - RemarkData
+```
+
+The handlers will look like this
+
+```typescript
+
+// mappings/handleRemarkCreated.ts
+import { RemarkData } from './handleRemarks'
+
+export function handleRemarkCreated(remarkData: RemarkData) {
+  // some business logic with remarkData
+}
+
+// mappings/handleRemarks.ts
+
+import { emit, EventData } from '@dzlzv/hydra-processor'
+import { RemarkExtrinsic } from './generated/RemarkExtrinsic'
+
+// some custom interface to be emitted along with the data
+export interface RemarkData extends EventData {
+  name: string;
+  url: string;
+}
+
+export function handleRemarks(extrinsic: RemarkExtrinsic) {
+  const data = extrinsic.args._remark // typed as Bytes by the generated getter
+  // parse and transform the raw data
+  const remarkData = parse(...)
+  emit('remarkCreated', remarkData)
+}
+
+/// Auto-generated in ./generated/RemarkExtrinsic
+
+// typeRegistry is declared there but must be defined elsewhere
+import { typeRegistry, SubstrateExtrinsic } from '@dzlzv/hydra-common'
+// types.ts must reexport these types from @polkadot/api/interfaces
+import { Bytes } from '../types'
+
+export class RemarkExtrinsic extends SubstrateExtrinsic {
+  get args(): RemarkExtrinsic__Args {
+    return new RemarkExtrinsic__Args(this)
+  }
+}
+
+export class RemarkExtrinsic__Args {
+  _extrinsic: SubstrateExtrinsic;
+
+  constructor(extrinsic: SubstrateExtrinsic) {
+    this._extrinsic = extrinsic;
+  }
+
+  get _remark(): Bytes {
+    return typeRegistry.createType('Bytes', this._extrinsic.args[0].value)
+  }
+
+}
+
+```
+
+## Compatibility
+
+Not compatible with the previous Hydra versions.
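
The queue ordering stated under "Detailed design" (virtual events go straight after `ExtrinsicSuccess` and before the next chain event) can be illustrated with a small, self-contained sketch. Everything here is an assumption made for illustration: `QueuedEvent`, `ExtrinsicHandler` and `buildQueue` are hypothetical names, and the handler is modelled as a pure function returning its virtual events rather than calling `emit`. It is not the Hydra implementation.

```typescript
// Hypothetical, simplified event shape; the real Hydra types are not specified in this RFC.
interface QueuedEvent {
  name: string
  virtual: boolean
  data?: unknown
}

// Assumption: an extrinsic handler is modelled as a pure function that
// returns the virtual events it would emit for a successful extrinsic.
type ExtrinsicHandler = (extrinsicSuccess: QueuedEvent) => QueuedEvent[]

function buildQueue(chainEvents: QueuedEvent[], handler: ExtrinsicHandler): QueuedEvent[] {
  const queue: QueuedEvent[] = []
  for (const event of chainEvents) {
    queue.push(event)
    if (event.name === 'system.ExtrinsicSuccess') {
      // Virtual events are queued right after ExtrinsicSuccess,
      // so they are processed before the next chain event.
      queue.push(...handler(event))
    }
  }
  return queue
}

// Example chain events in the order emitted by the runtime (illustrative values).
const chainEvents: QueuedEvent[] = [
  { name: 'system.ExtrinsicSuccess', virtual: false, data: { remark: '0x726d72' } },
  { name: 'balances.Transfer', virtual: false },
]

// A handler that emits one virtual event carrying the extrinsic data.
const emitRemarkCreated: ExtrinsicHandler = (e) => [
  { name: 'remarkCreated', virtual: true, data: e.data },
]

const order = buildQueue(chainEvents, emitRemarkCreated).map((e) => e.name)
console.log(order)
```

Modelling the handler as a pure function keeps the ordering rule easy to test in isolation; a real processor would also have to match handlers to extrinsics by name and filter.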