
Requirements on a common data representation

Ewout Kramer edited this page May 3, 2024 · 14 revisions

As part of making the Validator, FhirPath & CQL engine usable and performant across both POCO and ITypedElement-based data, we are investigating new and better ways of getting data into these engines. Currently, the validator and FhirPath require all data to be in ITypedElement form, while the CQL engine requires POCOs. It is not immediately clear from the definitions of the POCOs and ITypedElement which features are essential for the engines to function, so we discuss these below as input to a possible redesign. Note that neither ITypedElement nor the POCOs actually support all of these, hence we currently have sub-optimal solutions (aka hacks) in place to make this work at all.

Requirement: Navigate down a tree

Why To get to all data in a resource, we need to be able to traverse the tree
How Via GetElementPairs() we can currently traverse down properties, containing either other complex data, lists of data or (at the leaves) atomic .NET data types.
Used Everywhere, essential.
Remarks There are several ways to get the children currently, but all of them can be based on GetElementPairs().
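A minimal sketch of the idea that every way of getting children can be built on one pair-enumerating primitive. It is written in Python for brevity; `Node`, `element_pairs()`, `children()` and `descendants()` are hypothetical names standing in for a POCO and GetElementPairs(), not the SDK's actual API.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterator, Tuple


@dataclass(eq=False)
class Node:
    """Hypothetical stand-in for a POCO: a type name plus named element values."""
    type_name: str
    props: Dict[str, Any] = field(default_factory=dict)

    def element_pairs(self) -> Iterator[Tuple[str, Any]]:
        # Plays the role of GetElementPairs(): (name, value) for each present element,
        # where value is a Node, a list, or an atomic value.
        for name, value in self.props.items():
            if value is not None:
                yield name, value


def children(node: Node) -> Iterator[Tuple[str, Any]]:
    """Flatten repeating elements so each child is reported with its element name."""
    for name, value in node.element_pairs():
        if isinstance(value, list):
            for item in value:
                yield name, item
        else:
            yield name, value


def descendants(node: Node) -> Iterator[Tuple[str, Any]]:
    """Depth-first traversal built only on element_pairs()."""
    for name, child in children(node):
        yield name, child
        if isinstance(child, Node):
            yield from descendants(child)


patient = Node("Patient", {
    "active": True,
    "name": [Node("HumanName", {"family": "Chalmers", "given": ["Peter", "James"]})],
})
names = [name for name, _ in descendants(patient)]  # active, name, family, given, given
```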

Requirement: Navigate to the parent in the tree

Why Given an element, we need to be able to find its parent.
How NEW - A Parent property on Base
Used To construct the Location of a node within the tree, to find nearest resource, to find containers to resolve internal references.
Remarks Keeping the Parent property correct is hard, since it has to be updated on every change: not only do the setters need to maintain it, but so do adding to and removing from a list. We may even need the list itself to be a parent (to be able to derive an index for an element), which means the List type used in the generated POCOs needs to change.
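To make the bookkeeping concrete, here is a Python sketch assuming a hypothetical `Base` class whose setters maintain `parent`, and a hypothetical `ChildList` that is itself the parent of its items so an element can derive its index. Neither is the SDK's API.

```python
from typing import Any, Dict, Iterator, List, Optional


class Base:
    """Hypothetical POCO base class with a Parent back-pointer."""

    def __init__(self) -> None:
        self.parent: Optional[Any] = None  # a Base, or the ChildList holding this node
        self._props: Dict[str, Any] = {}

    def set(self, name: str, value: Any) -> None:
        # What a generated setter would have to do: detach the old child, adopt the new one.
        old = self._props.get(name)
        if isinstance(old, Base):
            old.parent = None
        if isinstance(value, Base):
            value.parent = self
        self._props[name] = value

    def get(self, name: str) -> Any:
        return self._props.get(name)


class ChildList:
    """A list that is itself a parent, so an element can derive its own index."""

    def __init__(self, owner: Base) -> None:
        self.parent = owner
        self._items: List[Base] = []

    def append(self, item: Base) -> None:
        item.parent = self
        self._items.append(item)

    def remove(self, item: Base) -> None:
        self._items.remove(item)
        item.parent = None

    def index_of(self, item: Base) -> int:
        # Identity-based search: two equal-looking nodes still occupy distinct positions.
        return next(i for i, x in enumerate(self._items) if x is item)

    def __iter__(self) -> Iterator[Base]:
        return iter(self._items)


patient = Base()
names = ChildList(patient)
patient.set("name", names)
first, second = Base(), Base()
names.append(first)
names.append(second)
names.remove(first)  # 'second' now reports index 0; 'first' has no parent any more
```

Only `append` and `remove` are covered here; every other mutating operation (insert, index assignment, clear, ...) would need the same treatment, which is what makes this hard to get right in generated code.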

Requirement: Convert values to Cql types

Why Comparisons and math should be done on the Cql types.
How NEW - Implement ICqlConvertible on FHIR primitives, FHIR.Quantity
Used To carry out math and comparisons in FhirPath (and in the future, maybe CQL).
Remarks Currently, this logic is duplicated: the POCO types have comparisons on FHIR primitives, which are not used by the FhirPath engine, and the same logic is also present on the CQL types. Preferably, the operators on the FHIR primitives should delegate to the CQL operators on the applicable types.
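The delegation idea can be sketched as follows, assuming Python's `Decimal` and a small hypothetical `SystemQuantity` as stand-ins for the CQL/System types. `CqlConvertible`, `FhirDecimal` and `FhirQuantity` are illustrative names only; the point is that the FHIR-side operators contain no comparison logic of their own.

```python
from dataclasses import dataclass
from decimal import Decimal
from functools import total_ordering
from typing import Any, Optional


@dataclass(frozen=True)
class SystemQuantity:
    """Stand-in for the CQL/System Quantity type."""
    value: Decimal
    unit: Optional[str]


class CqlConvertible:
    """Hypothetical counterpart of the proposed ICqlConvertible interface."""

    def to_cql(self) -> Any:
        raise NotImplementedError


@total_ordering
class FhirDecimal(CqlConvertible):
    def __init__(self, object_value: str) -> None:
        self.object_value = object_value  # the raw input text, as parsed

    def to_cql(self) -> Decimal:
        return Decimal(self.object_value)

    # The operators delegate to the CQL/System type instead of duplicating its logic.
    def __eq__(self, other: Any) -> bool:
        if not isinstance(other, CqlConvertible):
            return NotImplemented
        return self.to_cql() == other.to_cql()

    def __lt__(self, other: Any) -> bool:
        if not isinstance(other, CqlConvertible):
            return NotImplemented
        return self.to_cql() < other.to_cql()


class FhirQuantity(CqlConvertible):
    def __init__(self, value: FhirDecimal, code: Optional[str]) -> None:
        self.value = value
        self.code = code  # assumption: the UCUM code is used as the System unit

    def to_cql(self) -> SystemQuantity:
        return SystemQuantity(self.value.to_cql(), self.code)
```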

Requirement: Know the element name of a node

Why Sometimes, logic depends on the name of the node.
How A POCO does not know its name, but when listing the children, their names are listed with the actual children, so known at that point.
Used To generate a Location, to filter elements in summaries, for general debug purposes, to relate definitions to instances, etcetera.
Remarks The fact that a node itself does not know its name (nor its position in a list) means we may have to derive it by looking up at the parent and then finding ourselves among its children (where the name is known). This would be acceptable (but slow) if it is only required for diagnostic messages, which I think it is, but we need to confirm.
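A sketch of that look-up, assuming a hypothetical `Node` that has a parent pointer but no name of its own. Deriving the name costs a scan of the parent's children at every level, which is why it only seems acceptable for diagnostics. The location format is merely FhirPath-like, not necessarily the one the SDK produces.

```python
from typing import Any, List, Optional, Tuple


class Node:
    """Hypothetical node: knows its parent, but not its own name."""

    def __init__(self, type_name: str, **props: Any) -> None:
        self.type_name = type_name
        self.parent: Optional["Node"] = None
        self.props = props
        for value in props.values():
            for child in (value if isinstance(value, list) else [value]):
                if isinstance(child, Node):
                    child.parent = self


def name_and_index(node: Node) -> Tuple[str, Optional[int]]:
    """Recover (name, index) by finding ourselves among the parent's children."""
    assert node.parent is not None
    for name, value in node.parent.props.items():
        if isinstance(value, list):
            for i, item in enumerate(value):
                if item is node:
                    return name, i
        elif value is node:
            return name, None
    raise LookupError("node not found in its parent")


def location(node: Node) -> str:
    """Walk up to the root, deriving each step's name from the parent."""
    parts: List[str] = []
    while node.parent is not None:
        name, index = name_and_index(node)
        parts.append(f"{name}[{index}]" if index is not None else name)
        node = node.parent
    parts.append(node.type_name)
    return ".".join(reversed(parts))


period = Node("Period")
patient = Node("Patient", name=[Node("HumanName"), Node("HumanName", period=period)])
path = location(period)  # "Patient.name[1].period"
```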

Requirement: Store incorrect data

Why It is important to be able to capture the parsed data as it was sent to us, even incorrect parts, to make sure we do not lose data and to reason about it.
How A POCO has limited flexibility to store incorrect data, although the FHIR primitives have an ObjectValue that captures the raw, unparsed input string. We can add specific resources and datatypes called DynamicResource and DynamicDataType that do not have fixed properties but use dictionaries. Of course, to participate in the ecosystem, they will have to implement all interfaces to meet the other requirements formulated here.
Used Roundtripping, reporting errors during validation, go "as far as we can" with incorrect input.
Remarks Instead of these new resource types, we might introduce IResource and IDataType and let that be implemented in our existing ElementNode (and add an ElementDataTypeNode) that would implement both ITypedElement and those new interfaces.
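A deliberately shallow sketch of the dictionary-based approach, assuming a hypothetical `DynamicResource` with no fixed properties and JSON-like input. Because nothing is validated on the way in, unknown and malformed elements survive a round trip unchanged.

```python
from typing import Any, Dict, Iterator, Tuple


class DynamicResource:
    """Hypothetical resource without fixed properties: all elements live in a dict."""

    def __init__(self, type_name: str) -> None:
        self.type_name = type_name
        self._elements: Dict[str, Any] = {}

    def __setitem__(self, name: str, value: Any) -> None:
        self._elements[name] = value

    def __getitem__(self, name: str) -> Any:
        return self._elements.get(name)

    def element_pairs(self) -> Iterator[Tuple[str, Any]]:
        return iter(self._elements.items())


def from_json(data: Dict[str, Any]) -> DynamicResource:
    resource = DynamicResource(data["resourceType"])
    for name, value in data.items():
        if name != "resourceType":
            resource[name] = value  # no schema check: unknown or malformed input is kept
    return resource


def to_json(resource: DynamicResource) -> Dict[str, Any]:
    return {"resourceType": resource.type_name, **dict(resource.element_pairs())}


# 'active' should be a boolean and 'nickname' is not a Patient element; both survive.
original = {"resourceType": "Patient", "active": "yes", "nickname": "Pete"}
roundtripped = to_json(from_json(original))
```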

Requirement: Represent collection elements

Why The model has both elements and repeating elements, and these need to be distinguished and are best handled using the familiar .NET collections.
How Properties for repeating elements must be lists.
Used Serialization, navigation through the tree, indexing, cardinality validation, fhirpath map/select etc.
Remarks Experience with ITypedElement (which mimics the XML) shows that it is useful to keep lists of stuff as lists.

Requirement: Indicate null/no data

Why We need to pick a value to use when an element exists in the model, but is not present in the instance or has no data.
How Use null
Used Everywhere
Remarks We would now prefer to use null over an empty collection for repeating elements.
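Under the conventions of the two requirements above (lists for repeating elements, null for "no data"), a consumer such as FhirPath can still see every element as a collection. A small sketch; `as_collection` and `normalize_repeating` are hypothetical helper names.

```python
from typing import Any, List, Optional


def as_collection(value: Any) -> List[Any]:
    """View an element value as a FhirPath collection.

    Assumes: an absent element (singular or repeating) is None,
    a repeating element is a list, and a singular element is a bare value.
    """
    if value is None:
        return []
    if isinstance(value, list):
        return value
    return [value]


def normalize_repeating(value: Optional[List[Any]]) -> Optional[List[Any]]:
    """Store an empty list as None, so 'no data' has a single representation."""
    return value if value else None
```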

Requirement: Need to know the instance type in the (FHIR) model

Why Processing logic may depend on the type of (FHIR) data, especially on choice types
How Each node should carry a (string based) typename
Used Serialization (choice types), FhirPath ofType(), validation
Remarks These must be runtime types, so they should not be abstract types (as sometimes found in the StructureDefinitions). The POCOs have a naming convention for backbone types, which we could stick to. CQL primitives may be named by their URL. Based on current practice, names that are not canonicals should be considered FHIR types, so anything else (a canonical) is from another model (e.g. CDA, if that is ever going to be applicable). It is unclear what to use if deserialization cannot determine an actual type, but it is probably better to pick a sentinel name for it than to leave it null.
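A sketch of how the naming convention could be consumed, assuming a hypothetical sentinel `UNKNOWN_TYPE` and treating any name with a URI scheme as a canonical. The choice-element rule shown (stem plus the type name with its first letter capitalized, e.g. `valueDateTime`) is the FHIR serialization convention for `[x]` elements.

```python
from urllib.parse import urlparse

UNKNOWN_TYPE = "(unknown)"  # hypothetical sentinel name, used instead of None


def is_fhir_type(type_name: str) -> bool:
    # Convention from the text: plain names are FHIR types, canonical URLs are not.
    return urlparse(type_name).scheme == ""


def choice_element_name(stem: str, type_name: str) -> str:
    """Serialize a choice element, e.g. stem 'value' with type 'dateTime' -> 'valueDateTime'."""
    if type_name == UNKNOWN_TYPE or not is_fhir_type(type_name):
        raise ValueError(f"cannot derive a FHIR choice suffix for {type_name!r}")
    return stem + type_name[0].upper() + type_name[1:]
```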

Requirement: Locate the nearest parent resource

Why Find the container of an element.
How Navigate up in the tree and then check if a node is a Resource/IResource
Used Resolution of contained resources, %resource in FhirPath, summary generation
Remarks
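Given a Parent pointer, the look-up is a short upward walk. A sketch assuming a hypothetical `Node` whose `is_resource` flag stands in for "implements Resource/IResource"; `containing_resource` starts at the parent, so a contained or bundled resource reports its container.

```python
from typing import Optional


class Node:
    """Hypothetical node with a Parent pointer and a resource flag."""

    def __init__(self, type_name: str, is_resource: bool = False,
                 parent: Optional["Node"] = None) -> None:
        self.type_name = type_name
        self.is_resource = is_resource  # stands in for "implements IResource"
        self.parent = parent


def containing_resource(node: Node) -> Optional[Node]:
    """Walk up from the parent until a resource is found; None above the root."""
    current = node.parent
    while current is not None and not current.is_resource:
        current = current.parent
    return current


bundle = Node("Bundle", is_resource=True)
entry = Node("Bundle.entry", parent=bundle)
patient = Node("Patient", is_resource=True, parent=entry)
name = Node("HumanName", parent=patient)
```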

Requirement: Resolve internal references

Why FHIR offers references between resources in the same resource/bundle
How Navigate up in the tree and then check if a node is a Resource/IResource. Special handling is needed for contained resources and Bundles.
Used Implementing FhirPath's resolve(), and a Resolve() method on the FHIR Reference datatype.
Remarks It would be nice to have this functionality in the POCOs, now it is only present in the ScopedNode.
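A simplified sketch of the two special cases, using plain JSON-like dictionaries rather than POCOs. Local references (`#id`, or `#` for the container itself) are looked up among the container's contained resources; bundle references are matched only exactly against `entry.fullUrl`. The full FHIR rules for relative references inside a Bundle are more involved than this.

```python
from typing import Any, Dict, Optional

Resource = Dict[str, Any]  # JSON-like resources, for illustration only


def resolve_contained(container: Resource, reference: str) -> Optional[Resource]:
    """Resolve a local reference: '#id' targets a contained resource, '#' the container."""
    if not reference.startswith("#"):
        return None
    local_id = reference[1:]
    if local_id == "":
        return container
    return next((r for r in container.get("contained", []) if r.get("id") == local_id), None)


def resolve_in_bundle(bundle: Resource, reference: str) -> Optional[Resource]:
    """Simplified: only exact matches on entry.fullUrl are considered."""
    for entry in bundle.get("entry", []):
        if entry.get("fullUrl") == reference:
            return entry.get("resource")
    return None


observation = {
    "resourceType": "Observation",
    "contained": [{"resourceType": "Patient", "id": "p1"}],
    "subject": {"reference": "#p1"},
}
bundle = {
    "resourceType": "Bundle",
    "entry": [{"fullUrl": "urn:uuid:0c3151bd-1cbf-4d64-b04d-cd9187a4c6e0", "resource": observation}],
}
```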

Requirement: Determine Equality

Why Need to know whether two instances are "the same"
How Since there are different notions of what it means to be "the same" we might need several implementations of IEqualityComparer<T>, which would probably need children, names, types etc to determine equality.
Used Comparisons in FhirPath, equality in set operators etc.
Remarks
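One way to host several notions of "the same" is a single structural walk (type, element names, children) with a pluggable leaf comparison; in .NET each configuration would be its own IEqualityComparer<T>. In this sketch `Node` and `StructuralComparer` are hypothetical, and the case-insensitive variant is only a simplified stand-in for an equivalence notion, not FhirPath's exact `~` rules.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


@dataclass(eq=False)  # no generated __eq__: equality is the comparer's job
class Node:
    type_name: str
    children: Dict[str, Any] = field(default_factory=dict)  # name -> Node | list | atomic


class StructuralComparer:
    """Compares type, element names and children; the leaf comparison is pluggable."""

    def __init__(self, leaf_equal: Callable[[Any, Any], bool]) -> None:
        self.leaf_equal = leaf_equal

    def equals(self, a: Any, b: Any) -> bool:
        if isinstance(a, Node) and isinstance(b, Node):
            return (a.type_name == b.type_name
                    and a.children.keys() == b.children.keys()
                    and all(self.equals(a.children[k], b.children[k]) for k in a.children))
        if isinstance(a, list) and isinstance(b, list):
            return len(a) == len(b) and all(self.equals(x, y) for x, y in zip(a, b))
        if isinstance(a, (Node, list)) or isinstance(b, (Node, list)):
            return False  # structure mismatch, e.g. a list versus a single value
        return self.leaf_equal(a, b)


def _case_insensitive(a: Any, b: Any) -> bool:
    if isinstance(a, str) and isinstance(b, str):
        return a.casefold() == b.casefold()
    return a == b


EXACT = StructuralComparer(lambda a, b: a == b)
CASE_INSENSITIVE = StructuralComparer(_case_insensitive)

smith = Node("HumanName", {"family": "Smith", "given": ["Ann"]})
shouting = Node("HumanName", {"family": "SMITH", "given": ["Ann"]})
```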

Requirement: Deep copy

Why Need to make duplicates that can be modified independently
How The POCOs currently have functions for making deep copies through the IDeepCloneable interface. This might be done using IDictionary too, which would require less boilerplate in the POCOs.
Used Snapshot generator, presumably user code.
Remarks
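The "less boilerplate" option can be sketched as one generic routine over a dictionary of elements, instead of a generated copy method per class. This assumes a hypothetical `Base` that keeps all of its state in `elements`; parent pointers and annotations are not handled.

```python
from typing import Any, Dict


class Base:
    """Hypothetical base class exposing all of its elements as a dictionary."""

    def __init__(self, **elements: Any) -> None:
        self.elements: Dict[str, Any] = dict(elements)


def deep_copy(value: Any) -> Any:
    """One generic routine instead of a generated copy method per class."""
    if isinstance(value, Base):
        # Skips __init__; valid only because all state lives in 'elements'.
        clone = type(value).__new__(type(value))
        clone.elements = {k: deep_copy(v) for k, v in value.elements.items()}
        return clone
    if isinstance(value, list):
        return [deep_copy(v) for v in value]
    return value  # atomic values (str, int, Decimal, ...) are immutable and can be shared


class Patient(Base):
    pass


class HumanName(Base):
    pass


original = Patient(name=[HumanName(family="Smith")])
copy = deep_copy(original)
copy.elements["name"][0].elements["family"] = "Jones"  # does not affect 'original'
```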

Requirement: Make annotations

Why Useful to add user-definable annotations to each node of a tree for processing or informative purposes.
How We currently have an interface IAnnotated and IAnnotatable.
Used User code, TypedElement stack
Remarks It is unclear why IAnnotatable does not derive from IAnnotated.
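A sketch of the layering the remark suggests, where the write-side interface derives from the read side. The classes and method names below are illustrative and only loosely modeled on IAnnotated/IAnnotatable; they are not copied from the SDK.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, Iterator, List, Type


class Annotated(ABC):
    """Read side: query annotations by type."""

    @abstractmethod
    def annotations(self, kind: Type) -> Iterator[Any]: ...


class Annotatable(Annotated):
    """Write side, deriving from the read side."""

    @abstractmethod
    def add_annotation(self, annotation: Any) -> None: ...

    @abstractmethod
    def remove_annotations(self, kind: Type) -> None: ...


class AnnotationList(Annotatable):
    def __init__(self) -> None:
        self._items: List[Any] = []

    def annotations(self, kind: Type) -> Iterator[Any]:
        return (a for a in self._items if isinstance(a, kind))

    def add_annotation(self, annotation: Any) -> None:
        self._items.append(annotation)

    def remove_annotations(self, kind: Type) -> None:
        self._items = [a for a in self._items if not isinstance(a, kind)]


@dataclass
class SourceLine:  # hypothetical annotation type, e.g. recorded by a parser
    line: int


node = AnnotationList()
node.add_annotation(SourceLine(12))
node.add_annotation("debug note")
node.remove_annotations(str)
```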

Requirement: Provide binding facilities

Why Some datatypes can be used in bindings, so we need a uniform way to extract the code from them.
How There is an ICoded<T> interface which may be useful
Used Validator, CQL
Remarks CQL actually requires every resource to be able to return its "code", which is often one of the coded elements that classifies the resource. This is different from being able to extract a code from a bindable datatype, but there may be overlap.
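A sketch of a uniform extraction point, assuming a hypothetical `Coded` base with a `to_codings()` method (in the spirit of ICoded<T>) implemented by simplified `Code`, `Coding` and `CodeableConcept` types. A validator would then only need this one entry point to hand codes to a terminology service.

```python
from dataclasses import dataclass, field
from typing import List, Optional


class Coded:
    """Hypothetical uniform view on a bindable value (compare the ICoded<T> idea)."""

    def to_codings(self) -> List["Coding"]:
        raise NotImplementedError


@dataclass(frozen=True)
class Coding(Coded):
    system: Optional[str]
    code: str

    def to_codings(self) -> List["Coding"]:
        return [self]


@dataclass
class Code(Coded):
    value: str

    def to_codings(self) -> List[Coding]:
        # A bare code carries no system; the binding context has to supply it.
        return [Coding(None, self.value)]


@dataclass
class CodeableConcept(Coded):
    coding: List[Coding] = field(default_factory=list)
    text: Optional[str] = None

    def to_codings(self) -> List[Coding]:
        return list(self.coding)


def codes_for_terminology_check(value: Coded) -> List[Coding]:
    """The single entry point a validator would call, whatever the datatype."""
    return value.to_codings()
```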

Sketchpad

For FhirPath

  • Need to be able to navigate through the tree of elements
  • Need to be able to get the value of a node as CQL/System type
  • Need to know the element name of a node
  • Need to be able to identify lists, and enumerate the elements. Preferably performant access based on index.
  • Need to detect null/empty values
  • Need to know the type of data to implement as() and ofType() and check the root node's type.
  • Need to be able to refer to the %resource, %rootResource and %context
  • Need to be able to resolve contained resources and bundled resources by id, starting from %rootResource (or %context?)
  • Need to be able to convert data from FHIR Quantity types to System.Quantity
  • Might need to be able to obtain full reflection type info (to implement https://build.fhir.org/ig/HL7/FHIRPath/#reflection)
  • Might need equality and comparison operators on non-system types.
  • Might need general conversion operators from non-system types to other types.
  • Might need to be able to read annotations
  • Might need to know the location of the node for a trace() message.
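The FhirPath list above can be condensed into a minimal node surface. The following Python `Protocol` is a hypothetical summary of those needs (name, type, parent, children, System value, location), not a proposed final API; `SimpleNode` is a throwaway implementation to show it is satisfiable.

```python
from typing import Any, Iterator, List, Optional, Protocol, runtime_checkable


@runtime_checkable
class FhirPathNode(Protocol):
    """Hypothetical minimal surface a FhirPath engine would need from a node."""

    @property
    def name(self) -> str: ...  # element name (possibly derived via the parent)

    @property
    def type_name(self) -> str: ...  # for is/as()/ofType()

    @property
    def parent(self) -> Optional["FhirPathNode"]: ...  # %resource, resolve(), location

    def children(self, name: Optional[str] = None) -> Iterator["FhirPathNode"]: ...

    def to_system_value(self) -> Any: ...  # CQL/System value, None for complex nodes

    @property
    def location(self) -> str: ...  # for trace() and diagnostics


class SimpleNode:
    def __init__(self, name: str, type_name: str, value: Any = None,
                 parent: Optional["SimpleNode"] = None) -> None:
        self.name, self.type_name, self.parent = name, type_name, parent
        self._value = value
        self._children: List["SimpleNode"] = []
        if parent is not None:
            parent._children.append(self)

    def children(self, name: Optional[str] = None) -> Iterator["SimpleNode"]:
        return (c for c in self._children if name is None or c.name == name)

    def to_system_value(self) -> Any:
        return self._value

    @property
    def location(self) -> str:
        return self.name if self.parent is None else f"{self.parent.location}.{self.name}"


root = SimpleNode("Patient", "Patient")
active = SimpleNode("active", "boolean", True, root)
```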

For the Validator

  • Need to be able to navigate through the tree of elements
  • Need to be able to get to the value of a node as CQL/System type, although a serialized form is acceptable too
  • Need to know the element name of a node, although a suffixed ([x]) form is acceptable too
  • Need to be able to identify lists and enumerate the elements
  • Need to detect null/empty values
  • Need to know the type of data only when this is not known from the definition (e.g. at contained, at root or a choice type)
  • Need to be able to resolve an internal reference
  • Need to know the location (instance path) of an element for use in diagnostic messages
  • Need to know the definition path (including slice) of an element for use in diagnostic messages
  • Need to know that data is bindable or orderable
  • Need to be able to convert data to FHIR code/coding/codeableconcept for use with the terminology service
  • Needs to represent the data as a string for debug purposes
  • Needs to be able to represent persistent, serializable values (for use in Fixed/Patterns)
  • Might need to be serializable to FHIR
  • Might need to be able to set annotations

CQL Engine

  • Currently depends heavily on the POCOs; it is not easy to switch to another abstraction, since LINQ Expressions and code generation all depend on the POCOs being present. To replace this, we'd need to fall back to e.g. the dynamic runtime and generate code against a DynamicMetaObject. Possible, but ambitious.

Others

  • Need to know the nearest parent resource in MaskingNode
  • StructureDefinition information (or ISDSummary) for element model and serialization
  • Generic "resolve" function currently uses id, ContainedResources, BundledResources, lots of ScopedNode members.
  • ScopedNode is public, and there are dependencies on its methods in other public parts of the API, so ScopedNode (as a wrapper of ITypedElement) will be around for a while, whatever new representation we might choose.
  • Simplifier uses ITypedElement extensively, and FS as well (though it uses ISourceNode more) from what I understood, so using the validator and FhirPath with ITypedElement should remain possible. This is probably also true for a lot of other non-Firely users.
  • Parsers need to store incorrect data, preferably enabling lossless round-tripping.
  • Attribute validation?
  • Summary serialization needs "in summary", min cardinality/mandatory and "is modifier".
  • XML serialization needs the absolute order of an element.
  • Serialization needs to know that an element is a choice element.