Skip to content

Commit

Permalink
⚡ [#4744] Optimize formio data access
Browse files Browse the repository at this point in the history
Calling out to glom too often hurts performance, and before this patch
we did so to check for key membership AND access the value for a given
key, with no caching in between. This also adds that if a key doesn't
exist, we'd end up in exception handling, which is notably slow in
Python.

This patch applies an optimization for the simple cases in a couple of
ways:

* we store which (nested) keys exist. This happens by setting the key
  in an internal set whenever a write operation to that key happens. If
  the FormioData is initialized with a data structure, the UserDict
  internals cause the __setitem__ method to be called, which ensures
  the keys is added to the set.
* if the key doesn't contain the '.' character, we know that it must
  be a top-level/root key, so we can shortcut the membership test by
  checking for membership in 'self._keys' rather than hitting the
  glom code path
* in __getitem__, we can also treat simple keys as a simple case and
  directly access self.data, rather than going through glom for this

Note that this patch doesn't handle deleting keys - __delitem__ is not
implement, so the key removal doesn't either. I also suspect there
may be some weird cases with 'data.update(...)' variants where nested
keys are changed and stale data may be left in the _keys membership,
so that's why I'm erring on the side of caution and only consider
root-level keys for the shortcut code path.

This patch can definitely be improved upon, but it must also easily be
backported and a larger refactor can be done in a separate PR.

Backport-of: #4768
  • Loading branch information
sergei-maertens committed Oct 22, 2024
1 parent bd44c13 commit 6ed8a2d
Show file tree
Hide file tree
Showing 3 changed files with 64 additions and 9 deletions.
42 changes: 35 additions & 7 deletions src/openforms/formio/datastructures.py
Original file line number Diff line number Diff line change
@@ -1,6 +1,5 @@
import re
from collections import UserDict
from collections.abc import Hashable
from typing import Iterator, cast

from glom import PathAccessError, assign, glom
Expand Down Expand Up @@ -162,16 +161,45 @@ class FormioData(UserDict):
"""

data: dict[str, JSONValue]
_keys: set[str]
"""
A collection of flattened key names, for quicker __contains__ access
"""

def __getitem__(self, key: Hashable):
return cast(JSONValue, glom(self.data, key))
def __init__(self, *args, **kwargs):
self._keys = set()
super().__init__(*args, **kwargs)

def __setitem__(self, key: Hashable, value: JSONValue):
def __getitem__(self, key: str):
if "." not in key:
return self.data[key]
try:
return cast(JSONValue, glom(self.data, key))
except PathAccessError as exc:
raise KeyError(f"Key '{key}' is not present in the data") from exc

def __setitem__(self, key: str, value: JSONValue):
assign(self.data, key, value, missing=dict)
self._keys.add(key)

def __contains__(self, key: object) -> bool:
"""
Check if the key is present in the data container.
This gets called via ``formio_data.get(...)`` to check if the default needs to
be returned or not. Keys are expected to be strings taken from ``variable.key``
fields.
"""
if not isinstance(key, str):
raise TypeError("Only string keys are supported")

# for direct keys, we can optimize access and bypass glom + its exception
# throwing.
if "." not in key:
return key in self._keys

def __contains__(self, key: Hashable) -> bool:
try:
self[key]
except PathAccessError:
return True
except KeyError:
return False
return True
28 changes: 28 additions & 0 deletions src/openforms/formio/tests/test_datastructures.py
Original file line number Diff line number Diff line change
Expand Up @@ -89,6 +89,34 @@ def test_initializing_with_dotted_paths_expands(self):

self.assertEqual(formio_data, expected)

def test_key_access_must_be_string(self):
formio_data = FormioData({"foo": "bar"})

bad_keys = (
3,
None,
4.3,
("foo",),
)

for key in bad_keys:
with self.assertRaises(TypeError):
key in formio_data # type: ignore

def test_keyerror_for_absent_keys(self):
formio_data = FormioData({})
bad_keys = (
"foo",
"bar.baz",
)

for key in bad_keys:
with (
self.subTest(key=key),
self.assertRaises(KeyError),
):
formio_data[key]


class FormioConfigurationWrapperTests(TestCase):

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,6 @@
from django.db.models import F

import glom
from glom import PathAccessError

from openforms.authentication.service import AuthAttribute
from openforms.contrib.objects_api.clients import (
Expand Down Expand Up @@ -542,7 +541,7 @@ def get_record_data(
for key, variable in state.variables.items():
try:
submission_value = dynamic_values[key]
except PathAccessError:
except KeyError:
continue

# special casing documents - we transform the formio file upload data into
Expand Down

0 comments on commit 6ed8a2d

Please sign in to comment.