Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use slots in common classes #1434

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

moradology
Copy link
Contributor

@moradology moradology commented Oct 9, 2024

Related Issue(s):

Description:
Because it isn't unexpected that even tens of thousands of Item and Link classes and a fortiori StacObject classes are regularly in memory, there should be some small but appreciable performance improvements from using __slots__ rather than __dict__ to hold class attributes. The downside of this change is that new attributes cannot be set at runtime that aren't expected in __slots__ definitions. In general, the existence of extra_fields on Item classes makes that a non-issue for the cases we care about.

Beyond simply adding __slots__ definitions, this PR also updates the serialization logic to properly use them and adds some __init__ logic, where appropriate, to avoid non-initialization of expected __slots__ values.

#1207 is a related issue but it should be noted that this PR does not fully address the performance issues indicated there, as many duplicated href lookups due to eager mutation of objects remain an issue. IF spilling memory to disk is the cause of the observed exponential increase in runtimes during .save is to blame, this PR should provide a bit more breathing room.

PR Checklist:

  • Pre-commit hooks and tests pass (run scripts/test)
  • Documentation has been updated to reflect changes, if applicable
  • This PR maintains or improves overall codebase code coverage.
  • Changes are added to the CHANGELOG. See the docs for information about adding to the changelog.

Copy link

codecov bot commented Oct 9, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 91.22%. Comparing base (f23af86) to head (880b6e9).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1434      +/-   ##
==========================================
+ Coverage   91.21%   91.22%   +0.01%     
==========================================
  Files          52       52              
  Lines        7237     7248      +11     
  Branches     1018     1021       +3     
==========================================
+ Hits         6601     6612      +11     
  Misses        457      457              
  Partials      179      179              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@moradology
Copy link
Contributor Author

@gadomski This is ready for review. Some of the changes necessary to get slots working are a bit subtle (happy to explain anything that looks unnecessary), but the surface area of changes is relatively small.

@gadomski gadomski self-requested a review October 10, 2024 00:15
Copy link
Member

@gadomski gadomski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Thanks for this ... I ran the benchmark suite and while most stuff was unchanged, there is a (pretty significant) speedup when walking a large-ish catalog:

$ asv compare main feature/use-slots --only-changed
| Change   | Before [f23af862] <main>   | After [880b6e94] <feature/use-slots>   |   Ratio | Benchmark (Parameter)                                                                 |
|----------|----------------------------|----------------------------------------|---------|---------------------------------------------------------------------------------------|
| -        | 78.2±0.7μs                 | 52.2±0.8μs                             |    0.67 | catalog.WalkCatalogBench.time_walk [new-terrain/virtualenv-py3.12]        |
| -        | 78.4±0.6μs                 | 50.5±1μs                               |    0.64 | catalog.WalkCatalogBench.time_walk [new-terrain/virtualenv-py3.12-orjson] |

Couple of questions:

  1. Do you reckon that this is a breaking change? My understanding of __slots__ is not deep, but the inability to set non-slot attributes at runtime does feel impactful.
  2. Is there a benchmark that we could cook up to test Collections with many items saving time issue #1207? It'd be nice to give ourselves a target to aim for.

@gadomski gadomski linked an issue Oct 10, 2024 that may be closed by this pull request
@moradology
Copy link
Contributor Author

  1. I fear this is most honestly considered a breaking change, as you suggest. I doubt it's anything like common but I guess other libraries interacting with pystac might be adding attributes willy-nilly for god knows what reason. That said, I'm also of the opinion that there's some flexibility in making decisions like this so whether to hold off for the full stac1.1 support release isn't entirely obvious to me either. I guess I'd reluctantly say hold off if we expect a release with breaking changes to happen soon 😭
  2. I think that's the critical question. Til yesterday I had relatively little luck reproducing things locally so I ran an insanely large performance test with cProfile and I think the results are somewhat illuminating. I'll reproduce this comparison in the relevant issue, but right now it really looks like there are some obvious (if not easy) optimization targets.

Here are the top offenders in terms of cumulative time for smallish (1000 item) catalog saves:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      2/1    0.002    0.001    0.965    0.965 catalog.py:935(save)
     1002    0.001    0.000    0.699    0.001 stac_object.py:439(save_object)
     4005    0.007    0.000    0.554    0.000 link.py:382(to_dict)
     4007    0.008    0.000    0.535    0.000 link.py:164(get_href)
     5009    0.005    0.000    0.452    0.000 utils.py:219(make_relative_href)
    13027    0.014    0.000    0.446    0.000 <frozen posixpath>:397(abspath)
     1000    0.003    0.000    0.414    0.000 item.py:368(to_dict)
     1000    0.001    0.000    0.404    0.000 item.py:390(<listcomp>)
     5009    0.007    0.000    0.399    0.000 utils.py:162(_make_relative_href_url)
    13027    0.390    0.000    0.390    0.000 {built-in method posix.getcwd}
     5009    0.014    0.000    0.383    0.000 <frozen posixpath>:486(relpath)
    10024    0.007    0.000    0.174    0.000 stac_object.py:279(get_self_href)
    19044    0.009    0.000    0.172    0.000 stac_object.py:185(get_single_link)
    19044    0.003    0.000    0.161    0.000 {built-in method builtins.next}
    38088    0.160    0.000    0.160    0.000 stac_object.py:207(<genexpr>)

And here are the top offenders with 100x more items:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      2/1    0.498    0.249 2252.578 2252.578 catalog.py:935(save)
  1000024    0.991    0.000 2008.690    0.002 stac_object.py:279(get_self_href)
  1900044    1.341    0.000 2008.273    0.001 stac_object.py:185(get_single_link)
  1900044    0.379    0.000 2006.719    0.001 {built-in method builtins.next}
  3800088 2006.553    0.001 2006.553    0.001 stac_object.py:207(<genexpr>)
   100002    0.349    0.000 1692.138    0.017 stac_object.py:439(save_object)
   400005    1.098    0.000 1666.717    0.004 link.py:382(to_dict)
   400007    1.421    0.000 1663.745    0.004 link.py:164(get_href)
   100000    0.555    0.000 1165.005    0.012 item.py:368(to_dict)
   100000    0.154    0.000 1162.880    0.012 item.py:390(<listcomp>)
        2    0.000    0.000  504.066  252.033 catalog.py:641(to_dict)
        2    0.047    0.023  504.039  252.019 catalog.py:657(<listcomp>)
        1    0.000    0.000  503.808  503.808 collection.py:584(to_dict)
   200006    0.087    0.000  490.862    0.002 stac_object.py:265(self_href)

OK, but how do we write a neat little test that isn't completely ad hoc?... I'm not sure. Ad hoc might be the smart play here? So maybe we empirically determine how much we can shave off of this thing with quick/dirty changes on a testing branch and use the results from that plus whatever disgust it elicits in terms of readability and code smell to decide exactly where to aim

Sorry if those were both non-answers but in fairness those are two very hard questions

@gadomski
Copy link
Member

gadomski commented Oct 10, 2024

👍🏼 on breaking, I'll put this in the v2.0 milestone and hold off on merging until v1.11 v1.12 goes out. And w.r.t. ad-hoc benchmarks I think that's fine, and maybe even better than fine.

Sorry if those were both non-answers but in fairness those are two very hard questions

Stop it, you're making me blush 😊

@gadomski gadomski marked this pull request as draft October 10, 2024 11:03
@gadomski gadomski added this to the v2.0 milestone Oct 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Use __slots__ for attributes
2 participants