
Performance problems with Optimal dependency #100

Open
amandasposito opened this issue Jun 26, 2019 · 18 comments · May be fixed by #124

Comments

@amandasposito

I profiled my app with eprof and fprof to identify bottlenecks, tracing the execution of all functions in the code and reporting the time spent in each.

Looking at the results, about 25% of the time was spent in Optimal, a library responsible for validating options.

So I set up a benchmark script using Benchee to compare the original code, which uses the Optimal dependency, against a version that overrides it with a simpler implementation.

Without the Optimal dependency, the whole request was about 10% faster.

https://hexdocs.pm/mix/1.8.1/Mix.Tasks.Profile.Eprof.html
https://hexdocs.pm/mix/1.8.1/Mix.Tasks.Profile.Fprof.html
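
For context on how such a comparison can be set up, here is a minimal Benchee sketch that measures only the validation call itself, with a made-up schema and options. It is not the script from the app (which exercised a whole request and overrode the Optimal module, hence the "redefining module" warning in the output below); it assumes Optimal.schema/1 and Optimal.validate!/2 as Spandex uses them.

# Hypothetical schema; the real Spandex schemas are larger.
schema =
  Optimal.schema(
    opts: [service: :atom, resource: :string, env: :string],
    required: [:service, :resource],
    defaults: [env: "unknown"]
  )

opts = [service: :my_app, resource: "GET /users"]

Benchee.run(
  %{
    # Validate the opts through Optimal, as Spandex does for every span.
    "optimal_validate" => fn -> Optimal.validate!(opts, schema) end,
    # A "simpler implementation": trust the caller and only fill in the default.
    "pass_through" => fn -> Keyword.put_new(opts, :env, "unknown") end
  },
  warmup: 0,
  time: 5
)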

Operating System: Linux
CPU Information: Intel(R) Core(TM) i5-5257U CPU @ 2.70GHz
Number of Available Cores: 3
Available memory: 5.82 GB
Elixir 1.8.1
Erlang 21.3.7
Benchmark suite executing with the following configuration:
warmup: 0 ns
time: 5 s
memory time: 0 ns
parallel: 1
inputs: none specified
Estimated total run time: 15 s
Benchmarking 0_warmup...
Benchmarking 1_normal...
Benchmarking 2_optimal_overridden...
warning: redefining module Optimal (current version loaded from _build/test/lib/optimal/ebin/Elixir.Optimal.beam)
  priv/myapp_web/script_optimal.exs:23
Name                           ips        average  deviation         median         99th %
2_optimal_overridden        270.69        3.69 ms    ±33.85%        3.26 ms        7.91 ms
1_normal                    248.06        4.03 ms    ±35.16%        3.58 ms        9.72 ms
0_warmup                    129.54        7.72 ms  ±1258.76%        3.45 ms       11.21 ms
Comparison:
2_optimal_overridden        270.69
1_normal                    248.06 - 1.09x slower +0.34 ms
0_warmup                    129.54 - 2.09x slower +4.03 ms
@zachdaniel
Member

That is really useful info! I’d be happy to accept a PR that removed the optimal dependency.

@GregMefford
Member

Yeah I was surprised that the performance impact was so significant! My only concern would be that removing it would be a breaking change unless we somehow keep the behavior the same. In practice, it's probably fine to remove it or at least make it disable-able, because for most use-cases, people will only be sending valid spans once they figure out the right things to send.

If we removed Optimal, things might silently fail or fail in a different place, but at the end of the day, it would've crashed anyway.

@zachdaniel
Member

I think an alternative would be to fork Optimal (I actually wrote it originally, but it's in my old employer's name) and just fix the performance problem.

@zachdaniel
Member

They'd probably be amenable to a PR, also.

@vherr2

vherr2 commented Jun 28, 2019

If someone makes a PR, I'll gladly take a look; hoping to take a glance at it myself if y'all don't get to it first!

@zachdaniel
Member

We should just switch from Optimal to NimbleOptions. Anyone who wants to take a crack at this is welcome!

@connorlay

@zachdaniel I'm interested in picking this up! I've been using Spandex for over a year now and want to contribute back 😁

@zachdaniel
Member

Awesome! No one else has claimed it, so go for it 💪

@kamilkowalski
Contributor

Hey @connorlay, need any help? We're seeing considerable performance overhead when using Spandex at Fresha, and I'd like to factor out Optimal. Since it's a sizeable amount of work, I didn't want to duplicate effort in case you've already started.

@connorlay

Hi @kamilkowalski! Feel free to take this over. I never made much progress on this after realizing the size of the refactor.

@zachdaniel
Member

The best strategy here is likely just to use NimbleOptions. We'll just need to rewrite the schemas to its syntax.
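
To make the scope concrete, here is roughly what rewriting a single schema might look like. The option names are made up for illustration and the real Spandex schemas are larger; the point is just the difference in syntax between the two libraries.

# Today, with Optimal: one schema value, enforced with Optimal.validate!/2.
span_opts =
  Optimal.schema(
    opts: [service: :atom, resource: :string, env: :string],
    required: [:service, :resource],
    defaults: [env: "unknown"]
  )

validated = Optimal.validate!([service: :my_app, resource: "GET /users"], span_opts)

# Roughly equivalent with NimbleOptions: a keyword list per option,
# with type/required/default expressed per key, enforced with validate!/2.
span_schema = [
  service: [type: :atom, required: true],
  resource: [type: :string, required: true],
  env: [type: :string, default: "unknown"]
]

validated = NimbleOptions.validate!([service: :my_app, resource: "GET /users"], span_schema)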

@GregMefford
Member

I'm also available / willing / interested in helping to make this migration happen. I think long-term we will likely just migrate away from Spandex in favor of OpenTelemetry, but given Datadog's timeline on supporting OpenTelemetry and the amount of effort it'll take to switch away from Spandex, this is a worthwhile investment in the meantime.

@zachdaniel
Member

Well, the APIs for NimbleOptions and Optimal are almost exactly the same, so I honestly don't think it would be too difficult to do. I'm too busy at the moment myself, though, unfortunately.

@GregMefford
Member

I looked at it briefly and I don't think it'll be that bad; it's mostly a lot of tedious verification that things still do what we want. It's also a technically breaking change that I doubt anyone cares about in practice: our API currently returns Optimal validation errors. The better API design would probably be either to return some well-known set of Spandex validation errors, or to make the typespecs less specific about what kind of error struct you might get, and then switch to NimbleOptions instead of Optimal, with or without a major version bump to signal the breaking change, depending on what we find when verifying that things still work.
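
On the typespec side, one way to stop leaking the validator into the public contract is to only promise "some error term", so that swapping Optimal for NimbleOptions is not a spec change. A hypothetical example (the function name and types are for illustration, not Spandex's actual specs):

# Callers only see {:error, term()}, so the underlying validator
# (and its error struct) can change without breaking the contract.
@spec start_span(String.t(), keyword()) ::
        {:ok, Spandex.Span.t()} | {:error, term()}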

@GregMefford
Member

Another possible path could be to simply make Optimal optional: if you have it in your app's deps, options are validated; if not, nothing is validated and you just crash (or misbehave) if you send an incorrect type. This would be the "high performance" production mode of operation, after you've already verified in dev/test that you're sending the right thing.
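
A sketch of what that could look like, assuming Optimal becomes an optional dependency and using Code.ensure_loaded?/1 to detect it (the module and function names are made up):

defmodule Spandex.OptionsSketch do
  # Validate options only when Optimal is available (e.g. pulled in for dev/test);
  # in a production build without Optimal, pass the opts straight through.
  def maybe_validate(opts, schema) do
    if Code.ensure_loaded?(Optimal) do
      Optimal.validate!(opts, schema)
    else
      opts
    end
  end
end

Building the schemas themselves would need the same guard, since Optimal.schema/1 wouldn't be available either.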

@zachdaniel
Member

The only issue with that would be default values and other logic around the options. Optimal doesn't just validate; it can also transform.

@hkrutzer

hkrutzer commented Nov 4, 2020

Wouldn't it be better for performance not to do validation at all? For functions where low latency is required, people would probably prefer performance over ease of development.

@zachdaniel
Member

Yeah, I think it would be better for sure, at least to have it only validate in development mode or something like that. I'm just saying we'll have to manually write the code that isn't validation (that sets defaults for example).
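
Putting those two points together, here is a variant of the earlier maybe_validate/2 sketch that keeps the non-validation behavior (defaults) unconditionally, while validation becomes opt-in via config. The module name, config key, and default values are hypothetical:

defmodule Spandex.OptionsSketch do
  # Always apply defaults, whether or not validation runs.
  def apply_defaults(opts) do
    Keyword.merge([env: "unknown", services: []], opts)
  end

  # Only pay for validation when explicitly enabled (e.g. in dev/test config).
  def maybe_validate(opts, schema) do
    if Application.get_env(:spandex, :validate_options?, false) do
      NimbleOptions.validate!(opts, schema)
    else
      opts
    end
  end

  def prepare_opts(opts, schema) do
    opts
    |> apply_defaults()
    |> maybe_validate(schema)
  end
end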

@kamilkowalski kamilkowalski linked a pull request Nov 29, 2020 that will close this issue