basic profiling support #349

Draft
wants to merge 2 commits into base: master

Conversation

antirotor
Contributor

@antirotor antirotor commented Sep 2, 2019

Goal

It would be nice to have a mode that would generate profiling data for the pyblish process and plugins, producing lots of timing and other data that can later be used for visualizing and debugging the whole session.

Motivation

My idea is to capture the whole context after each plugin is done, as snapshots along with its timing, memory consumption, etc., to json. Then to have a simple application processing this json file, drawing all the data to a canvas, so we'll be able to visualize what is going on with all the data during publishing. This could be useful when debugging the behaviour of pipelines where a lot of plugins are doing many things.

This implements #348

@antirotor
Contributor Author

This still has some caveats. Getting memory information on Windows is somewhat cumbersome without additional external dependencies (such as psutil). The problem with psutil is that it needs the development version of Python on Linux (gcc and Python headers), complicating installation somewhat.

A tool to process the profiling info is also still missing.

@mottosso
Member

mottosso commented Sep 2, 2019

Thanks for this.

Re psutil; it's kind of the de-facto standard for doing these kinds of OS queries; how about we depend on it as an optional dependency? That is, if it's available, the profiling will be that much more thorough, and without it you get the basics. That way, users on Windows can pip install psutil and those on Linux could e.g. apt-get install python-psutil or whatever means are available, if they are interested in that kind of information.

If there is a pure-Python equivalent, that would of course be best. But in my experience it just isn't possible. :(
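The optional-dependency pattern described above could be sketched like this (a minimal sketch; `current_memory_usage` is a hypothetical helper name, not part of the patch):

```python
try:
    import psutil  # optional; richer profiling when available
except ImportError:
    psutil = None


def current_memory_usage():
    # hypothetical helper: resident set size in bytes when psutil
    # is installed, None otherwise (the "basics only" path)
    if psutil is not None:
        return psutil.Process().memory_info().rss
    return None
```

Callers can then branch on `None` instead of caring whether psutil is installed.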

@antirotor
Contributor Author

I've changed it to use the optional psutil module. Pure Python isn't there; the only option is the resource module, but that is available only on Linux and Darwin systems.
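A sketch of what that `resource`-based fallback looks like (`peak_memory_kb` is a hypothetical name; `resource` is a POSIX-only stdlib module, absent on Windows):

```python
import sys


def peak_memory_kb():
    # resource exists only on POSIX systems (Linux, Darwin), not Windows
    if sys.platform.startswith(("linux", "darwin")):
        import resource
        usage = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        # ru_maxrss is kilobytes on Linux but bytes on macOS
        return usage // 1024 if sys.platform == "darwin" else usage
    return None
```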

Member

@mottosso mottosso left a comment

Sorry for the delay in looking at this. I've made some notes in-line, and overall I think there's one thing missing, which is that profiling currently happens per-process, but the data is stored as one global object, overwritten each time.

Instead, I think we should store them together with the context.data["results"] data. And actually, looking through the code, I found that we're already collecting the duration of each process in the results dict.

result["duration"] = (__end - __start) * 1000 # ms

This would be a good spot to continue adding profile data I think, so let's move the profiling context manager to there.
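A minimal sketch of what that could look like, keeping the existing duration measurement in place (the `profile` key is an assumption, not part of the patch):

```python
import time

__start = time.time()
# ... the plug-in's process() would run here ...
__end = time.time()

result = {}
result["duration"] = (__end - __start) * 1000  # ms, as Pyblish already records
# hypothetical extension: keep profile data next to the duration,
# so each entry in context.data["results"] carries its own stats
result["profile"] = {
    "startTime": __start,
    "endTime": __end,
}
```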

Let me know what you think.

profiling_path = os.path.join(tempfile.gettempdir(),
                              "{}-pyblish_profile.json".format(context.id))

append_to_json(profiling_data, profiling_path)
Member

Ah no, I don't think this is a good idea.

Why not let the user decide what to do with the profiling information?

context = util.publish()

with open("somefile.json", "w") as f:
    json.dump(context.data["profiling"], f)

GUIs could get a hold of it via the context as well, inbetween each process if need be, but likely also as a post-process of sorts.

Contributor Author

So you think it shouldn't save to a json file implicitly, and it's up to the user to handle the data in context.data.get('statistics') (let's call it that) afterward? Just checking that I understand correctly.

Member

Yes, don't write files without the user's knowledge. In the vast majority of cases, the user won't care for those files anyway. Profiling is primarily useful for the plug-in author.
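An opt-in helper in that spirit could be as simple as this (a sketch; `save_statistics` is a hypothetical name, and the `profiling` key follows the example earlier in the thread):

```python
import json
import os
import tempfile


def save_statistics(context_data, path=None):
    # hypothetical helper: nothing is written during publishing;
    # the plug-in author explicitly opts in by calling this afterwards
    if path is None:
        path = os.path.join(tempfile.gettempdir(), "pyblish_profile.json")
    with open(path, "w") as f:
        json.dump(context_data.get("profiling", {}), f, indent=2)
    return path
```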


# collected using psutil module, None if it is not installed
try:
profiling_data['initial_memory_sage'] = profiler.initial_mem
Member

Typo on _sage. The naming convention should be mixedCase for dictionary keys as well (JSON convention).

result = __explicit_process(plugin, context, instance, action)
else:
result = __implicit_process(plugin, context, instance, action)
if not context.get('profiling'):
Member

Typo here, should be context.data.get(), as below.

Unsure about whether profiling is the best name here. Grammatically it's a verb, describing some sort of state: that it is currently "profiling". As it isn't meant as that, it should probably be profile, but even that could be confused with an action, to "profile" something.

I'm thinking context.data["statistics"]

Contributor Author

I agree, statistics is much better.

@@ -58,6 +60,45 @@
STRICT_DATATYPES = EARLY_ADOPTER or STRICT_DATATYPES


class ProfilingTimer():
Member

I think we should stick with a @contextmanager, like it's being used elsewhere in Pyblish.

import contextlib
import time

@contextlib.contextmanager
def profiling():
    stats = {}
    stats["startTime"] = time.time()
    yield stats
    stats["endTime"] = time.time()

with profiling() as stats:
    ...

Alternatively..

stats = {}
with profiling(stats):
    ...
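The alternative form, filled out with a computed duration and made exception-safe via try/finally (a sketch; the `duration` key is an assumption):

```python
import contextlib
import time


@contextlib.contextmanager
def profiling(stats):
    # mutate a caller-provided dict, per the alternative form above
    stats["startTime"] = time.time()
    try:
        yield stats
    finally:
        # recorded even if the profiled block raises
        stats["endTime"] = time.time()
        stats["duration"] = (stats["endTime"] - stats["startTime"]) * 1000  # ms

stats = {}
with profiling(stats):
    sum(range(1000))
```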

context.data['profiling'][plugin] = profiling_data
profiling_data["snapshot"] = {
"timestamp": int(time.time()),
"context": json.dumps(context, default=lambda o: '<not serializable>')
Member

Not sure about "snapshot".. If the goal is capturing more data, I think we should instead favour a mechanism with which to let the plug-in author provide his own data, as mentioned in #348 (comment)

Contributor Author

I was thinking about the snapshot as a "map", if you will: with all this profiling data you could later visualize the whole pyblish process, with all its timings and what changed in the context at each step. But yes, adding custom data from every plugin should be implemented as well.
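For reference, the `default=` fallback in the quoted code makes non-serializable objects degrade into placeholder strings rather than raising. A standalone sketch (`take_snapshot` and the plugin name are hypothetical):

```python
import json
import time


def take_snapshot(context_data):
    # anything json can't serialize becomes a placeholder string,
    # matching the default= fallback in the quoted patch
    return {
        "timestamp": int(time.time()),
        "context": json.dumps(context_data,
                              default=lambda o: "<not serializable>"),
    }

# object() is not json-serializable, so it becomes the placeholder
snap = take_snapshot({"plugin": "CollectScene", "handle": object()})
```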


context.data['profiling'][plugin] = profiling_data
profiling_data["snapshot"] = {
"timestamp": int(time.time()),
Member

If we keep startTime from stats we can remove this.

@mottosso
Member

Hi @antirotor, revisiting this long-forgotten issue and would like to merge it. It's a good feature! There are just a few notes left before a first 1.0 of the feature, if you're up for it it would be good to see those though.

@antirotor
Contributor Author

> Hi @antirotor, revisiting this long-forgotten issue and would like to merge it. It's a good feature! There are just a few notes left before a first 1.0 of the feature, if you're up for it it would be good to see those though.

Sure thing, sorry for this zombie state. I had more urgent things to do, but it is still on my mind. I'll try to take care of it ASAP.
