
Speed up sequential CPU code through OpenMP #51

Open · slizzered opened this issue Feb 10, 2015 · 3 comments

@slizzered (Contributor)

There might be some loops that pose a bottleneck. They could probably be parallelized rather easily with OpenMP (the CMake file will need to be tweaked).
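For reference, a minimal sketch of the CMake tweak this would need (assuming the classic FindOpenMP module and the flags-based style of that CMake era):

```cmake
# Detect OpenMP and append its compiler flags to the host build.
find_package(OpenMP)
if(OPENMP_FOUND)
  set(CMAKE_C_FLAGS   "${CMAKE_C_FLAGS} ${OpenMP_C_FLAGS}")
  set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${OpenMP_CXX_FLAGS}")
endif()
```

If OpenMP is not found, the build proceeds unchanged and the pragmas are simply ignored by the compiler.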

@slizzered slizzered added this to the 1.5 - extended features milestone Feb 10, 2015
slizzered added a commit to slizzered/haseongpu that referenced this issue Jun 8, 2015
 see ComputationalRadiationPhysics#51

 Some observations:
  - we have only trivial (and fast) loops
  - the other loops are integral steps of the simulation and cannot be
    parallelized (they are sequential steps; at most they could be
    parallelized with device code)
  - one of the parsing loops uses cudaSetDevice, not sure if it's
    possible to parallelize that in a good way.
  - parallelizing loops over a std::vector is OK as long as the length is
    fixed (no reallocation). That means we may not use vector.push_back()
    or vector.insert() inside a loop with OpenMP pragmas. The compiler
    might not complain about this. (A sketch follows after this commit
    message.)

 Only "easy" loops were parallelized, only basic pragmas were used.
 Might give some speedup one day, but if not... no problem. Pragmas and
 code changes are non-intrusive enough to keep it maintainable.
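To illustrate the std::vector caveat from the commit message, here is a minimal sketch (a hypothetical element-wise loop, not code taken from haseongpu): the vector is sized up front and every iteration writes only its own slot, so no reallocation can happen inside the parallel region.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Safe pattern: fixed-length vector, each iteration writes its own element.
std::vector<double> squareRoots(const std::vector<double>& input) {
    std::vector<double> result(input.size());  // allocated once, never grows
    #pragma omp parallel for
    for (std::ptrdiff_t i = 0;
         i < static_cast<std::ptrdiff_t>(input.size()); ++i) {
        result[i] = std::sqrt(input[i]);
    }
    return result;
}

// Unsafe pattern (do NOT do this): calling result.push_back(...) inside the
// parallel loop may reallocate concurrently and corrupt the vector, and the
// compiler will not warn about it.
```

Compile with `-fopenmp` (GCC/Clang) for the pragma to take effect; without that flag the code still builds and simply runs sequentially.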
@slizzered (Contributor, Author)

I started working on this.
branch: https://github.com/slizzered/haseongpu/tree/issue51-openMP-hostcode

I'm not sure how necessary this actually is... it might introduce code complexity without benefits. See also the commit message (slizzered@91625ac).

@erikzenker (Member)

To gain maximal performance we need to reduce the runtime of our sequential code base. We have two ways to achieve this reduction:

  • Make sequential code faster
  • Make sequential code parallel

So, I think some investigation makes sense. But you are right, it looks a bit weird if computation-unrelated code is parallelized with OpenMP 🌺

@slizzered (Contributor, Author)

Well, it doesn't look too weird to me. The problem is rather that it does not bring any speedup, since the loops are pretty small/fast.

The MATLAB functions seem to be one of the more important problems (really slow...)

@slizzered slizzered modified the milestones: 1.5 - extended features, 2.0 - the next generation, 1.7. - extended features Sep 8, 2015