Improved LatticeAccess #440

Merged
merged 65 commits into from
Aug 4, 2023
dc08db7
Not working
llaniewski Jun 23, 2023
65b3bd8
Working 1
llaniewski Jun 24, 2023
e80dabe
Making the config.R not trigger build if not changed
llaniewski Jun 26, 2023
fb2f2bb
Merge remote-tracking branch 'origin/feature/fastdem' into feature/box8
llaniewski Jun 26, 2023
40f1e1d
Const NodeType
llaniewski Jun 29, 2023
1e16a32
Making border kernel more parallel
llaniewski Jun 29, 2023
8cfb9cd
Adding zone settings preload
llaniewski Jun 29, 2023
948be73
Adding WARPSIZE as compile time constant
llaniewski Jul 6, 2023
caeb438
Fixing branch uninitialized value
llaniewski Jul 6, 2023
c33de14
Cleanup of cuda
llaniewski Jul 6, 2023
43995d3
Fixing CudaAtomicMaxReduceWarp on CPU
llaniewski Jul 6, 2023
310b289
Fixing CPU border kernel
llaniewski Jul 6, 2023
c998458
Unfinished changes before merge
llaniewski Jul 6, 2023
c9d421b
Merge remote-tracking branch 'origin/feature/accessor' into feature/s…
llaniewski Jul 6, 2023
abf19ed
Working version with one kernel
llaniewski Jul 6, 2023
6073c16
Adding printing and bumping up the max threads
llaniewski Jul 6, 2023
9c89f33
Fixing paranoid warnings
llaniewski Jul 6, 2023
3816fe6
Merge remote-tracking branch 'origin/feature/fastdem' into feature/si…
llaniewski Jul 11, 2023
6ebd1f2
Correcting boundary conditionals
llaniewski Jul 11, 2023
e64b5b0
update of particle tests
llaniewski Jul 12, 2023
acc2586
Merge remote-tracking branch 'origin/feature/fastdem' into feature/si…
llaniewski Jul 12, 2023
764e8e8
Merge remote-tracking branch 'origin/feature/fastdem' into feature/si…
llaniewski Jul 12, 2023
f257175
Adding margin
llaniewski Jul 13, 2023
0cc719a
fixing an error in border kernel
llaniewski Jul 13, 2023
74a8ed6
Unfinished business
llaniewski Jul 13, 2023
54036ee
Merge remote-tracking branch 'origin/feature/fastdem' into feature/si…
llaniewski Jul 14, 2023
53a9a52
Merge remote-tracking branch 'origin/feature/singlekernel' into featu…
llaniewski Jul 14, 2023
582863a
Fixing a bug in CudaSyncWarpOr definition
llaniewski Jul 14, 2023
285cffd
Compiling version with global calculators
llaniewski Jul 14, 2023
4228363
Half-working version with global calculator
llaniewski Jul 14, 2023
f659946
Fixing bug in ordering of Globals
llaniewski Jul 14, 2023
0845b36
Fixing bug for no Globals
llaniewski Jul 14, 2023
97c0b1a
Auto GetThreads initialisation based on static mechanism
llaniewski Jul 17, 2023
8e2ae44
Merge remote-tracking branch 'origin/feature/singlekernel' into featu…
llaniewski Jul 17, 2023
c14650f
Added range_int and load through LatticeAccess
llaniewski Jul 18, 2023
2576409
temporary move of nt initialisation in LatticeAccess
llaniewski Jul 18, 2023
79aed16
Adding wrap.const to wrap constants in range_int
llaniewski Jul 19, 2023
9a60c0f
Reducing the max threads
llaniewski Jul 19, 2023
786f276
Merge remote-tracking branch 'origin/feature/singlekernel' into featu…
llaniewski Jul 19, 2023
794bc24
Adding missing range_int.hpp
llaniewski Jul 19, 2023
d5d52fa
Working pop/push in LatticeAccess
llaniewski Jul 19, 2023
3f955db
LatticeAccess as argument to Node constructor
llaniewski Jul 19, 2023
63aa206
Moving template arguments to executors
llaniewski Jul 19, 2023
081a12a
Making LatticeAccess a template argument
llaniewski Jul 19, 2023
b25ab18
Interior LatticeAccess (doesn't seem to improve the speed)
llaniewski Jul 19, 2023
7365c12
correcting x ranges
llaniewski Jul 20, 2023
a5cd592
Adding autosym support
llaniewski Jul 20, 2023
70b950f
Adding unary minus to range_int
llaniewski Jul 21, 2023
216d6b6
Fixing minor bugs
llaniewski Jul 21, 2023
5cb9bac
Adding unary minus to range_int (correction)
llaniewski Jul 21, 2023
4211495
ensuring range_int in dynamic access
llaniewski Jul 21, 2023
fb34f41
Reducing the number of ifs in loads
llaniewski Jul 21, 2023
990126d
Removing dx,dy,dz,fx,fy,fz from LatticeContainer
llaniewski Jul 21, 2023
663ddb8
Correcting globals
llaniewski Jul 28, 2023
967b9ae
Merge remote-tracking branch 'origin/feature/fastdem' into feature/si…
llaniewski Jul 31, 2023
fce6951
fixing bounds in BorderExecutor
llaniewski Aug 1, 2023
f4c4cf1
Adding more verbose output in csvdiff
llaniewski Aug 1, 2023
35b606d
Correcting PART_MAR for float
llaniewski Aug 1, 2023
0142dee
Merge remote-tracking branch 'origin/feature/singlekernel' into featu…
llaniewski Aug 1, 2023
ee17d85
Merge remote-tracking branch 'origin/feature/fastdem' into feature/si…
llaniewski Aug 1, 2023
f40c93d
Merge remote-tracking branch 'origin/feature/singlekernel' into featu…
llaniewski Aug 1, 2023
54c652b
Merge remote-tracking branch 'origin/feature/fastdem' into feature/si…
llaniewski Aug 2, 2023
d4e6d7c
Merge remote-tracking branch 'origin/feature/singlekernel' into featu…
llaniewski Aug 2, 2023
60897e4
Adding CompositeAccess
llaniewski Aug 3, 2023
7f1ad01
cleanup
llaniewski Aug 4, 2023
1 change: 1 addition & 0 deletions models/PDE/wave2D/conf.mk
@@ -0,0 +1 @@
OPT="autosym"
4 changes: 1 addition & 3 deletions src/Consts.h.Rt
@@ -16,10 +16,8 @@

 #ifdef CROSS_CPU
 #define MAX_THREADS 1
-#define X_BLOCK 1
 #else
-#define MAX_THREADS 512
-#define X_BLOCK 32
+#define MAX_THREADS 128
 #endif

 #define PART_MAR <?%f PartMargin ?>
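To see what lowering MAX_THREADS from 512 to 128 does to the launch shape, here is a minimal standalone sketch of the rule used by ThreadNumberCalculator<T>::Init() in src/GetThreads.h of this PR. BlockShape and block_shape are hypothetical names introduced only for illustration (BlockShape stands in for CUDA's dim3 so the snippet compiles without CUDA), and the X_BLOCK value of 32 is assumed from the line shown in this hunk.

```cpp
#include <cassert>

// Hypothetical stand-in for CUDA's dim3, so the sketch builds without CUDA.
struct BlockShape {
    unsigned x, y, z;
};

// Mirrors the rule in ThreadNumberCalculator<T>::Init():
//  - if the kernel's own limit is below x_block, use a single short row;
//  - otherwise cap the limit at max_threads and use x_block columns with as
//    many full rows as fit (integer division, so leftover threads are dropped).
// Note: like the original, this yields y == 0 if max_threads < x_block.
inline BlockShape block_shape(unsigned kernel_limit, unsigned max_threads,
                              unsigned x_block) {
    assert(x_block > 0);
    if (kernel_limit < x_block) {
        return {kernel_limit, 1, 1};
    }
    unsigned val = kernel_limit > max_threads ? max_threads : kernel_limit;
    return {x_block, val / x_block, 1};
}
```

Under these assumptions, a kernel that could run 1024 threads per block gets a 32x16 block with the old cap of 512 and a 32x4 block with the new cap of 128. A kernel limited to 100 threads gets 32x3 (96 threads) in both cases, which is the situation the "Reduced from maximum" notice in src/GetThreads.cpp reports.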
2 changes: 1 addition & 1 deletion src/Dynamics.h.Rt
@@ -4,4 +4,4 @@
for (n in c_table_decl(unique(c(Density$name,Fields$name)))) { ?>
real_t <?%s n ?>; <?R
} ?>
flag_t NodeType; ///< Node flag/type

1 change: 1 addition & 0 deletions src/Geometry.cpp.Rt
@@ -1132,6 +1132,7 @@ int Geometry::Draw(pugi::xml_node & node)

#else
error("You need to compile PYTHON support for this geometry element");
return -1;
#endif

} else if (strcmp(n.name(), "STL") == 0) {
45 changes: 45 additions & 0 deletions src/GetThreads.cpp
@@ -0,0 +1,45 @@
#include "GetThreads.h"
#ifdef HAS_CXXABI_H
#include <cxxabi.h>
#endif
#include <algorithm>

void ThreadNumberCalculatorBase::InitAll() {
list_t list = List();
for (type* ptr : list) ptr->Init();
std::sort(list.begin(), list.end(), compare);
for (type* ptr : list) ptr->print();
}

ThreadNumberCalculatorBase::ThreadNumberCalculatorBase() {
List().push_back(this);
}

void ThreadNumberCalculatorBase::print() {
if (thr.x * thr.y < maxthr) {
notice( " %3dx%-3d | %s --- Reduced from maximum %d\n", thr.x, thr.y, name.c_str(), maxthr);
} else {
output( " %3dx%-3d | %s\n", thr.x, thr.y, name.c_str());
}
}

std::string cxx_demangle(std::string str) {
#ifdef HAS_CXXABI_H
int status;
char *c_ret = abi::__cxa_demangle(str.c_str(), 0, 0, &status);
if (c_ret != NULL) {
std::string ret(c_ret);
free(c_ret);
if (status == 0) return ret;
}
#endif
return str;
}

int InitDim() {
MPI_Barrier(MPMD.local);
output( " Threads | Action\n");
ThreadNumberCalculatorBase::InitAll();
MPI_Barrier(MPMD.local);
return 0;
}
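The InitAll() / constructor pair above is a self-registration pattern: every calculator object adds itself to a function-local static list when constructed, and InitAll() later initializes and sorts whatever was registered. Below is a rough standalone sketch of that idea, not the project's code. CalculatorBase, Calculator, Holder, KernelA, KernelB and sorted_labels are hypothetical names, and the limits are made-up values standing in for what GetThreads<T>() would query from the device.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for ThreadNumberCalculatorBase.
class CalculatorBase {
public:
    using list_t = std::vector<CalculatorBase*>;
private:
    // Function-local static: built on first use, so registering from a
    // constructor that runs during static initialization is safe.
    static list_t& List() {
        static list_t list;
        return list;
    }
protected:
    std::string name;
    int limit = 0;
public:
    CalculatorBase() { List().push_back(this); }
    virtual ~CalculatorBase() = default;
    virtual void Init() = 0;
    const std::string& label() const { return name; }
    int value() const { return limit; }
    // Initialize all registered objects and return them sorted by name.
    static list_t InitAll() {
        list_t list = List();  // work on a copy; the registry order is untouched
        for (CalculatorBase* p : list) {
            assert(p != nullptr);
            p->Init();
        }
        std::sort(list.begin(), list.end(),
                  [](const CalculatorBase* a, const CalculatorBase* b) {
                      return a->name < b->name;
                  });
        return list;
    }
};

// Hypothetical kernel tags with made-up limits.
struct KernelA { static const char* Name() { return "KernelA"; } static int Limit() { return 1024; } };
struct KernelB { static const char* Name() { return "KernelB"; } static int Limit() { return 256; } };

template <class T> class Calculator : public CalculatorBase {
public:
    void Init() override { name = T::Name(); limit = T::Limit(); }
};

// One static calculator per kernel type, as ThreadNumber<T> does.
template <class T> class Holder {
    static Calculator<T> calc;
public:
    static int value() { return calc.value(); }
};
template <class T> Calculator<T> Holder<T>::calc;

// Referencing Holder<T>::value() instantiates the static member, which
// registers itself; InitAll() then sees both kernels.
inline std::vector<std::string> sorted_labels() {
    (void)Holder<KernelB>::value();
    (void)Holder<KernelA>::value();
    std::vector<std::string> out;
    for (const CalculatorBase* p : CalculatorBase::InitAll()) out.push_back(p->label());
    return out;
}
```

One consequence of this design is that only kernel types actually referenced somewhere in the program get registered, since a static member of a class template is instantiated on use. Sorting by name makes the printed table independent of the unspecified initialization order.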
77 changes: 77 additions & 0 deletions src/GetThreads.h
@@ -0,0 +1,77 @@
#include "Global.h"
#include <typeinfo>
#include "cross.h"

template <class E> CudaGlobalFunction void Kernel();

std::string cxx_demangle(std::string str);

inline int ceiling_div(const int & x, const int & y) {
return x / y + (x % y != 0);
}

/// Query at runtime the maximal number of threads per block for Kernel<T>
template < class T > int GetThreads() {
// Attributes are queried into a stack object, so there is nothing to free
CudaFuncAttributes attr;
CudaFuncGetAttributes(&attr, Kernel<T>) ;
debug1( "[%d] Constant mem:%ld\n", D_MPI_RANK, attr.constSizeBytes);
debug1( "[%d] Local mem:%ld\n", D_MPI_RANK, attr.localSizeBytes);
debug1( "[%d] Max threads:%d\n", D_MPI_RANK, attr.maxThreadsPerBlock);
debug1( "[%d] Reg Number:%d\n", D_MPI_RANK, attr.numRegs);
debug1( "[%d] Shared mem:%ld\n", D_MPI_RANK, attr.sharedSizeBytes);
return attr.maxThreadsPerBlock;
}

class ThreadNumberCalculatorBase {
typedef ThreadNumberCalculatorBase type;
typedef std::vector< type* > list_t;
static inline list_t& List() {
static list_t list;
return list;
}
static inline bool compare ( const type* a, const type* b ) { return a->name < b->name; }
protected:
dim3 thr;
unsigned int maxthr;
std::string name;
public:
static void InitAll();
ThreadNumberCalculatorBase();
virtual void Init() = 0;
inline dim3 threads() { return thr; }
void print();
};

template < class T > class ThreadNumberCalculator : public ThreadNumberCalculatorBase {
public:
virtual void Init() {
name = cxx_demangle(typeid(T).name());
maxthr = GetThreads< T >();
thr.z = 1;
int val = maxthr;
if (maxthr < X_BLOCK) {
thr.x = maxthr;
thr.y = 1;
} else {
if (val > MAX_THREADS) {
val = MAX_THREADS;
}
thr.x = X_BLOCK;
thr.y = val/X_BLOCK;
}
};
};

template < class T > class ThreadNumber {
typedef ThreadNumberCalculator< T > calc_t;
static calc_t calc;
public:
static inline dim3 threads() { return calc.threads(); }
};

template < class T > ThreadNumberCalculator<T> ThreadNumber<T>::calc;

/// Initialize Thread/Block number variables
int InitDim();
