Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Modifies converse schedeuler to prioritize NodeGroup messages #3676

Open
wants to merge 16 commits into
base: main
Choose a base branch
from
Open
20 changes: 17 additions & 3 deletions src/conv-core/convcore.C
Original file line number Diff line number Diff line change
Expand Up @@ -1655,8 +1655,10 @@ void CsdSchedulerState_new(CsdSchedulerState_t *s)
s->schedQ=CpvAccess(CsdSchedQueue);
s->localCounter=&(CpvAccess(CsdLocalCounter));
#if CMK_NODE_QUEUE_AVAILABLE
s->nodeQ=CsvAccess(CsdNodeQueue);
s->nodeLock=CsvAccess(CsdNodeQueueLock);
s->nodeQ=CsvAccess(CsdNodeQueue);
ZwFink marked this conversation as resolved.
Show resolved Hide resolved
s->nodeLock=CsvAccess(CsdNodeQueueLock);
s->nodeGrpFreq=4;
s->iter = 0;
#endif
#if CMK_GRID_QUEUE_AVAILABLE
s->gridQ=CpvAccess(CsdGridQueue);
Expand Down Expand Up @@ -1720,7 +1722,19 @@ void CsdSchedulerState_new(CsdSchedulerState_t *s)
*/
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The long expository comment preceeding the CsdNextMessage definition should be updated to match the new behavior, and any other behavior that it doesn't describe, such as csdLocalMax querying the PE onnode FIFO (localQ) and scheduler queues first.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even better, break up the exposition so that the explanations are adjacent to the code they are intended to describe.

void *CsdNextMessage(CsdSchedulerState_t *s) {
void *msg;
if((*(s->localCounter))-- >0)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new code should be below the localCounter branch, right above the CmiGetNonLocal() call. The default CsdLocalMax is 0 so normally it won't matter, but if the user says to prioritize local messages then they should come first.


s->iter++;

#if CMK_NODE_QUEUE_AVAILABLE
// we use nodeGrpFreq == 0 to mean
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indentation

// don't check NodeQ with high priority
if (s->nodeGrpFreq && (0 == (s->iter % s->nodeGrpFreq)))
{
msg = CmiGetNonLocalNodeQ();
if (NULL != msg) return msg;
}
#endif
if((*(s->localCounter))-- >0)
{
/* This avoids a race condition with migration detected by megatest*/
msg=CdsFifo_Dequeue(s->localQ);
Expand Down
6 changes: 6 additions & 0 deletions src/conv-core/converse.h
Original file line number Diff line number Diff line change
Expand Up @@ -1129,6 +1129,12 @@ typedef struct {
void *localQ;
Queue nodeQ;
Queue schedQ;
unsigned short iter; // counting number of sched iterations
unsigned short nodeGrpFreq; // call nodegroup queue once every nodeGrpFreq iterations with high prio
// should add a function to change this from the program for advanced users. One obstacle:
ZwFink marked this conversation as resolved.
Show resolved Hide resolved
// it is inside a struct that is on stack, and so not accessible for standalone functions. Need to
// resolve this by making a schedule a c++ object, but even then we need a ptr to the currently-running scheduler
// 0 means do not check nodegroup queue with high prio.. will be checked with low prio after other Qs
int *localCounter;
#if CMK_OBJECT_QUEUE_AVAILABLE
Queue objQ;
Expand Down
Loading