Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Modifies converse schedeuler to prioritize NodeGroup messages #3676

Open
wants to merge 16 commits into
base: main
Choose a base branch
from
Open
46 changes: 29 additions & 17 deletions src/conv-core/convcore.C
Original file line number Diff line number Diff line change
Expand Up @@ -1657,6 +1657,8 @@ void CsdSchedulerState_new(CsdSchedulerState_t *s)
#if CMK_NODE_QUEUE_AVAILABLE
s->nodeQ=CsvAccess(CsdNodeQueue);
s->nodeLock=CsvAccess(CsdNodeQueueLock);
ZwFink marked this conversation as resolved.
Show resolved Hide resolved
s->nodeGrpFreq=4;
s->iter = 0;
#endif
#if CMK_GRID_QUEUE_AVAILABLE
s->gridQ=CpvAccess(CsdGridQueue);
Expand Down Expand Up @@ -1720,29 +1722,39 @@ void CsdSchedulerState_new(CsdSchedulerState_t *s)
*/
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The long expository comment preceeding the CsdNextMessage definition should be updated to match the new behavior, and any other behavior that it doesn't describe, such as csdLocalMax querying the PE onnode FIFO (localQ) and scheduler queues first.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even better, break up the exposition so that the explanations are adjacent to the code they are intended to describe.

void *CsdNextMessage(CsdSchedulerState_t *s) {
void *msg;
if((*(s->localCounter))-- >0)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new code should be below the localCounter branch, right above the CmiGetNonLocal() call. The default CsdLocalMax is 0 so normally it won't matter, but if the user says to prioritize local messages then they should come first.

{
/* This avoids a race condition with migration detected by megatest*/
msg=CdsFifo_Dequeue(s->localQ);
if (msg!=NULL)
{

if ((*(s->localCounter))-- > 0) {
/* This avoids a race condition with migration detected by megatest*/
msg = CdsFifo_Dequeue(s->localQ);
if (msg != NULL) {
#if CMI_QD
CpvAccess(cQdState)->mProcessed++;
CpvAccess(cQdState)->mProcessed++;
#endif
return msg;
}
CqsDequeue(s->schedQ,(void **)&msg);
if (msg!=NULL) return msg;
}

return msg;
}
CqsDequeue(s->schedQ, (void**)&msg);
if (msg != NULL)
return msg;
}

#if CMK_NODE_QUEUE_AVAILABLE
s->iter++;
// we use nodeGrpFreq == 0 to mean
// don't check NodeQ with high priority
if (s->nodeGrpFreq && (0 == (s->iter % s->nodeGrpFreq))) {
msg = CmiGetNonLocalNodeQ();
if (NULL != msg) return msg;
}
#endif

*(s->localCounter)=CsdLocalMax;
if ( NULL!=(msg=CmiGetNonLocal()) ||
if ( NULL!=(msg=CmiGetNonLocal()) ||
NULL!=(msg=CdsFifo_Dequeue(s->localQ)) ) {
#if CMI_QD
CpvAccess(cQdState)->mProcessed++;
CpvAccess(cQdState)->mProcessed++;
#endif
return msg;
}
return msg;
}
#if CMK_GRID_QUEUE_AVAILABLE
/*#warning "CsdNextMessage: CMK_GRID_QUEUE_AVAILABLE" */
CqsDequeue (s->gridQ, (void **) &msg);
Expand Down
2 changes: 2 additions & 0 deletions src/conv-core/converse.h
Original file line number Diff line number Diff line change
Expand Up @@ -1129,6 +1129,8 @@ typedef struct {
void *localQ;
Queue nodeQ;
Queue schedQ;
unsigned short iter;
unsigned short nodeGrpFreq;
int *localCounter;
#if CMK_OBJECT_QUEUE_AVAILABLE
Queue objQ;
Expand Down
Loading