Final Hw3 Submission #434
Open: aidizhang wants to merge 1 commit into harvard-cs205:master from aidizhang:HW3
@@ -0,0 +1,22 @@
After running the timing code, the best configuration is as follows:

configuration ('coalesced',256,128): 0.00301426 seconds

I am running the code on my 2012 MacBook Pro 13" with the following specs:

---------------------------
Apple Apple version: OpenCL 1.2 (May 10 2015 19:38:45)
The devices detected on platform Apple are:
---------------------------
Intel(R) Core(TM) i5-3210M CPU @ 2.50GHz [Type: CPU ]
Maximum clock Frequency: 2500 MHz
Maximum allocable memory size: 2147 MB
Maximum work group size 1024
---------------------------
HD Graphics 4000 [Type: GPU ]
Maximum clock Frequency: 1100 MHz
Maximum allocable memory size: 268 MB
Maximum work group size 512
---------------------------
This context is associated with 2 devices
The queue is using the device: HD Graphics 4000
@@ -0,0 +1,29 @@
Results were as follows:

Maze 1:
Part 1: Finished after 904 iterations, 418.91049 ms total, 0.46339656892 ms per iteration
Found 2 regions
Part 2: Finished after 531 iterations, 241.37665 ms total, 0.45456996761 ms per iteration
Found 2 regions
Part 3: Finished after 10 iterations, 4.61896 ms total, 0.4618960467 ms per iteration
Found 2 regions
Part 4: Finished after 10 iterations, 13.07019 ms total, 1.30701931928 ms per iteration
Found 2 regions

Maze 2:
Part 1: Finished after 523 iterations, 238.76491 ms total, 0.4565294731 ms per iteration
Found 35 regions
Part 2: Finished after 283 iterations, 128.76932 ms total, 0.45501526965 ms per iteration
Found 35 regions
Part 3: Finished after 9 iterations, 4.05541 ms total, 0.45060196899 ms per iteration
Found 35 regions
Part 4: Finished after 9 iterations, 11.66303 ms total, 1.2958923333 ms per iteration
Found 35 regions

My results for Part 4 were much slower than my results for Part 3 (roughly 2.8 times slower per iteration, about 1.30 ms versus 0.46 ms), which suggests that using a single thread is not a good choice here. Note that the number of iterations is the same; only the time per iteration increases. This is probably due to a tradeoff between memory access and compute: given the specific architecture of my machine, the computation in this case appears to be compute-bound rather than memory-bound. The benefits of the single-threaded update (which include avoiding redundant reads of global memory) are outweighed by the cost of performing the updates serially and losing parallelism. Perhaps if the pixel labels were more similar, or if my GPU were memory-bound, using a single thread might be a good choice, but not in this case.
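
The slowdown can be checked directly from the per-iteration timings reported above. The following is a small arithmetic sketch in Python; the dictionary layout and the helper name `slowdown` are illustrative only and are not part of the submission.

```python
# Per-iteration times in milliseconds, copied from the results listed above.
per_iter_ms = {
    "Maze 1": {"part3": 0.4618960467, "part4": 1.30701931928},
    "Maze 2": {"part3": 0.45060196899, "part4": 1.2958923333},
}


def slowdown(times):
    """Return how many times slower Part 4 is than Part 3 per iteration."""
    return times["part4"] / times["part3"]


for maze, times in per_iter_ms.items():
    print(f"{maze}: Part 4 / Part 3 per-iteration ratio = {slowdown(times):.2f}")
```

Both ratios come out a little under 3, and because the iteration counts are identical in Parts 3 and 4, the same ratio applies to the total times.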

Part 5:

On the correctness front, atomic_min() ensures that reading the current value, computing the minimum, and writing out the result happen in one atomic step, i.e. no other thread is able to intervene while our thread's update is still in progress. Switching to min() would mean that race conditions might occur, in the sense that a label is updated redundantly. The final result would still be correct.

However, in terms of time, we run the risk of extra redundant updates, which could hurt our runtime and cause the algorithm to run more slowly. Even though the min() operation is itself faster than atomic_min(), the redundant operations might cause the overall runtime to increase.
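
To make the race concrete, here is a minimal sequential model in Python of two threads updating the same label. It is only a sketch under stated assumptions: the function names and the specific interleaving (both threads read before either writes) are hypothetical, and real OpenCL scheduling is not modeled.

```python
def atomic_min(mem, idx, value):
    """Model of atomic_min: read, compare, and write happen as one step."""
    old = mem[idx]
    mem[idx] = min(old, value)
    return old


def racy_min_interleaved(mem, idx, value_a, value_b):
    """Model of two threads running `mem[idx] = min(mem[idx], v)` non-atomically.

    Assumed interleaving: both threads read before either one writes,
    and thread B writes last.
    """
    read_a = mem[idx]                # thread A reads
    read_b = mem[idx]                # thread B reads the same (soon stale) value
    mem[idx] = min(read_a, value_a)  # thread A writes its minimum
    mem[idx] = min(read_b, value_b)  # thread B overwrites it using its stale read


# Non-atomic version: thread A's smaller label is lost.
labels = [10]
racy_min_interleaved(labels, 0, value_a=3, value_b=7)
racy_result = labels[0]

# Atomic version: the smaller label survives regardless of order.
labels = [10]
atomic_min(labels, 0, 3)
atomic_min(labels, 0, 7)
atomic_result = labels[0]

print("non-atomic:", racy_result, "atomic:", atomic_result)
```

In this model the non-atomic update leaves the label at 7 instead of 3, so a later iteration would have to lower it again, which is one way the extra work described above could arise.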
@@ -56,7 +56,7 @@ propagate_labels(__global __read_write int *labels,

    // 1D index of thread within our work-group
    const int idx_1D = ly * get_local_size(0) + lx;

    int old_label;
    // Will store the output value
    int new_label;

@@ -80,20 +80,57 @@ propagate_labels(__global __read_write int *labels,
    old_label = buffer[buf_y * buf_w + buf_x];

    // CODE FOR PARTS 2 and 4 HERE (part 4 will replace part 2)

    // Overwritten code for PART 2
    // if (old_label < w * h) {
    //     // Grab grandparent
    //     buffer[buf_y * buf_w + buf_x] = labels[old_label];
    // }

    // PART 4
    // Update workgroup labels using single thread
    if (lx == 0 && ly == 0) {
        // Keeps track of previous key and result
        int prev_label = -1;
        int prev_result;

        // Iterate through entire buffer
        for (int i = 0; i < buf_w * buf_h; i++) {
            int temp_label = buffer[i];

            if (temp_label < w * h) {
                // Reset if previous is not the same as current label
                if (prev_label != temp_label) {
                    prev_label = temp_label;
                    prev_result = labels[prev_label];
                }
                buffer[i] = prev_result;
            }
        }
    }

    // stay in bounds
    if ((x < w) && (y < h)) {
    if ((x < w) && (y < h) && (old_label < w * h)) {
        // CODE FOR PART 1 HERE
        // We set new_label to the value of old_label, but you will need
        // to adjust this for correctness.

        // Update new label
        new_label = old_label;

Review comment: After parts 2 and 4, you should use buffer[buf_w * buf_y + buf_x] instead of old_label.

        if (new_label < w * h) {
            // Take minimum of minimums over rows and columns
            int row_min = min(buffer[buf_y * buf_w + buf_x - 1], buffer[buf_y * buf_w + buf_x + 1]);
            int col_min = min(buffer[(buf_y - 1) * buf_w + buf_x], buffer[(buf_y + 1) * buf_w + buf_x]);
            new_label = min(row_min, col_min);
        }

        if (new_label != old_label) {
            // CODE FOR PART 3 HERE
            // indicate there was a change this iteration.
            // multiple threads might write this.
            *(changed_flag) += 1;
            labels[y * w + x] = new_label;

            atomic_min(&labels[old_label], new_label);
            atomic_min(&labels[y * w + x], new_label);
        }
    }
}
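
For readers following the kernel logic, the sketch below models two of its steps sequentially in Python: the Part 4 single-thread buffer refresh that reuses the previous global fetch, and a neighbor minimum that starts from the refreshed buffer entry, as the review comment above suggests. This is not the submitted kernel. The function names, the `num_pixels` parameter (standing in for `w * h`), the sample arrays, and the choice to include the pixel's own label in the minimum are all assumptions made for illustration.

```python
def refresh_buffer_single_thread(buffer, labels, num_pixels):
    """Replace each foreground entry of `buffer` with labels[entry].

    Mirrors the Part 4 loop: the previous label and its fetched result are
    remembered, so runs of identical labels cost one global read.
    Returns the number of simulated global-memory reads.
    """
    prev_label = -1
    prev_result = None
    global_reads = 0
    for i, label in enumerate(buffer):
        if label < num_pixels:              # background entries are skipped
            if label != prev_label:
                prev_label = label
                prev_result = labels[label]  # simulated global-memory read
                global_reads += 1
            buffer[i] = prev_result
    return global_reads


def neighbor_min(buffer, buf_w, buf_x, buf_y, num_pixels):
    """Minimum over the refreshed center entry and its four neighbors.

    Assumption: the center value is read from `buffer` (not from a label
    saved before the refresh) and takes part in the minimum.
    """
    center = buffer[buf_y * buf_w + buf_x]
    if center >= num_pixels:                # background pixel: leave unchanged
        return center
    return min(
        center,
        buffer[buf_y * buf_w + buf_x - 1],
        buffer[buf_y * buf_w + buf_x + 1],
        buffer[(buf_y - 1) * buf_w + buf_x],
        buffer[(buf_y + 1) * buf_w + buf_x],
    )


# Hypothetical 3x3 local buffer (one pixel plus halo); 9 marks background.
num_pixels = 9
labels = [0, 1, 2, 3, 2, 5, 5, 7, 8]        # labels[4] == 2, labels[6] == 5
buffer = [9, 4, 9,
          4, 6, 6,
          9, 9, 9]

reads = refresh_buffer_single_thread(buffer, labels, num_pixels)
new_label = neighbor_min(buffer, buf_w=3, buf_x=1, buf_y=1, num_pixels=num_pixels)
print(reads, buffer, new_label)
```

In the actual kernel, the other work-items would also need to wait for the single-thread refresh to finish before reading the local buffer; the sequential model above sidesteps that synchronization question.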
Where is FETCH defined?
This code does not compile.