Design Discussions #3

Open
czgdp1807 opened this issue Feb 1, 2020 · 4 comments

@czgdp1807
Member

Description of the problem

This issue aims to discuss the design of the software that is going to be developed, covering the topics below,

  1. File structure
  2. User facing APIs
  3. Class design
  4. Hardware requirements (mainly GPUs)

I will try to come up with the first one ASAP. However, if you have already prepared something, then let us know in the comments.

Example of the problem

References/Other comments

We will follow https://web.stanford.edu/~hastie/Papers/samme.pdf
If you have something to suggest that can be used in the project, then let us know.

@czgdp1807
Member Author

I have thought of the following file structure for the project,

├── AUTHORS
├── CMake
├── CMakeLists.txt
├── CODE_OF_CONDUCT.md
├── ISSUE_TEMPLATE.md
├── LICENSE
├── PULL_REQUEST_TEMPLATE.md
├── README.md
└── src
    ├── adaboost
    ├── bindings
    ├── CMakeLists.txt
    ├── tests
    └── utils

6 directories, 8 files

References - https://github.com/mlpack/mlpack

The above is just a rough idea for moving forward. If you have any suggestions then let me know. I will make it more concrete in the coming days.

@czgdp1807
Member Author

The final file directory structure will be similar to the following,

.
├── AUTHORS
├── CMakeLists.txt
├── CODE_OF_CONDUCT.md
├── ISSUE_TEMPLATE.md
├── LICENSE
├── PULL_REQUEST_TEMPLATE.md
├── README.md
└── src
    ├── adaboost
    │   └── CMakeLists.txt
    ├── bindings
    │   ├── CMakeLists.txt
    │   └── python
    │       └── CMakeLists.txt
    ├── CMakeLists.txt
    ├── core
    │   └── CMakeLists.txt
    ├── cuda
    │   └── CMakeLists.txt
    ├── tests
    │   └── CMakeLists.txt
    └── utils
        └── CMakeLists.txt

8 directories, 15 files

I have removed the build-related files (CMake and other stuff). We will add them later on. Let's move on to discussing the user facing APIs.

@czgdp1807
Member Author

Each module will have two types of files,

  1. ".hpp" - These files will contain the declarations of various classes, functions, etc. associated with that module.
  2. "_impl.hpp" - These files will contain the implementations of various classes, functions declared in the header files.

In addition there will be a test folder in each module containing the associated tests for that module.

The purpose of each module is described in the points below,

1. core - This will contain the data structures (Matrices, Vectors) and the operations on them. We will include the operations on these data structures as member functions of classes.
2. cuda - It will contain the CUDA C versions of all the functions in other modules.
3. adaboost - This will contain the various functions and classes for the AdaBoost algorithms.
4. bindings/python - This will contain the Boost.Python code for generating Python bindings.

Let's start the discussion for the API of core module.
We will need the following data structures for the AdaBoost Algorithm:

  1. Vector - For storing the inputs/predictions.
  2. Matrix2D - For representing the complete data set.
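
To make the two data structures concrete, here is a minimal sketch of how they could be declared and then defined, following the ".hpp" / "_impl.hpp" split described above. Everything in it is an assumption for illustration: the namespace, the member names, the `std::vector` storage and the file names in the comments are hypothetical, not a decided API.

```cpp
// Sketch only: names, members and file names are assumptions, not the project's API.
#include <cassert>
#include <cstddef>
#include <vector>

namespace adaboost {
namespace core {

// ---- would live in a header such as core/data_structures.hpp (declarations) ----

// One-dimensional container for inputs/predictions.
template <class T>
class Vector {
public:
    explicit Vector(std::size_t size);
    std::size_t size() const;
    T& at(std::size_t i);
    const T& at(std::size_t i) const;

private:
    std::vector<T> data_;
};

// Two-dimensional container for the complete data set (row-major storage).
template <class T>
class Matrix2D {
public:
    Matrix2D(std::size_t rows, std::size_t cols);
    std::size_t rows() const;
    std::size_t cols() const;
    T& at(std::size_t r, std::size_t c);
    const T& at(std::size_t r, std::size_t c) const;

private:
    std::size_t rows_;
    std::size_t cols_;
    std::vector<T> data_;
};

// ---- would live in core/data_structures_impl.hpp (definitions) ----

template <class T>
Vector<T>::Vector(std::size_t size) : data_(size, T()) {}

template <class T>
std::size_t Vector<T>::size() const { return data_.size(); }

template <class T>
T& Vector<T>::at(std::size_t i) {
    assert(i < data_.size());
    return data_[i];
}

template <class T>
const T& Vector<T>::at(std::size_t i) const {
    assert(i < data_.size());
    return data_[i];
}

template <class T>
Matrix2D<T>::Matrix2D(std::size_t rows, std::size_t cols)
    : rows_(rows), cols_(cols), data_(rows * cols, T()) {}

template <class T>
std::size_t Matrix2D<T>::rows() const { return rows_; }

template <class T>
std::size_t Matrix2D<T>::cols() const { return cols_; }

template <class T>
T& Matrix2D<T>::at(std::size_t r, std::size_t c) {
    assert(r < rows_ && c < cols_);
    return data_[r * cols_ + c];  // row-major index
}

template <class T>
const T& Matrix2D<T>::at(std::size_t r, std::size_t c) const {
    assert(r < rows_ && c < cols_);
    return data_[r * cols_ + c];
}

}  // namespace core
}  // namespace adaboost
```

With this sketch, `adaboost::core::Matrix2D<double> data(n_samples, n_features);` would hold the data set and `adaboost::core::Vector<int> labels(n_samples);` the labels. Contiguous row-major storage is chosen here only because it is simple to copy to a GPU later; that is a design guess, not a decision.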

The following operations will be required,

  1. Sum - For summation of elements of a Vector
  2. Argmax - Finding the input for which a given function maximizes its value.
  3. I - The indicator function. Returns a boolean telling whether the given condition is true.

For deciding the API we can consider the use of the above functions in the algorithm.
Refer to page 2 of https://web.stanford.edu/~hastie/Papers/samme.pdf

  1. In various steps of the algorithm, a function is applied to each element of a Vector and the results are summed.
    One approach is to pass a pointer to the function that is to be applied per element and do the computation while summing. Something like,
Sum(function_pointer address, Vector, start, end)

The above will work fine for CPU. For the GPU counterpart, we will need a __device__ function, maybe named SumGPU, with a similar API. Refer to https://stackoverflow.com/a/15646771
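
A minimal CPU-only sketch of the `Sum` idea could look like the following. It assumes `double` elements, uses `std::vector<double>` as a stand-in for the core Vector, and treats `start`/`end` as a half-open range `[start, end)`; none of these choices is fixed by the discussion above, and `Square` is just a hypothetical element-wise function.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for the core Vector type (an assumption for this sketch).
using Vector = std::vector<double>;

// Applies func to every element in the half-open range [start, end)
// and returns the sum of the results, computed in a single pass.
double Sum(double (*func)(double), const Vector& vec,
           std::size_t start, std::size_t end) {
    assert(func != nullptr);
    assert(start <= end && end <= vec.size());
    double result = 0.0;
    for (std::size_t i = start; i < end; ++i) {
        result += func(vec[i]);
    }
    return result;
}

// Hypothetical element-wise function used only to exercise Sum.
double Square(double x) { return x * x; }
```

For a Vector holding 1, 2, 3, calling `Sum(Square, v, 0, v.size())` adds 1 + 4 + 9. A capture-less lambda also converts to this function pointer type, so callers would not have to write a named function for every use.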

  2. Argmax needs a function and a set of values specifying the domain. So, the API can be,
Argmax(function_pointer, Vector)
  3. I is very trivial. We just need a function specifying the condition.
I(function_pointer)
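
As a rough illustration of the `Argmax` signature, the sketch below scans the domain once and returns the input that gives the largest function value. The `double` element type, the `std::vector<double>` stand-in for Vector, the tie-breaking rule (first maximum wins) and the helper `NegSquaredDistanceFromTwo` are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for the core Vector type (an assumption for this sketch).
using Vector = std::vector<double>;

// Returns the element of domain for which func takes its largest value.
// If several elements tie, the first one encountered is returned.
double Argmax(double (*func)(double), const Vector& domain) {
    assert(func != nullptr);
    assert(!domain.empty());
    double best_input = domain[0];
    double best_value = func(domain[0]);
    for (std::size_t i = 1; i < domain.size(); ++i) {
        const double value = func(domain[i]);
        if (value > best_value) {
            best_value = value;
            best_input = domain[i];
        }
    }
    return best_input;
}

// Hypothetical function with its maximum at x = 2, used only as an example.
double NegSquaredDistanceFromTwo(double x) { return -(x - 2.0) * (x - 2.0); }
```

In the algorithm the domain would be the set of class labels rather than arbitrary doubles, so an integer or templated variant may turn out to be a better fit.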

All the functions above will have their counterparts for GPU.

Refer to https://stackoverflow.com/a/12374170 as well.

@czgdp1807
Member Author

czgdp1807 commented Feb 15, 2020

I is not needed in our case as it's too trivial. AdaBoost just uses a fixed operation, !=, in the algorithm for I.
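
For instance, the weighted-error step of the algorithm (the weights of the misclassified samples divided by the total weight) can inline the `!=` comparison directly. The sketch below assumes integer class labels and `std::vector` containers; the name `WeightedError` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Weighted error of a classifier: the sum of the weights of the samples
// whose prediction differs from the true label, divided by the total weight.
// The indicator I(label != prediction) is inlined as a plain comparison.
double WeightedError(const std::vector<double>& weights,
                     const std::vector<int>& labels,
                     const std::vector<int>& predictions) {
    assert(weights.size() == labels.size());
    assert(labels.size() == predictions.size());
    double misclassified = 0.0;
    double total = 0.0;
    for (std::size_t i = 0; i < weights.size(); ++i) {
        if (labels[i] != predictions[i]) {
            misclassified += weights[i];
        }
        total += weights[i];
    }
    assert(total > 0.0);  // the weights are assumed to have a positive sum
    return misclassified / total;
}
```

Since the condition is fixed, a separate `I(function_pointer)` adds nothing here beyond an extra indirect call per sample.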
