Representation of MDPs #1

Open
czgdp1807 opened this issue Jun 14, 2020 · 1 comment

Comments

@czgdp1807
Member

Description of the problem

Since RL aims to solve MDPs (Markov Decision Processes), our first aim should be to decide on their representation. It should be designed in such a way that RL algorithms can easily use it to find optimal or sub-optimal solutions.
MDPs have the following elements:

  1. State
  2. Actions
  3. Transition Probabilities
  4. Transition Rewards
  5. Policy
  6. Performance Metric

SMDPs (Semi-Markov Decision Processes) have an additional element, the time of transition.
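For reference, these elements correspond to the standard formulation: an MDP is a tuple (S, A, P, R), where P(s' | s, a) gives the probability of transitioning to state s' when action a is taken in state s, and R(s, a, s') gives the corresponding reward. A policy maps each state to an action, and the performance metric (e.g., the expected discounted return) is the quantity an RL algorithm optimizes.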

How can each of the above elements be represented? One idea is to use classes to encapsulate them.

Example of the problem

References/Other comments

@czgdp1807
Member Author

MDPs and their associated concepts may be represented using the following class structure:

#include &lt;string&gt;
#include &lt;vector&gt;
#include &lt;unordered_map&gt;

using namespace std;

// Represents a single action. The constructor and destructor are
// private, so Action objects are managed through the factory methods.
class Action
{
    private:
        string description;

        Action(const string& description = "");
        ~Action();

    public:
        static Action* getObject(const string& description = "");
        void deleteObject();
        string getDescription();
};

// A state together with its available actions and the transition
// probability and immediate transition reward attached to each action.
template <class _type>
class State
{
    private:
        string description;
        vector<Action*> actions;
        unordered_map<Action*, _type> transitionProbs;
        unordered_map<Action*, _type> iTransitionRewards;

        State(const string& description = "");

    public:
        static State<_type>* getObject(const string& description = "");
        void addAction(Action& action);
        void setTransitionProb(Action& action, _type transitionProb);
        void setITransitionRewards(Action& action, _type reward);
};

// Encapsulates the state space and the current policy; the
// performance metric is computed by a friend function.
template <class _type>
class MarkovDecisionProcess
{
    private:
        vector<State<_type>*> stateSpace;
        unordered_map<State<_type>*, Action*> policy;

        MarkovDecisionProcess();

    public:
        friend _type performanceMetric(const MarkovDecisionProcess& mdp);

        static MarkovDecisionProcess<_type>* getObject();
        void addState(State<_type>& state);
        void updatePolicy(State<_type>& state, Action& action);
};
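A minimal usage sketch of the proposed API, assuming each getObject returns a pointer to the newly created object and that probabilities and rewards are stored as double; the implementations themselves are hypothetical:

// Hypothetical usage; assumes the declarations above are implemented
// with getObject returning a pointer to the created object.
int main()
{
    MarkovDecisionProcess<double>* mdp =
        MarkovDecisionProcess<double>::getObject();

    Action* moveRight = Action::getObject("move right");

    State<double>* start = State<double>::getObject("start");
    start->addAction(*moveRight);
    start->setTransitionProb(*moveRight, 0.9);     // transition probability for this action
    start->setITransitionRewards(*moveRight, 1.0); // immediate transition reward

    mdp->addState(*start);
    mdp->updatePolicy(*start, *moveRight);

    return 0;
}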
