GoalEnv for Robot-Supervisor scheme #103

Closed
wants to merge 5 commits into from
Changes from 1 commit
23 changes: 22 additions & 1 deletion deepbots/supervisor/controllers/robot_supervisor.py
@@ -1,5 +1,5 @@
from warnings import warn, simplefilter
from deepbots.supervisor.controllers.supervisor_env import SupervisorEnv
from deepbots.supervisor.controllers.supervisor_env import SupervisorEnv, SupervisorGoalEnv
from controller import Supervisor


@@ -99,3 +99,24 @@ def apply_action(self, action):
:param action: list, containing action data
"""
raise NotImplementedError

class RobotGoalSupervisor(SupervisorGoalEnv, RobotSupervisor):
"""
The RobotGoalSupervisor class is just like RobotSupervisor, but it
uses compute_reward from gym.GoalEnv.
"""
def __init__(self, timestep=None):
super(RobotGoalSupervisor, self).__init__()

if timestep is None:
self.timestep = int(self.getBasicTimeStep())
else:
self.timestep = timestep

def step(self, action):
"""
The basic step method is use-case specific and needs to be implemented
by the user. Use compute_reward, inherited from gym.GoalEnv,
instead of get_reward().
"""
raise NotImplementedError
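
To make the intended usage concrete, below is a minimal sketch of a user-defined controller built on RobotGoalSupervisor, where step computes the reward through compute_reward instead of get_reward. The class name FindTargetRobot, the observation keys, the helper methods it calls, and the sparse-reward threshold are illustrative assumptions, not part of this PR.

```python
import numpy as np
from controller import Supervisor
from deepbots.supervisor.controllers.robot_supervisor import RobotGoalSupervisor


class FindTargetRobot(RobotGoalSupervisor):
    # Hypothetical user controller; get_observations, apply_action, is_done
    # and get_info are assumed to be implemented elsewhere in the subclass.
    def step(self, action):
        # Apply the agent's action and advance the Webots simulation one
        # timestep, mirroring the stepping pattern used by reset in this PR.
        self.apply_action(action)
        super(Supervisor, self).step(self.timestep)

        # Goal-style observation: dict with observation/achieved_goal/desired_goal.
        obs = self.get_observations()
        info = self.get_info()

        # Reward comes from gym.GoalEnv's compute_reward, not get_reward().
        reward = self.compute_reward(obs["achieved_goal"], obs["desired_goal"], info)
        return obs, reward, self.is_done(), info

    def compute_reward(self, achieved_goal, desired_goal, info):
        # Sparse reward: 0 when the achieved goal is within 5 cm of the target.
        distance = np.linalg.norm(np.array(achieved_goal) - np.array(desired_goal))
        return 0.0 if distance < 0.05 else -1.0
```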
120 changes: 78 additions & 42 deletions deepbots/supervisor/controllers/supervisor_env.py
@@ -2,27 +2,8 @@
from controller import Supervisor


class SupervisorEnv(Supervisor, gym.Env):
"""
This class is the highest class in deepbots class hierarchy, inheriting
both the Webots Supervisor controller and the basic gym.Env.

Refer to gym.Env documentation on how to implement a custom gym.Env
for additional functionality.

This class contains abstract methods that guide the development process
for users that want to implement a simple environment.

This class is not intended for user usage, but to provide a common
interface for all provided supervisor classes and make them
compatible with reinforcement learning agents that work with
the gym interface. Moreover, a problem-agnostic reset method is
provided. Please use any of the children supervisor classes to be
inherited by your own class, such as the RobotSupervisor class.
Nevertheless, advanced users can inherit this class to create
their own supervisor classes if they wish.
"""

class SupervisorBasicEnv:

def step(self, action):
"""
On each timestep, the agent chooses an action for the previous
@@ -43,27 +24,6 @@ def step(self, action):
"""
raise NotImplementedError

def reset(self):
"""
Used to reset the world to an initial state.

Default, problem-agnostic, implementation of reset method,
using Webots-provided methods.

*Note that this works properly only with Webots versions >R2020b
and must be overridden with a custom reset method when using
earlier versions. It is backwards compatible due to the fact
that the new reset method gets overridden by whatever the user
has previously implemented, so an old supervisor can be migrated
easily to use this class.

:return: default observation provided by get_default_observation()
"""
self.simulationReset()
self.simulationResetPhysics()
super(Supervisor, self).step(int(self.getBasicTimeStep()))
return self.get_default_observation()

def get_default_observation(self):
"""
This method should be implemented to return a default/starting
@@ -115,3 +75,79 @@ def get_info(self):
information on each step, e.g. for debugging purposes.
"""
raise NotImplementedError

class SupervisorEnv(Supervisor, gym.Env, SupervisorBasicEnv):
"""
This class is the highest class in the deepbots class hierarchy apart
from SupervisorBasicEnv, inheriting the Webots Supervisor controller,
the basic gym.Env, and the basic RL methods of SupervisorBasicEnv.

Refer to gym.Env documentation on how to implement a custom gym.Env
for additional functionality.

This class contains abstract methods that guide the development process
for users that want to implement a simple environment.

This class is not intended for user usage, but to provide a common
interface for all provided supervisor classes and make them
compatible with reinforcement learning agents that work with
the gym interface. Moreover, a problem-agnostic reset method is
provided. Please use any of the children supervisor classes to be
inherited by your own class, such as the RobotSupervisor class.
Nevertheless, advanced users can inherit this class to create
their own supervisor classes if they wish.
"""

def reset(self):
"""
Used to reset the world to an initial state.

Default, problem-agnostic, implementation of reset method,
using Webots-provided methods.

*Note that this works properly only with Webots versions >R2020b
and must be overridden with a custom reset method when using
earlier versions. It is backwards compatible due to the fact
that the new reset method gets overridden by whatever the user
has previously implemented, so an old supervisor can be migrated
easily to use this class.

:return: default observation provided by get_default_observation()
"""
self.simulationReset()
self.simulationResetPhysics()
super(Supervisor, self).step(int(self.getBasicTimeStep()))
return self.get_default_observation()



class SupervisorGoalEnv(Supervisor, gym.GoalEnv, SupervisorBasicEnv):
"""
This class is just like SupervisorEnv, but it imposes the gym.GoalEnv interface.

Refer to gym.GoalEnv documentation on how to implement a custom
gym.GoalEnv for additional functionality.
"""

def reset(self):
"""
Used to reset the world to an initial state and enforce that each
SupervisorGoalEnv uses a Goal-compatible observation space.

Default, problem-agnostic, implementation of reset method,
using Webots-provided methods.

*Note that this works properly only with Webots versions >R2020b
and must be overridden with a custom reset method when using
earlier versions. It is backwards compatible due to the fact
that the new reset method gets overridden by whatever the user
has previously implemented, so an old supervisor can be migrated
easily to use this class.

:return: default observation provided by get_default_observation()
"""
super().reset()
self.simulationReset()
self.simulationResetPhysics()
super(Supervisor, self).step(int(self.getBasicTimeStep()))
return self.get_default_observation()
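
For reference, the "Goal-compatible observation space" that gym.GoalEnv's reset check enforces is a gym.spaces.Dict containing observation, achieved_goal and desired_goal entries. A minimal sketch of such a space is shown below; the shapes and bounds are placeholder assumptions, not taken from this PR.

```python
import numpy as np
from gym import spaces

# Illustrative only: an observation space that satisfies the gym.GoalEnv
# check triggered by super().reset() above. Shapes/bounds are placeholders.
observation_space = spaces.Dict({
    "observation": spaces.Box(-np.inf, np.inf, shape=(10,), dtype=np.float64),
    "achieved_goal": spaces.Box(-np.inf, np.inf, shape=(3,), dtype=np.float64),
    "desired_goal": spaces.Box(-np.inf, np.inf, shape=(3,), dtype=np.float64),
})
```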