gym_acnportal.gym_acnsim.envs.base_env
This module contains an abstract gym environment that wraps an ACN-Sim Simulation.
Module Contents
Classes
BaseSimEnv: Abstract base class meant to be inherited from to implement new ACN-Sim environments.
class gym_acnportal.gym_acnsim.envs.base_env.BaseSimEnv(interface: Optional[GymTrainedInterface])
Bases: gym.Env
Abstract base class meant to be inherited from to implement new ACN-Sim environments.
- Subclasses must implement the following methods:
  - action_to_schedule
  - observation_from_state
  - reward_from_state
  - done_from_state
Subclasses must also specify observation_space and action_space, either as class or instance variables.
Optionally, subclasses may implement info_from_state, which by default returns an empty dict.
Subclasses may override the __init__, step, and reset methods.
Currently, render is not implemented; it is not required for the environment's internal functionality.
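The subclass contract above can be illustrated with Python's abc module. The sketch below is a minimal, hypothetical stand-in: SimEnvContract, IncompleteEnv, and CompleteEnv are illustrative names, not part of gym_acnportal, and the sketch omits gym.Env, the simulation interface, and the observation_space/action_space attributes. It assumes the required hooks are declared with abc.abstractmethod, as the "abstract" markers on this page suggest, and follows the prose above in treating info_from_state as optional.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class SimEnvContract(ABC):
    """Hypothetical stand-in for BaseSimEnv's subclass contract (not the real class)."""

    @abstractmethod
    def action_to_schedule(self) -> Dict[str, List[float]]: ...

    @abstractmethod
    def observation_from_state(self) -> Dict[str, Any]: ...

    @abstractmethod
    def reward_from_state(self) -> float: ...

    @abstractmethod
    def done_from_state(self) -> bool: ...

    def info_from_state(self) -> Dict[Any, Any]:
        # Optional hook: an empty dict unless a subclass overrides it.
        return {}


class IncompleteEnv(SimEnvContract):
    # Implements only one of the four required hooks.
    def action_to_schedule(self) -> Dict[str, List[float]]:
        return {}


class CompleteEnv(IncompleteEnv):
    # Supplies the remaining required hooks, so it can be instantiated.
    def observation_from_state(self) -> Dict[str, Any]:
        return {}

    def reward_from_state(self) -> float:
        return 0.0

    def done_from_state(self) -> bool:
        return True
```

Calling IncompleteEnv() raises TypeError because three abstract methods are still missing, while CompleteEnv() succeeds and inherits the empty-dict info_from_state.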
- Attributes:
  - _interface (GymTrainedInterface): An interface to a simulation to be stepped by this environment, or None. If None, an interface must be set later.
  - _init_snapshot (GymTrainedInterface): A deep copy of the initial interface, used for environment resets.
  - _prev_interface (GymTrainedInterface): A deep copy of the interface at the previous time step, used for calculating action rewards.
  - _action (np.ndarray): The action taken by the agent in this agent-environment loop iteration.
  - _schedule (Dict[str, List[float]]): Dictionary mapping station ids to a schedule of pilot signals.
  - _observation (np.ndarray): The observation given to the agent in this agent-environment loop iteration.
  - _reward (float): The reward given to the agent in this agent-environment loop iteration.
  - _done (bool): Whether the execution of the environment is complete.
  - _info (Dict[Any, Any]): Information about the environment.
_interface: Optional[GymTrainedInterface]
_init_snapshot: GymTrainedInterface
_prev_interface: GymTrainedInterface
_action: Optional[np.ndarray]
_schedule: Dict[str, List[float]]
_observation: Optional[np.ndarray]
_reward: Optional[float]
_done: Optional[bool]
_info: Optional[Dict[Any, Any]]
property interface(self) → gym_acnportal.gym_acnsim.interfaces.GymTrainedInterface
property prev_interface(self) → gym_acnportal.gym_acnsim.interfaces.GymTrainedInterface
property action(self) → numpy.ndarray
property schedule(self) → Dict[str, List[float]]
property observation(self) → numpy.ndarray
property reward(self) → float
property done(self) → bool
property info(self) → Dict[Any, Any]
update_state(self) → None
Update the state of the environment, namely its observation, reward, done, and info attributes.
- Returns:
  None.
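Because update_state refreshes the observation, reward, done, and info attributes, it presumably delegates to the four *_from_state hooks documented below. A minimal sketch of that delegation, assuming a one-hook-per-attribute mapping; UpdateStateSketch and its t counter are hypothetical stand-ins for a subclass and its simulator state, not library code:

```python
from typing import Any, Dict, Optional


class UpdateStateSketch:
    """Hypothetical class showing how update_state could delegate to the hooks."""

    def __init__(self) -> None:
        self._observation: Optional[Dict[str, Any]] = None
        self._reward: Optional[float] = None
        self._done: Optional[bool] = None
        self._info: Optional[Dict[Any, Any]] = None
        self.t = 0  # hypothetical stand-in for the simulator's state

    # Stand-ins for the abstract hooks a real subclass would implement.
    def observation_from_state(self) -> Dict[str, Any]:
        return {"timestep": self.t}

    def reward_from_state(self) -> float:
        return -1.0

    def done_from_state(self) -> bool:
        return self.t >= 3

    def info_from_state(self) -> Dict[Any, Any]:
        return {}

    def update_state(self) -> None:
        # Refresh the four cached attributes from the current state.
        self._observation = self.observation_from_state()
        self._reward = self.reward_from_state()
        self._done = self.done_from_state()
        self._info = self.info_from_state()
```

Keeping the hooks side-effect free and caching their results in one place means the properties (observation, reward, done, info) always report a consistent snapshot of a single time step.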
store_previous_state(self) → None
Store the previous state of the simulation in the _prev_interface environment attribute.
- Returns:
  None.
step(self, action: numpy.ndarray) → Tuple[np.ndarray, float, bool, Dict[Any, Any]]
Step the simulation one timestep with an agent’s action.
Accepts an action and returns a tuple (observation, reward, done, info).
Implements gym.Env.step().
- Args:
  - action (np.ndarray): an action provided by the agent
- Returns:
  - observation (np.ndarray): agent’s observation of the current environment
  - reward (float): amount of reward returned after the previous action
  - done (bool): whether the episode has ended, in which case further step() calls will return undefined results
  - info (dict): contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)
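One plausible ordering of the work inside step is sketched below. It assumes the previous interface is snapshotted before the simulator advances, as the description of _prev_interface suggests. FakeInterface, its step method, the station id "station-1", and the inline observation, reward, and done rules are all hypothetical; a real subclass would obtain these from GymTrainedInterface and its own *_from_state hooks.

```python
import copy
from typing import Any, Dict, List, Tuple


class FakeInterface:
    """Hypothetical simulator interface; not the GymTrainedInterface API."""

    def __init__(self, max_steps: int = 2) -> None:
        self.timestep = 0
        self.max_steps = max_steps
        self.last_schedule: Dict[str, List[float]] = {}

    def step(self, schedule: Dict[str, List[float]]) -> None:
        self.last_schedule = schedule
        self.timestep += 1


class StepSketch:
    def __init__(self, interface: FakeInterface) -> None:
        self._interface = interface
        self._prev_interface = copy.deepcopy(interface)
        self._action: List[float] = []
        self._schedule: Dict[str, List[float]] = {}

    def action_to_schedule(self) -> Dict[str, List[float]]:
        # Hypothetical: one station, the action used directly as its pilot signal.
        return {"station-1": list(self._action)}

    def step(self, action: List[float]) -> Tuple[Dict[str, Any], float, bool, Dict[Any, Any]]:
        self._action = action
        self._schedule = self.action_to_schedule()
        # Snapshot before the simulator advances, so a reward can compare
        # the previous and current interfaces.
        self._prev_interface = copy.deepcopy(self._interface)
        self._interface.step(self._schedule)
        # Inline stand-ins for observation/reward/done/info hooks.
        observation = {"timestep": self._interface.timestep}
        reward = float(sum(self._action))
        done = self._interface.timestep >= self._interface.max_steps
        return observation, reward, done, {}
```

For example, StepSketch(FakeInterface(max_steps=2)).step([8.0]) returns ({"timestep": 1}, 8.0, False, {}) and leaves _prev_interface at timestep 0.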
reset(self) → Dict[str, np.ndarray]
Resets the state of the simulation and returns an initial observation. Resetting is done by replacing the environment's interface with an interface to the simulation in its initial state (see _init_snapshot).
Implements gym.Env.reset().
- Returns:
  observation (np.ndarray): the initial observation.
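A reset of this kind has to leave the initial snapshot untouched, or later resets would no longer start from the initial state. A minimal sketch, assuming the snapshot is deep-copied again on every reset; ResetSketch and FakeInterface are hypothetical stand-ins, and the returned timestep stands in for a real observation:

```python
import copy
from typing import List


class FakeInterface:
    """Hypothetical mutable simulator state; not the real interface class."""

    def __init__(self) -> None:
        self.timestep = 0
        self.events: List[str] = ["arrival", "departure"]


class ResetSketch:
    def __init__(self, interface: FakeInterface) -> None:
        self._interface = interface
        # Frozen copy of the starting state, analogous to _init_snapshot.
        self._init_snapshot = copy.deepcopy(interface)

    def reset(self) -> int:
        # Copy the snapshot again; assigning it directly would let the next
        # episode mutate the snapshot itself and break later resets.
        self._interface = copy.deepcopy(self._init_snapshot)
        return self._interface.timestep  # stand-in for observation_from_state()
```

A shallow copy (copy.copy) would not be enough here, since the nested events list would still be shared between the snapshot and the live interface.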
abstract render(self, mode='human')
Renders the environment. Implements gym.Env.render().
abstract action_to_schedule(self) → Dict[str, List[float]]
Convert an agent action to a schedule to be input to the simulator.
- Returns:
  - schedule (Dict[str, List[float]]): Dictionary mapping station ids to a schedule of pilot signals.
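Each subclass has to decide how the raw action maps onto stations. The sketch below assumes one hypothetical convention, written as a standalone function for brevity: the i-th action entry is the charging rate for the i-th station in a fixed station_ids order, each station receives a one-element schedule, and negative entries are clipped to zero. The station ids are made up; in a real subclass the action would come from self.action and the ids from the simulation interface.

```python
from typing import Dict, List, Sequence


def action_to_schedule(
    action: Sequence[float], station_ids: Sequence[str]
) -> Dict[str, List[float]]:
    """Map the i-th action entry to the i-th station (hypothetical convention).

    Each station gets a one-element schedule: the pilot signal for the next
    period. Negative entries are clipped to 0, since a pilot signal is
    assumed here to be a non-negative charging rate.
    """
    if len(action) != len(station_ids):
        raise ValueError("action must have one entry per station")
    return {
        station_id: [max(0.0, float(rate))]
        for station_id, rate in zip(station_ids, action)
    }
```

For example, action_to_schedule([16.0, -3.0], ["CA-1", "CA-2"]) gives {"CA-1": [16.0], "CA-2": [0.0]}. A one-dimensional np.ndarray works as the action too, since only len and iteration are used.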
abstract observation_from_state(self) → Dict[str, np.ndarray]
Construct an environment observation from the state of the simulator.
- Returns:
  - observation (Dict[str, np.ndarray]): an environment observation generated from the simulation state
abstract reward_from_state(self) → float
Calculate a reward from the state of the simulator.
- Returns:
  reward (float): a reward generated from the simulation state
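Since _prev_interface exists for calculating action rewards, one natural shape for reward_from_state is a difference between the current and previous simulation states. The sketch below assumes a hypothetical interface exposing a cumulative energy_delivered_kwh total and rewards only the energy delivered during the last step; it is written as a standalone function of the two interfaces rather than as a method on self.

```python
from dataclasses import dataclass


@dataclass
class FakeInterface:
    """Hypothetical stand-in exposing cumulative energy delivered (kWh)."""

    energy_delivered_kwh: float


def reward_from_state(prev: FakeInterface, current: FakeInterface) -> float:
    # Reward only the energy delivered during the last step, by differencing
    # the cumulative totals of the current and previous interfaces.
    return current.energy_delivered_kwh - prev.energy_delivered_kwh
```

Differencing cumulative totals keeps the per-step reward from double-counting energy delivered in earlier steps.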
abstract done_from_state(self) → bool
Determine whether the simulation is done from the state of the simulator.
- Returns:
  done (bool): True if the simulation is done, False if not
abstract info_from_state(self) → Dict[Any, Any]
Give information about the environment using the state of the simulator.
- Returns:
  info (dict): dict of environment information