gym_acnportal.gym_acnsim.envs.base_env

This module contains an abstract gym environment that wraps an ACN-Sim Simulation.

Module Contents

Classes

BaseSimEnv

Abstract base class meant to be inherited from to implement new ACN-Sim Environments.

class gym_acnportal.gym_acnsim.envs.base_env.BaseSimEnv(interface: Optional[GymTrainedInterface])

Bases: gym.Env

Abstract base class meant to be inherited from to implement new ACN-Sim Environments.

Subclasses must implement the following methods:

action_to_schedule, observation_from_state, reward_from_state, done_from_state

Subclasses must also specify observation_space and action_space, either as class or instance variables.

Optionally, subclasses may implement info_from_state, which here returns an empty dict.

Subclasses may override __init__, step, and reset functions.

Currently, no render function is implemented, though this function is not required for internal functionality.
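
The subclass contract above can be sketched without installing gym or ACN-Sim. This is a minimal illustration, not the library's code: StandInBaseEnv is a hypothetical stand-in for BaseSimEnv, and TwoStationEnv, its station ids, the horizon, the timestep counter, and the tuples used in place of gym space objects are all assumptions made for the example.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class StandInBaseEnv(ABC):
    """Hypothetical stand-in for BaseSimEnv; not the library class."""

    def __init__(self) -> None:
        self.action: List[float] = []
        self.timestep = 0  # hypothetical simulation clock

    @abstractmethod
    def action_to_schedule(self) -> Dict[str, List[float]]: ...

    @abstractmethod
    def observation_from_state(self) -> Dict[str, List[float]]: ...

    @abstractmethod
    def reward_from_state(self) -> float: ...

    @abstractmethod
    def done_from_state(self) -> bool: ...

    def info_from_state(self) -> Dict[Any, Any]:
        # Optional hook: the default returns an empty dict.
        return {}


class TwoStationEnv(StandInBaseEnv):
    # Spaces may be class or instance variables; plain tuples stand in
    # for gym space objects in this sketch.
    observation_space = ("timestep",)
    action_space = ("rate_station_a", "rate_station_b")
    station_ids = ["station-a", "station-b"]  # hypothetical ids
    horizon = 3  # hypothetical episode length

    def action_to_schedule(self) -> Dict[str, List[float]]:
        # One pilot signal per station for the next time step.
        return {sid: [rate] for sid, rate in zip(self.station_ids, self.action)}

    def observation_from_state(self) -> Dict[str, List[float]]:
        return {"timestep": [float(self.timestep)]}

    def reward_from_state(self) -> float:
        return float(sum(self.action))

    def done_from_state(self) -> bool:
        return self.timestep >= self.horizon


env = TwoStationEnv()
env.action = [16.0, 32.0]
schedule = env.action_to_schedule()
```

A subclass that leaves out any of the four required methods cannot be instantiated in this sketch, because the stand-in base class marks them with abstractmethod.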

Attributes:
_interface (GymTrainedInterface): An interface to a simulation to be stepped by this environment, or None. If None, an interface must be set later.
_init_snapshot (GymTrainedInterface): A deep copy of the initial interface, used for environment resets.
_prev_interface (GymTrainedInterface): A deep copy of the interface at the previous time step; used for calculating action rewards.
_action (object): The action taken by the agent in this agent-environment loop iteration.
_schedule (Dict[str, List[number]]): Dictionary mapping station ids to a schedule of pilot signals.
_observation (np.ndarray): The observation given to the agent in this agent-environment loop iteration.
_done (object): An object representing whether or not the execution of the environment is complete.
_info (object): An object that gives info about the environment.

_interface :Optional[GymTrainedInterface]
_init_snapshot :GymTrainedInterface
_prev_interface :GymTrainedInterface
_action :Optional[np.ndarray]
_schedule :Dict[str, List[float]]
_observation :Optional[np.ndarray]
_reward :Optional[float]
_done :Optional[bool]
_info :Optional[Dict[Any, Any]]
property interface(self) → gym_acnportal.gym_acnsim.interfaces.GymTrainedInterface
property prev_interface(self) → gym_acnportal.gym_acnsim.interfaces.GymTrainedInterface
property action(self) → numpy.ndarray
property schedule(self) → Dict[str, List[float]]
property observation(self) → numpy.ndarray
property reward(self) → float
property done(self) → bool
property info(self) → Dict[Any, Any]
update_state(self) → None

Update the state of the environment, namely its observation, reward, done, and info attributes.

Returns:

None.

store_previous_state(self) → None

Store the previous state of the simulation in the _prev_interface environment attribute.

Returns:

None.

step(self, action: numpy.ndarray) → Tuple[np.ndarray, float, bool, Dict[Any, Any]]

Step the simulation one timestep with an agent’s action.

Accepts an action and returns a tuple (observation, reward, done, info).

Implements gym.Env.step()

Args:

action (object): an action provided by the agent

Returns:
observation (np.ndarray): agent’s observation of the current environment
reward (float): amount of reward returned after previous action
done (bool): whether the episode has ended, in which case further step() calls will return undefined results
info (dict): contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)
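
The agent-environment loop implied by this return tuple can be sketched as follows. CountdownEnv is a hypothetical stand-in with the same four-element return shape, not the library class. The order of operations inside its step (store the previous state, advance the simulation, then refresh observation, reward, done, and info) is an assumption inferred from the method descriptions on this page.

```python
from typing import Any, Dict, List, Tuple


class CountdownEnv:
    """Hypothetical stand-in environment; not the library class."""

    def __init__(self, horizon: int) -> None:
        self.horizon = horizon
        self.timestep = 0
        self.prev_timestep = 0

    def step(
        self, action: List[float]
    ) -> Tuple[Dict[str, int], float, bool, Dict[Any, Any]]:
        # Plays the role of store_previous_state.
        self.prev_timestep = self.timestep
        # The simulation advances by one time step.
        self.timestep += 1
        # Plays the role of update_state: refresh all four outputs.
        observation = {"timestep": self.timestep}
        reward = float(sum(action))
        done = self.timestep >= self.horizon
        info: Dict[Any, Any] = {}
        return observation, reward, done, info


env = CountdownEnv(horizon=3)
total_reward = 0.0
steps = 0
done = False
while not done:
    # Stop as soon as done is True; later step() calls are undefined.
    observation, reward, done, info = env.step([1.0, 2.0])
    total_reward += reward
    steps += 1
```

The loop checks done after every call, so step is never called again once the episode has ended.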

reset(self) → Dict[str, np.ndarray]

Resets the state of the simulation and returns an initial observation. Resetting replaces the current interface with an interface to the simulation in its initial state.

Implements gym.Env.reset()

Returns:

observation (np.ndarray): the initial observation.
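
A snapshot-based reset of this kind can be sketched with copy.deepcopy. FakeInterface and SnapshotEnv are hypothetical stand-ins, and copying the snapshot again on every reset is an assumption of this sketch rather than a confirmed detail of the library.

```python
from copy import deepcopy
from typing import Dict, List


class FakeInterface:
    """Hypothetical stand-in for an interface to a simulation."""

    def __init__(self) -> None:
        self.iteration = 0
        self.charging_rates: List[float] = [0.0, 0.0]


class SnapshotEnv:
    """Hypothetical environment that resets from a stored deep copy."""

    def __init__(self, interface: FakeInterface) -> None:
        self.interface = interface
        # Deep copy, so in-place changes to the live interface
        # never reach the snapshot.
        self.init_snapshot = deepcopy(interface)

    def advance(self, rates: List[float]) -> None:
        self.interface.charging_rates[:] = rates  # in-place mutation
        self.interface.iteration += 1

    def reset(self) -> Dict[str, int]:
        # Copy again so later steps cannot mutate the stored snapshot.
        self.interface = deepcopy(self.init_snapshot)
        return {"iteration": self.interface.iteration}


env = SnapshotEnv(FakeInterface())
env.advance([16.0, 32.0])
initial_observation = env.reset()
```

A shallow copy would share the charging_rates list with the live interface, so the first in-place update would corrupt the snapshot. That is why the sketch uses a deep copy.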

abstract render(self, mode='human')

Renders the environment. Implements gym.Env.render().

abstract action_to_schedule(self) → Dict[str, List[float]]

Convert an agent action to a schedule to be input to the simulator.

Returns:
schedule (Dict[str, List[float]]): Dictionary mapping station ids to a schedule of pilot signals.
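
One possible conversion is sketched below as a free function. It assumes a zero-centered action with one entry per station, shifted by a fixed offset to obtain a single-step pilot signal. The function name, the offset, and the station ids are hypothetical and do not come from the library.

```python
from typing import Dict, List, Sequence


def zero_centered_action_to_schedule(
    action: Sequence[float],
    station_ids: Sequence[str],
    offset: float,
) -> Dict[str, List[float]]:
    """Map a zero-centered action vector to a one-step schedule."""
    if len(action) != len(station_ids):
        raise ValueError("action needs exactly one entry per station")
    # Each station gets a list holding a single pilot signal.
    return {
        station_id: [float(value) + offset]
        for station_id, value in zip(station_ids, action)
    }


schedule = zero_centered_action_to_schedule(
    action=[-16.0, 0.0, 16.0],
    station_ids=["station-a", "station-b", "station-c"],
    offset=16.0,
)
```

In a subclass, the same logic would read the action from the environment's action attribute instead of taking it as an argument, since action_to_schedule takes no parameters besides self.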

abstract observation_from_state(self) → Dict[str, np.ndarray]

Construct an environment observation from the state of the simulator.

Returns:
observation (Dict[str, np.ndarray]): an environment observation generated from the simulation state

abstract reward_from_state(self) → float

Calculate a reward from the state of the simulator.

Returns:

reward (float): a reward generated from the simulation state
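
Because the environment keeps a copy of the interface from the previous time step, a reward can compare the two states. The sketch below shows one such reward under stated assumptions: FakeState and its energy_delivered_kwh field are hypothetical and are not attributes of GymTrainedInterface.

```python
from dataclasses import dataclass


@dataclass
class FakeState:
    """Hypothetical summary of a simulation state."""

    energy_delivered_kwh: float


def delivered_energy_reward(prev: FakeState, current: FakeState) -> float:
    # Reward the energy delivered during the most recent step only,
    # found by differencing the cumulative totals.
    return current.energy_delivered_kwh - prev.energy_delivered_kwh


reward = delivered_energy_reward(FakeState(10.0), FakeState(12.5))
```

Differencing against the previous state credits each action only with what changed during its own step, rather than with the running total.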

abstract done_from_state(self) → bool

Determine if the simulation is done from the state of the simulator.

Returns:

done (bool): True if the simulation is done, False if not

abstract info_from_state(self) → Dict[Any, Any]

Give information about the environment using the state of the simulator.

Returns:

info (dict): dict of environment information