gym_acnportal.gym_acnsim¶
OpenAI Gym plugin for ACN-Sim. Provides several customizable environments for training reinforcement learning (RL) agents. See tutorial X for examples of usage.
Subpackages¶
gym_acnportal.gym_acnsim.envs
gym_acnportal.gym_acnsim.tests
Submodules¶
Package Contents¶
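Classes¶
GymTrainedInterface | Interface between OpenAI Environments and the ACN Simulation Environment.
GymTrainingInterface | Interface between OpenAI Environments and the ACN Simulation Environment.
BaseSimEnv | Abstract base class meant to be inherited from to implement new ACN-Sim Environments.
CustomSimEnv | A simulator environment with customizable observations, action spaces, and rewards.
RebuildingEnv | A simulator environment that subclasses CustomSimEnv, with the extra property that the entire simulation is rebuilt within the environment when __init__ or reset is called.
Functions¶
make_default_sim_env | A simulator environment with continuous action and observation spaces; see the full entry for its observations and reward.
make_rebuilding_default_sim_env | A simulator environment with the same characteristics as the environment returned by make_default_sim_env, except that the simulation is rebuilt on every reset.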
-
class gym_acnportal.gym_acnsim.GymTrainedInterface¶
Bases: acnportal.acnsim.Interface
Interface between OpenAI Environments and the ACN Simulation Environment.
-
classmethod
from_interface(cls, interface: acnportal.acnsim.Interface) → gym_acnportal.gym_acnsim.interfaces.GymTrainedInterface¶ Generates an instance of this class from an Interface-like object. Note that the _simulator of the input interface is not copied into the new GymTrainedInterface-like object generated here, so changes from elsewhere to the _simulator will be reflected in this object’s _simulator.
- Args:
interface (Interface):
- Returns:
GymTrainedInterface:
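The sharing behaviour described for from_interface can be illustrated with a minimal sketch. The classes below are hypothetical stand-ins written for this example, not the acnportal classes; they only mimic the documented point that the simulator reference is reused rather than copied.

```python
class StandInSimulator:
    """Hypothetical stand-in for an ACN-Sim Simulator; not the real class."""

    def __init__(self):
        self.iteration = 0


class StandInInterface:
    """Hypothetical stand-in for acnportal.acnsim.Interface."""

    def __init__(self, simulator):
        self._simulator = simulator


class StandInTrainedInterface(StandInInterface):
    """Mimics only the documented from_interface behaviour."""

    @classmethod
    def from_interface(cls, interface):
        # The simulator reference is reused, not deep-copied.
        return cls(interface._simulator)


simulator = StandInSimulator()
original = StandInInterface(simulator)
converted = StandInTrainedInterface.from_interface(original)

# Advancing the simulator "from elsewhere" is visible through both interfaces.
simulator.iteration += 1
shared = converted._simulator is original._simulator
```

If an independent simulator is needed, copying the interface before conversion (for example with copy.deepcopy) is one option to consider.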
-
property
station_ids(self) → List[str]¶ Return a list of space ids of the stations in the network.
- Returns:
List[str]: List of station ids in the network.
-
property
active_station_ids(self) → List[str]¶ Returns a list of active EVSE station ids for use by the algorithm.
- Returns:
List[str]: List of EVSE station ids with an EV plugged in that is not finished charging.
-
property
is_done(self) → bool¶ Returns whether the simulation is complete (i.e., the event queue is empty).
- Returns:
bool: True if simulation is complete.
-
property
charging_rates(self) → numpy.ndarray¶ Returns the charging_rates of the simulator at all times.
- Returns:
np.ndarray: numpy array of all charging rates. Each row represents the charging rates of a station; each column represents the charging rates at an iteration.
-
is_feasible_evse(self, load_currents: Dict[str, List[float]]) → bool¶ Return whether each EVSE in load_currents can accept the pilots assigned to it.
- Args:
load_currents (Dict[str, List[number]]): Dictionary mapping load_ids to schedules of charging rates.
- Returns:
bool: True if all pilots are valid for the EVSEs to which they are sent.
-
is_feasible(self, load_currents: Dict[str, List[float]], linear: bool = False, violation_tolerance: Optional[float] = None, relative_tolerance: Optional[float] = None) → bool¶ Overrides Interface.is_feasible with extra feasibility checks. These include:
- Checking for stations in a schedule but not in the network.
- Checking that the schedule doesn’t violate any constraints on EVSE charging rates.
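The two extra checks listed above can be sketched as follows. This is an illustrative stand-in, not the library's implementation: the evse_limits mapping and the station ids are hypothetical, and each EVSE is assumed to accept any rate in a continuous range, which may not hold for EVSEs with a discrete set of allowable pilots.

```python
from typing import Dict, List, Tuple


def extra_feasibility_checks(
    load_currents: Dict[str, List[float]],
    evse_limits: Dict[str, Tuple[float, float]],
) -> bool:
    """Illustrative stand-in for the two extra checks; not the library code.

    evse_limits is a hypothetical mapping of station id -> (min_rate, max_rate).
    """
    for station_id, schedule in load_currents.items():
        # Check 1: every station in the schedule must exist in the network.
        if station_id not in evse_limits:
            return False
        # Check 2: every pilot must respect the EVSE's rate limits.
        min_rate, max_rate = evse_limits[station_id]
        if any(pilot < min_rate or pilot > max_rate for pilot in schedule):
            return False
    return True


limits = {"PS-001": (0.0, 32.0), "PS-002": (0.0, 16.0)}
ok = extra_feasibility_checks({"PS-001": [16.0, 32.0]}, limits)
unknown_station = extra_feasibility_checks({"PS-999": [8.0]}, limits)
over_limit = extra_feasibility_checks({"PS-002": [20.0]}, limits)
```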
-
last_energy_delivered(self) → float¶ Return the actual energy delivered in the last period, in amp-periods.
TODO: This is known to produce a warning in acnportal 0.2.2
- Returns:
float: Total energy delivered in the last period, in amp-periods.
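Because the value is reported in amp-periods, a conversion is needed before comparing it with energy in kWh. The helper below is a minimal sketch under stated assumptions: a constant voltage and a fixed period length, both supplied by the caller. The function name and the example values (208 V, 5-minute periods) are hypothetical and not taken from this package.

```python
def amp_periods_to_kwh(
    amp_periods: float, voltage: float, period_minutes: float
) -> float:
    """Convert amp-periods to kWh, assuming a constant voltage."""
    watt_periods = amp_periods * voltage              # A * V = W, per period
    watt_hours = watt_periods * period_minutes / 60   # periods -> hours
    return watt_hours / 1000                          # Wh -> kWh


# Hypothetical values: 120 amp-periods at 208 V with 5-minute periods.
energy_kwh = amp_periods_to_kwh(120, voltage=208, period_minutes=5)
```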
-
current_constraint_currents(self, input_schedule: object) → object¶ TODO
- Args:
input_schedule:
- Returns:
-
class gym_acnportal.gym_acnsim.GymTrainingInterface¶
Bases: gym_acnportal.gym_acnsim.interfaces.GymTrainedInterface
Interface between OpenAI Environments and the ACN Simulation Environment.
This class of interface facilitates training by allowing an agent to step the Simulator by a single iteration.
-
step(self, new_schedule: Dict[str, List[float]], force_feasibility: bool = True) → Tuple[bool, bool]¶ Step the simulation using the input new_schedule until the simulator requires a new charging schedule. If the provided schedule is infeasible, steps the simulation only if force_feasibility is False, otherwise doesn’t step the simulation.
- Args:
new_schedule (Dict[str, List[float]]): Dictionary mapping station ids to a schedule of pilot signals.
force_feasibility (bool): If True, do not allow an infeasible schedule to step the simulation.
- Returns:
bool: True if the simulation is completed.
bool: True if the schedule was feasible.
- Warns:
UserWarning: If the length of the new schedule is less than the Simulator’s max_recompute parameter. This warning is raised because stepping the Simulator with a schedule of length less than max_recompute could cause the pilot signals to be updated with 0’s after the schedule runs out of entries.
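The reason for this warning can be modelled with a short sketch. The pad_schedule helper is hypothetical and does not call the Simulator; it only assumes, as the warning text describes, that entries beyond the end of a schedule are treated as 0.

```python
import warnings
from typing import Dict, List


def pad_schedule(
    new_schedule: Dict[str, List[float]], max_recompute: int
) -> Dict[str, List[float]]:
    """Hypothetical helper: model the pilots over max_recompute periods."""
    shortest = min(len(pilots) for pilots in new_schedule.values())
    if shortest < max_recompute:
        warnings.warn(
            f"Schedule length {shortest} is less than max_recompute "
            f"({max_recompute}); missing entries are treated as 0.",
            UserWarning,
        )
    # Entries past the end of each schedule fall back to 0 A.
    return {
        station_id: (pilots + [0.0] * max_recompute)[:max_recompute]
        for station_id, pilots in new_schedule.items()
    }


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    padded = pad_schedule({"PS-001": [16.0], "PS-002": [8.0, 8.0, 8.0]}, 3)
```

Supplying a schedule at least max_recompute entries long for every station avoids both the warning and the implicit zeros.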
-
class gym_acnportal.gym_acnsim.BaseSimEnv(interface: Optional[GymTrainedInterface])¶
Bases: gym.Env
Abstract base class meant to be inherited from to implement new ACN-Sim Environments.
Subclasses must implement the following methods:
- action_to_schedule
- observation_from_state
- reward_from_state
- done_from_state
Subclasses must also specify observation_space and action_space, either as class or instance variables.
Optionally, subclasses may implement info_from_state, which here returns an empty dict.
Subclasses may override __init__, step, and reset functions.
Currently, no render function is implemented, though this function is not required for internal functionality.
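The subclassing contract described above can be sketched without the package installed. SketchBaseEnv below is a stand-in written for this example, not BaseSimEnv itself; the order of calls inside step and the integer used in place of a simulator are assumptions, and the station ids are hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class SketchBaseEnv(ABC):
    """Stand-in mirroring the documented contract (not the real BaseSimEnv)."""

    def __init__(self) -> None:
        self._action = None
        self._schedule: Dict[str, List[float]] = {}
        self._timestep = 0  # stands in for the simulator state

    def step(self, action):
        self._action = action
        self._schedule = self.action_to_schedule()
        self._timestep += 1  # the real class would step the simulation here
        return (
            self.observation_from_state(),
            self.reward_from_state(),
            self.done_from_state(),
            self.info_from_state(),
        )

    @abstractmethod
    def action_to_schedule(self) -> Dict[str, List[float]]: ...

    @abstractmethod
    def observation_from_state(self) -> Any: ...

    @abstractmethod
    def reward_from_state(self) -> float: ...

    @abstractmethod
    def done_from_state(self) -> bool: ...

    def info_from_state(self) -> Dict[Any, Any]:
        return {}  # optional hook; empty by default


class TwoStationEnv(SketchBaseEnv):
    station_ids = ["PS-001", "PS-002"]  # hypothetical ids

    def action_to_schedule(self):
        return {sid: [float(a)] for sid, a in zip(self.station_ids, self._action)}

    def observation_from_state(self):
        return {"timestep": self._timestep}

    def reward_from_state(self):
        return sum(pilots[0] for pilots in self._schedule.values())

    def done_from_state(self):
        return self._timestep >= 3


env = TwoStationEnv()
observation, reward, done, info = env.step([8, 16])
```

A real subclass would also define observation_space and action_space, as noted above.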
- Attributes:
- _interface (GymTrainedInterface): An interface to a simulation to be stepped by this environment, or None. If None, an interface must be set later.
- _init_snapshot (GymTrainedInterface): A deep copy of the initial interface, used for environment resets.
- _prev_interface (GymTrainedInterface): A deep copy of the interface at the previous time step; used for calculating action rewards.
- _action (object): The action taken by the agent in this agent-environment loop iteration.
- _schedule (Dict[str, List[number]]): Dictionary mapping station ids to a schedule of pilot signals.
- _observation (np.ndarray): The observation given to the agent in this agent-environment loop iteration.
- _done (object): An object representing whether or not the execution of the environment is complete.
- _info (object): An object that gives info about the environment.
-
_interface:Optional[GymTrainedInterface]¶
-
_init_snapshot:GymTrainedInterface¶
-
_prev_interface:GymTrainedInterface¶
-
_action:Optional[np.ndarray]¶
-
_schedule:Dict[str, List[float]]¶
-
_observation:Optional[np.ndarray]¶
-
_reward:Optional[float]¶
-
_done:Optional[bool]¶
-
_info:Optional[Dict[Any, Any]]¶
-
property
interface(self) → gym_acnportal.gym_acnsim.interfaces.GymTrainedInterface¶
-
property
prev_interface(self) → gym_acnportal.gym_acnsim.interfaces.GymTrainedInterface¶
-
property
action(self) → numpy.ndarray¶
-
property
schedule(self) → Dict[str, List[float]]¶
-
property
observation(self) → numpy.ndarray¶
-
property
reward(self) → float¶
-
property
done(self) → bool¶
-
property
info(self) → Dict[Any, Any]¶
-
update_state(self) → None¶ Update the state of the environment. Namely, the observation, reward, done, and info attributes of the environment.
- Returns:
None.
-
store_previous_state(self) → None¶ Store the previous state of the simulation in the _prev_interface environment attribute.
- Returns:
None.
-
step(self, action: numpy.ndarray) → Tuple[np.ndarray, float, bool, Dict[Any, Any]]¶ Step the simulation one timestep with an agent’s action.
Accepts an action and returns a tuple (observation, reward, done, info).
Implements gym.Env.step()
- Args:
action (object): an action provided by the agent
- Returns:
- observation (np.ndarray): agent’s observation of the current environment
- reward (float): amount of reward returned after the previous action
- done (bool): whether the episode has ended, in which case further step() calls will return undefined results
- info (dict): contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)
-
reset(self) → Dict[str, np.ndarray]¶ Resets the state of the simulation and returns an initial observation. Resetting is done by replacing the environment’s interface with an interface to the simulation in its initial state.
Implements gym.Env.reset()
- Returns:
observation (np.ndarray): the initial observation.
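A minimal sketch of the reset-then-step loop follows, assuming the reset restores a deep copy of an initial snapshot as the _init_snapshot attribute suggests. SnapshotEnv and StandInInterface are hypothetical stand-ins; the three-step episode length is arbitrary.

```python
import copy


class StandInInterface:
    """Hypothetical stand-in for a simulation interface."""

    def __init__(self):
        self.iteration = 0


class SnapshotEnv:
    """Toy environment with the same reset/step shape as documented."""

    def __init__(self, interface):
        self._interface = interface
        self._init_snapshot = copy.deepcopy(interface)

    def step(self, action):
        self._interface.iteration += 1
        done = self._interface.iteration >= 3
        return {"timestep": self._interface.iteration}, 0.0, done, {}

    def reset(self):
        # Restore a fresh copy so the snapshot itself is never mutated.
        self._interface = copy.deepcopy(self._init_snapshot)
        return {"timestep": self._interface.iteration}


env = SnapshotEnv(StandInInterface())
observation = env.reset()
done = False
steps = 0
while not done:
    observation, reward, done, info = env.step(action=None)
    steps += 1
restarted = env.reset()
```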
-
abstract
render(self, mode='human')¶ Renders the environment. Implements gym.Env.render().
-
abstract
action_to_schedule(self) → Dict[str, List[float]]¶ Convert an agent action to a schedule to be input to the simulator.
- Returns:
schedule (Dict[str, List[float]]): Dictionary mapping station ids to a schedule of pilot signals.
-
abstract
observation_from_state(self) → Dict[str, np.ndarray]¶ Construct an environment observation from the state of the simulator.
- Returns:
observation (Dict[str, np.ndarray]): an environment observation generated from the simulation state
-
abstract
reward_from_state(self) → float¶ Calculate a reward from the state of the simulator.
- Returns:
reward (float): a reward generated from the simulation state
-
abstract
done_from_state(self) → bool¶ Determine whether the simulation is done from the state of the simulator.
- Returns:
done (bool): True if the simulation is done, False if not
-
abstract
info_from_state(self) → Dict[Any, Any]¶ Give information about the environment using the state of the simulator.
- Returns:
info (dict): dict of environment information
-
class gym_acnportal.gym_acnsim.CustomSimEnv(interface: Optional[GymTrainedInterface], observation_objects: List[SimObservation], action_object: gym_acnportal.gym_acnsim.envs.action_spaces.SimAction, reward_functions: List[Callable[[BaseSimEnv], float]])¶
Bases: gym_acnportal.gym_acnsim.envs.base_env.BaseSimEnv
A simulator environment with customizable observations, action spaces, and rewards.
Observations are specified as objects, where each object specifies a function to generate a space from a simulation interface and a function to generate an observation from a simulation interface.
Action spaces are specified as functions that generate a space from a simulation interface.
Rewards are specified as functions that generate a number (reward) from an environment.
Users may define their own objects/functions to input to this environment, use the objects/functions defined in the gym_acnsim package, or use an environment factory function defined in the sim_prototype_env module.
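The composition described above can be sketched with plain Python objects. SketchObservation is a hypothetical stand-in for SimObservation (its field names are assumptions), and summing the reward functions is likewise an assumption about how a list of rewards is combined.

```python
from types import SimpleNamespace
from typing import Any, Callable, Dict, List, NamedTuple


class SketchObservation(NamedTuple):
    """Hypothetical stand-in for SimObservation; field names are assumptions."""

    name: str
    space_function: Callable[[Any], Any]  # interface -> description of the space
    obs_function: Callable[[Any], Any]    # interface -> observed value


def build_observation(
    interface: Any, objects: List[SketchObservation]
) -> Dict[str, Any]:
    # One dict entry per observation object, keyed by its name.
    return {obj.name: obj.obs_function(interface) for obj in objects}


def total_reward(env: Any, reward_functions: List[Callable[[Any], float]]) -> float:
    # Assumption: the individual rewards are summed.
    return sum(reward(env) for reward in reward_functions)


interface = SimpleNamespace(station_ids=["PS-001", "PS-002"], current_time=4)
env = SimpleNamespace(
    interface=interface, schedule={"PS-001": [16.0], "PS-002": [8.0]}
)

observation_objects = [
    SketchObservation("timestep", lambda i: (0, None), lambda i: i.current_time),
    SketchObservation("num_stations", lambda i: (0, None), lambda i: len(i.station_ids)),
]
reward_functions = [
    lambda e: sum(pilots[0] for pilots in e.schedule.values()),  # charge proxy
    lambda e: -1.0,  # constant per-step penalty
]

observation = build_observation(env.interface, observation_objects)
reward = total_reward(env, reward_functions)
```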
-
observation_objects:List[SimObservation]¶
-
observation_space:spaces.Dict¶
-
action_object:SimAction¶
-
action_space:spaces.Space¶
-
reward_functions:List[Callable[[BaseSimEnv], float]]¶
-
property
interface(self) → gym_acnportal.gym_acnsim.interfaces.GymTrainedInterface¶ Get the current Interface of the environment.
- Returns:
GymTrainedInterface: The current Interface of the environment.
-
abstract
render(self, mode='human')¶ Renders the environment. Implements gym.Env.render().
-
action_to_schedule(self) → Dict[str, List[float]]¶ Convert an agent action to a schedule to be input to the simulator.
- Returns:
schedule (Dict[str, List[float]]): Dictionary mapping station ids to a schedule of pilot signals.
-
observation_from_state(self) → Dict[str, np.ndarray]¶ Construct an environment observation from the state of the simulator using the environment’s observation construction functions.
- Returns:
observation (Dict[str, np.ndarray]): An environment observation generated from the simulation state
-
reward_from_state(self) → float¶ Calculate a reward from the state of the simulator.
- Returns:
reward (float): a reward generated from the simulation state
-
done_from_state(self) → bool¶ Determine whether the simulation is done from the state of the simulator.
- Returns:
done (bool): True if the simulation is done, False if not
-
info_from_state(self) → Dict[Any, Any]¶ Give information about the environment using the state of the simulator. In this case, all the info about the simulator is given by returning a dict containing the simulator’s interface.
- Returns:
info (Dict[str, GymTrainedInterface]): The interface between the environment and Simulator.
-
-
class gym_acnportal.gym_acnsim.RebuildingEnv(interface: Optional[GymTrainedInterface], observation_objects: List[SimObservation], action_object: gym_acnportal.gym_acnsim.envs.action_spaces.SimAction, reward_functions: List[Callable[[BaseSimEnv], float]], interface_generating_function: Optional[Callable[[], GymTrainedInterface]] = None)¶
Bases: gym_acnportal.gym_acnsim.envs.custom_envs.CustomSimEnv
A simulator environment that subclasses CustomSimEnv, with the extra property that the entire simulation is rebuilt within the environment when __init__ or reset is called.
This is especially useful if the network or event queue have stochastic elements.
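A minimal sketch of the rebuilding idea, assuming interface_generating_function takes no arguments and returns a fresh interface each time it is called. The classes, the seeded generator, and the arrival times are hypothetical stand-ins rather than anything provided by this package.

```python
import random


class StandInInterface:
    """Hypothetical stand-in for a GymTrainedInterface."""

    def __init__(self, arrivals):
        self.arrivals = arrivals


def make_interface_generator(seed):
    rng = random.Random(seed)

    def interface_generating_function():
        # Draw a new stochastic set of arrival times on every call.
        arrivals = sorted(rng.randint(0, 287) for _ in range(3))
        return StandInInterface(arrivals)

    return interface_generating_function


class SketchRebuildingEnv:
    """Toy environment that rebuilds its interface on __init__ and reset."""

    def __init__(self, interface_generating_function):
        self._generate = interface_generating_function
        self._interface = self._generate()
        self.builds = 1

    def reset(self):
        self._interface = self._generate()
        self.builds += 1
        return {"arrivals": list(self._interface.arrivals)}


env = SketchRebuildingEnv(make_interface_generator(seed=0))
first = env.reset()
second = env.reset()
```

Each episode therefore starts from a newly generated simulation instead of a copy of one fixed snapshot.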
-
classmethod
from_custom_sim_env(cls, env: gym_acnportal.gym_acnsim.envs.custom_envs.CustomSimEnv, interface_generating_function: Optional[Callable[[], GymTrainedInterface]] = None) → gym_acnportal.gym_acnsim.envs.custom_envs.RebuildingEnv¶
-
reset(self) → Dict[str, np.ndarray]¶ Resets the state of the simulation and returns an initial observation. Resetting is done by replacing the environment’s interface with an interface to the simulation in its initial state.
- Returns:
observation (np.ndarray): the initial observation.
-
abstract
render(self, mode='human')¶ Renders the environment. Implements gym.Env.render().
-
gym_acnportal.gym_acnsim.make_default_sim_env(interface: Optional[GymTrainedInterface] = None) → gym_acnportal.gym_acnsim.envs.custom_envs.CustomSimEnv¶ A simulator environment with the following characteristics:
The action and observation spaces are continuous.
An action in this environment is a pilot signal for each EVSE, within the minimum and maximum EVSE rates.
An observation is a dict consisting of the following fields (times are 1-indexed in the observations):
- arrivals: arrival time of the EV at each EVSE (or 0 if there’s no EV plugged in)
- departures: departure time of the EV at each EVSE (or 0 if there’s no EV plugged in)
- demand: energy demand of the EV at each EVSE (unoccupied EVSEs have demand 0)
- constraint_matrix: matrix of aggregate current coefficients
- magnitudes: magnitude vector constraining aggregate currents
- timestep: timestep of the simulation
The reward is calculated as follows:
- If no constraints (on the network or on the EVSEs) were violated by the action, a reward equal to the total charge delivered (in A) is returned.
- If any constraint violation occurred, a negative reward equal to the magnitude of the violation is returned. Network constraint violations are scaled by the number of EVs.
- Finally, a user-input reward function is added to the total reward.
The simulation is considered done if the event queue is empty.
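The reward rule above can be written out as a small function to make the two cases concrete. This is a sketch under stated assumptions, not the package's reward code: the function and argument names are hypothetical, and "scaled by the number of EVs" is interpreted here as multiplication.

```python
def sketch_default_reward(
    charge_delivered: float,
    evse_violation: float,
    network_violation: float,
    num_evs: int,
    extra_reward: float = 0.0,
) -> float:
    """Stand-in for the documented reward rule; names are hypothetical."""
    if evse_violation == 0 and network_violation == 0:
        # No violations: reward the charge delivered.
        reward = charge_delivered
    else:
        # Assumption: network violations are multiplied by the number of EVs.
        reward = -(evse_violation + network_violation * num_evs)
    # A user-supplied reward term is added on top in both cases.
    return reward + extra_reward


feasible = sketch_default_reward(48.0, 0.0, 0.0, num_evs=3)
violating = sketch_default_reward(48.0, 2.0, 1.5, num_evs=3)
with_bonus = sketch_default_reward(48.0, 0.0, 0.0, num_evs=3, extra_reward=2.0)
```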
-
gym_acnportal.gym_acnsim.make_rebuilding_default_sim_env(interface_generating_function: Optional[Callable[[], GymTrainedInterface]]) → gym_acnportal.gym_acnsim.envs.custom_envs.RebuildingEnv¶ A simulator environment with the same characteristics as the environment returned by make_default_sim_env, except that on every reset the simulation is completely rebuilt using interface_generating_function.
See make_default_sim_env for more info.
-
gym_acnportal.gym_acnsim.default_observation_objects:List[SimObservation]¶
-
gym_acnportal.gym_acnsim.default_action_object:SimAction¶
-
gym_acnportal.gym_acnsim.default_reward_functions:List[Callable[[BaseSimEnv], float]]¶
-
gym_acnportal.gym_acnsim.all_envs:List[EnvSpec]¶
-
gym_acnportal.gym_acnsim.env_ids¶
-
gym_acnportal.gym_acnsim.gym_env_dict:Dict[str, str]¶