stochastic_start_epsilon_greedy_policy

Classes

StochasticStartEpsilonGreedyPolicy

Epsilon-Greedy Policy is a specific implementation of an epsilon-soft policy.

Module Contents

class stochastic_start_epsilon_greedy_policy.StochasticStartEpsilonGreedyPolicy(num_actions: int, action_space: gymnasium.spaces.space.Space | None = None, epsilon: float = 0.1)

Bases: gridmind.policies.soft.base_soft_policy.BaseSoftPolicy

An epsilon-greedy policy is a specific kind of epsilon-soft policy. It is an action selection strategy in which, with probability ϵ, the agent selects a random action (exploration), and with probability 1 - ϵ, it selects the action with the highest estimated value (the greedy action).
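The selection rule described above can be sketched in a few lines. This is a minimal standalone illustration, not the library's implementation: the function name epsilon_greedy_action and the q_values list are hypothetical, and the sketch assumes the random action is drawn uniformly from all actions.

```python
import random


def epsilon_greedy_action(q_values, epsilon=0.1, rng=random):
    """Pick an action index from a list of estimated action values.

    Hypothetical helper for illustration; not part of the documented class.
    """
    if rng.random() < epsilon:
        # Explore: any action, drawn uniformly (an assumption of this sketch).
        return rng.randrange(len(q_values))
    # Exploit: index of the highest estimated value (first one on ties).
    return max(range(len(q_values)), key=lambda a: q_values[a])


# With epsilon=0.0 the choice is always greedy.
greedy = epsilon_greedy_action([0.1, 0.9, 0.3], epsilon=0.0)
```

Setting epsilon to 0.0 makes the sketch purely greedy, and setting it to 1.0 makes it purely random, which is a convenient way to check both branches.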

action_space = None
num_actions
epsilon = 0.1
policy_dict
_get_random_action()
get_action(state)
get_actions(states)
_get_greedy_action(state)
convert_to_scalar(state)
get_action_prob(state, action)
get_all_action_probabilities(states)
update(state, action)
get_action_deterministic(state)
set_policy_dict(policy_dict)
get_policy_dict()
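The member list includes methods that return action probabilities, but this page does not show how they are computed. As a hedged sketch of the textbook epsilon-greedy distribution, assuming exploration draws uniformly from all actions (including the greedy one), each action receives ϵ / num_actions and the greedy action receives an additional 1 - ϵ. The function name epsilon_greedy_probs is hypothetical and is not part of the documented class.

```python
def epsilon_greedy_probs(num_actions, greedy_action, epsilon=0.1):
    """Return the textbook epsilon-greedy distribution as a list.

    Hypothetical helper; assumes uniform exploration over all actions.
    """
    # Every action shares the exploration mass equally.
    probs = [epsilon / num_actions] * num_actions
    # The greedy action also receives the remaining 1 - epsilon.
    probs[greedy_action] += 1.0 - epsilon
    return probs


probs = epsilon_greedy_probs(num_actions=4, greedy_action=2, epsilon=0.1)
```

Because every action keeps probability of at least ϵ / num_actions, the distribution satisfies the usual definition of an epsilon-soft policy.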