stochastic_start_epsilon_greedy_policy

Classes

StochasticStartEpsilonGreedyPolicy

Epsilon-Greedy Policy is a specific implementation of an epsilon-soft policy.

Module Contents

class stochastic_start_epsilon_greedy_policy.StochasticStartEpsilonGreedyPolicy(num_actions: int, action_space: gymnasium.spaces.space.Space | None = None, epsilon: float = 0.1)

Bases: gridmind.policies.soft.base_soft_policy.BaseSoftPolicy

Epsilon-Greedy Policy is a specific implementation of an epsilon-soft policy. It is an action selection strategy in which, with probability ϵ, the agent selects a random action (exploration) and, with probability 1 - ϵ, it selects the action with the highest estimated value (the greedy action).
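The selection rule described above can be sketched in a few lines. This is a minimal illustration of the general epsilon-greedy technique, not this class's actual implementation: the function names `epsilon_greedy_action` and `epsilon_greedy_prob` are hypothetical, the action values are assumed to be a plain list indexed by action, the random action is assumed to be drawn uniformly, and ties are assumed to go to the lowest action index.

```python
import random


def epsilon_greedy_action(q_values, epsilon=0.1, rng=random):
    """Pick an action index given a list of estimated action values.

    Hypothetical helper for illustration; not part of the documented class.
    """
    if rng.random() < epsilon:
        # Explore: draw uniformly over all actions (assumption), which may
        # also return the greedy action.
        return rng.randrange(len(q_values))
    # Exploit: index of the highest estimated value (first index on ties).
    return max(range(len(q_values)), key=q_values.__getitem__)


def epsilon_greedy_prob(q_values, action, epsilon=0.1):
    """Probability that the rule above selects `action`."""
    n = len(q_values)
    greedy = max(range(n), key=q_values.__getitem__)
    # Every action receives epsilon / n from the exploration branch.
    base = epsilon / n
    # The greedy action additionally receives the exploitation mass.
    return base + (1.0 - epsilon) if action == greedy else base
```

Under these assumptions every action keeps a probability of at least ϵ / num_actions, which is the property that makes the policy epsilon-soft, and the greedy action is selected with total probability 1 - ϵ + ϵ / num_actions.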

action_space = None

num_actions

epsilon = 0.1

policy_dict

_get_random_action()

get_action(state)

get_actions(states)

_get_greedy_action(state)

convert_to_scalar(state)

get_action_prob(state, action)

get_all_action_probabilities(states)

update(state, action)

get_action_deterministic(state)

set_policy_dict(policy_dict)

get_policy_dict()