stochastic_start_epsilon_greedy_policy
======================================

.. py:module:: stochastic_start_epsilon_greedy_policy


Classes
-------

.. autoapisummary::

   stochastic_start_epsilon_greedy_policy.StochasticStartEpsilonGreedyPolicy


Module Contents
---------------

.. py:class:: StochasticStartEpsilonGreedyPolicy(num_actions: int, action_space: Optional[gymnasium.spaces.space.Space] = None, epsilon: float = 0.1)

   Bases: :py:obj:`gridmind.policies.soft.base_soft_policy.BaseSoftPolicy`

   Epsilon-Greedy Policy is a specific implementation of an epsilon-soft policy.

   The epsilon-greedy policy is a specific type of action selection strategy where,
   with a probability ϵ, the agent selects a random action (exploration), and with
   a probability 1-ϵ, it selects the action with the highest estimated value
   (greedy action).


   .. py:attribute:: action_space
      :value: None


   .. py:attribute:: num_actions


   .. py:attribute:: epsilon
      :value: 0.1


   .. py:attribute:: policy_dict


   .. py:method:: _get_random_action()


   .. py:method:: get_action(state)


   .. py:method:: get_actions(states)


   .. py:method:: _get_greedy_action(state)


   .. py:method:: convert_to_scalar(state)


   .. py:method:: get_action_prob(state, action)


   .. py:method:: get_all_action_probabilities(states)


   .. py:method:: update(state, action)


   .. py:method:: get_action_deterministic(state)


   .. py:method:: set_policy_dict(policy_dict)


   .. py:method:: get_policy_dict()
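
The selection rule in the class description can be illustrated with a minimal standalone sketch. This is not the ``gridmind`` implementation: the function names ``epsilon_greedy_action`` and ``epsilon_greedy_prob`` are hypothetical. The sketch also assumes that the exploratory branch draws uniformly from all ``num_actions`` actions, including the greedy one, so the greedy action ends up with probability ``1 - epsilon + epsilon / num_actions`` and every other action with ``epsilon / num_actions``.

```python
import random


def epsilon_greedy_action(greedy_action, num_actions, epsilon=0.1, rng=random):
    """Return a uniformly random action with probability epsilon,
    otherwise return the greedy action. (Hypothetical helper.)"""
    if rng.random() < epsilon:
        # Exploration: any action may be drawn, including the greedy one.
        return rng.randrange(num_actions)
    # Exploitation: keep the action with the highest estimated value.
    return greedy_action


def epsilon_greedy_prob(action, greedy_action, num_actions, epsilon=0.1):
    """Probability of selecting `action` under the rule above.
    (Hypothetical helper; assumes uniform exploration over all actions.)"""
    explore = epsilon / num_actions
    if action == greedy_action:
        return 1.0 - epsilon + explore
    return explore


if __name__ == "__main__":
    # Four actions, action 2 is greedy, default epsilon of 0.1.
    probs = [epsilon_greedy_prob(a, greedy_action=2, num_actions=4) for a in range(4)]
    print(probs, sum(probs))
```

Because every action keeps a probability of at least ``epsilon / num_actions``, the rule is epsilon-soft, which matches the ``BaseSoftPolicy`` base class named above.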