prediction.td_0_prediction
==========================

.. py:module:: prediction.td_0_prediction


Classes
-------

.. autoapisummary::

   prediction.td_0_prediction.TD0Prediction


Module Contents
---------------

.. py:class:: TD0Prediction(env: gymnasium.Env, policy: gridmind.policies.base_policy.BasePolicy, step_size: float = 0.1, discount_factor: float = 0.9, summary_dir: Optional[str] = None, write_summary: bool = True)

   Bases: :py:obj:`gridmind.algorithms.base_learning_algorithm.BaseLearningAlgorithm`

   Tabular TD(0) for estimating the state-value function V_pi of a fixed policy.

   Input: the policy to be evaluated. The policy is expected to map an observation to an action.


   .. py:attribute:: step_size
      :value: 0.1


   .. py:attribute:: V


   .. py:attribute:: policy


   .. py:attribute:: discount_factor
      :value: 0.9


   .. py:method:: _get_state_value_fn(force_functional_interface: bool = True)


   .. py:method:: _get_state_action_value_fn(force_functional_interface: bool = True)


   .. py:method:: _get_policy()


   .. py:method:: _train_steps(num_steps: int, prediction_only: bool, *args, **kwargs)
      :abstractmethod:


   .. py:method:: _train_episodes(num_episodes: int, prediction_only: bool = True)


   .. py:method:: set_policy(policy: gridmind.policies.base_policy.BasePolicy, **kwargs)
      :abstractmethod:
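
For orientation, the textbook tabular TD(0) update that a class like this typically performs is ``V[s] += step_size * (r + discount_factor * V[s'] - V[s])``. The following is a minimal, self-contained sketch of that update and is not the ``TD0Prediction`` implementation itself. ``ChainEnv`` and ``td0_prediction`` are hypothetical names introduced only for illustration, and the simplified ``reset``/``step`` signatures are an assumption that does not match the full ``gymnasium.Env`` API.

```python
from collections import defaultdict
from typing import Callable, Dict, Tuple


class ChainEnv:
    """Hypothetical deterministic chain 0 -> 1 -> 2 -> terminal (not part of gridmind)."""

    def __init__(self, length: int = 3) -> None:
        self.length = length
        self.state = 0

    def reset(self) -> int:
        self.state = 0
        return self.state

    def step(self, action: int) -> Tuple[int, float, bool]:
        # Every action moves one state to the right; reward 1.0 on termination.
        self.state += 1
        done = self.state == self.length
        reward = 1.0 if done else 0.0
        return self.state, reward, done


def td0_prediction(
    env: ChainEnv,
    policy: Callable[[int], int],
    num_episodes: int,
    step_size: float = 0.1,
    discount_factor: float = 0.9,
) -> Dict[int, float]:
    """Estimate V_pi with the textbook tabular TD(0) update."""
    V: Dict[int, float] = defaultdict(float)  # unseen states start at 0.0
    for _ in range(num_episodes):
        state = env.reset()
        done = False
        while not done:
            action = policy(state)  # the policy maps observation -> action
            next_state, reward, done = env.step(action)
            # The value of a terminal state is zero by definition.
            bootstrap = 0.0 if done else discount_factor * V[next_state]
            V[state] += step_size * (reward + bootstrap - V[state])
            state = next_state
    return dict(V)


# Evaluate a trivial policy that always returns action 0.
V = td0_prediction(ChainEnv(), policy=lambda obs: 0, num_episodes=500)
```

In this deterministic chain the estimates approach the discounted returns 0.9 ** 2, 0.9 and 1.0 for states 0, 1 and 2, which gives a quick sanity check on the update rule.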