prediction.td_0_prediction
Classes
TD0Prediction | Tabular TD(0) for estimating V_pi.
Module Contents
- class prediction.td_0_prediction.TD0Prediction(env: gymnasium.Env, policy: gridmind.policies.base_policy.BasePolicy, step_size: float = 0.1, discount_factor: float = 0.9, summary_dir: str | None = None, write_summary: bool = True)
Bases: gridmind.algorithms.base_learning_algorithm.BaseLearningAlgorithm
Tabular TD(0) for estimating V_pi, the state-value function of a fixed policy pi. Input: the policy to be evaluated. The policy is expected to behave as a function that takes an observation and returns an action.
- abstract set_policy(policy: gridmind.policies.base_policy.BasePolicy, **kwargs)
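For orientation, the tabular TD(0) update that the class description refers to can be sketched in plain Python. This is a minimal illustration of the general algorithm, not the gridmind implementation: `ChainEnv`, `td0_prediction`, and `num_episodes` are hypothetical names introduced here, and the environment only mimics the `reset`/`step` return shapes used by gymnasium. The `step_size` and `discount_factor` defaults mirror the constructor signature above.

```python
from collections import defaultdict


class ChainEnv:
    """Hypothetical 3-state chain: 0 -> 1 -> 2 (terminal), reward 1.0 on reaching 2."""

    def reset(self):
        self.state = 0
        return self.state, {}  # (observation, info), as in gymnasium

    def step(self, action):
        # The action is ignored: every step moves one state to the right.
        self.state += 1
        terminated = self.state == 2
        reward = 1.0 if terminated else 0.0
        return self.state, reward, terminated, False, {}


def td0_prediction(env, policy, num_episodes=500, step_size=0.1, discount_factor=0.9):
    """Estimate V_pi with tabular TD(0); `policy` maps an observation to an action."""
    values = defaultdict(float)  # V(s), initialised to 0 for unseen states
    for _ in range(num_episodes):
        obs, _ = env.reset()
        done = False
        while not done:
            action = policy(obs)
            next_obs, reward, terminated, truncated, _ = env.step(action)
            # Terminal states have value 0, so do not bootstrap from them.
            bootstrap = 0.0 if terminated else values[next_obs]
            td_target = reward + discount_factor * bootstrap
            # TD(0) update: move V(s) a fraction step_size toward the target.
            values[obs] += step_size * (td_target - values[obs])
            obs = next_obs
            done = terminated or truncated
    return dict(values)


# Any callable from observation to action works as the policy in this sketch.
values = td0_prediction(ChainEnv(), policy=lambda obs: 0)
```

On this chain the true values are V(1) = 1.0 and V(0) = 0.9 for a discount factor of 0.9, so the returned estimates should approach those numbers as the number of episodes grows.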