Handbook of Learning and Approximate Dynamic Programming
Jennie Si
From inside the book
Results 1-3 of 78
Page 247
... iteration portion of the λ-LSPE method with λ to the approximate value iteration (9.18) with weights w(i): one can view λ-LSPE as the approximate value iteration method (9.18) plus noise that asymptotically tends to 0. Note ...
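The snippet relates λ-LSPE to approximate value iteration with state weights w(i). A minimal sketch of that projected value iteration for a fixed policy, on a toy two-state chain with one linear feature and steady-state weights (the weights w(i) and equation (9.18) are the only elements taken from the snippet; every number below is invented):

```python
import numpy as np

# Toy 2-state Markov chain under a fixed policy (all numbers invented).
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])      # transition probabilities
g = np.array([1.0, 2.0])        # one-stage costs
alpha = 0.95                    # discount factor
Phi = np.array([[1.0], [2.0]])  # one linear feature per state
w = np.array([2 / 3, 1 / 3])    # weights w(i): steady-state probabilities of P
W = np.diag(w)

# Approximate value iteration: project T(Phi r_k) onto span(Phi) in the
# w-weighted norm, i.e. r_{k+1} = (Phi' W Phi)^{-1} Phi' W T(Phi r_k).
r = np.zeros(1)
for _ in range(200):
    TJ = g + alpha * P @ (Phi @ r)  # Bellman operator applied to Phi r
    r = np.linalg.solve(Phi.T @ W @ Phi, Phi.T @ W @ TJ)

# r converges to the fixed point of the projected Bellman equation;
# lambda-LSPE behaves like this iteration plus asymptotically vanishing noise.
print(r[0])
```

With steady-state weights the projected Bellman operator is a contraction, so the iterates converge to the unique fixed point (here about 12.24).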
Page 325
... iteration algorithms for determining the optimal policy can be easily developed by combining Lemma 12.5.1 and Theorem 12.5.1. Roughly speaking, at the kth step with policy Lk, we set the policy for the next step (the (k+1)th ...
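The step described in the snippet, evaluating the current policy L_k and then setting the (k+1)th policy greedily, is standard policy iteration. A self-contained sketch on a made-up finite MDP (Lemma 12.5.1 and Theorem 12.5.1 are not reproduced; all numbers are invented for illustration):

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP (all numbers invented).
n_states, n_actions = 2, 2
P = np.array([[[0.8, 0.2], [0.3, 0.7]],    # P[a, s, s'] transition probs
              [[0.5, 0.5], [0.9, 0.1]]])
g = np.array([[2.0, 1.0],                  # g[a, s]: one-stage cost
              [0.5, 3.0]])
alpha = 0.9                                # discount factor

policy = np.zeros(n_states, dtype=int)     # initial policy L_0
for k in range(50):
    # Policy evaluation: solve (I - alpha P_L) J = g_L for policy L_k
    P_L = np.array([P[policy[s], s] for s in range(n_states)])
    g_L = np.array([g[policy[s], s] for s in range(n_states)])
    J = np.linalg.solve(np.eye(n_states) - alpha * P_L, g_L)
    # Policy improvement: the (k+1)th policy is greedy w.r.t. J
    Q = g + alpha * np.einsum('ast,t->as', P, J)   # Q[a, s]
    new_policy = Q.argmin(axis=0)
    if np.array_equal(new_policy, policy):
        break                                      # policy is optimal
    policy = new_policy

print(policy, J)
```

On this toy instance the iteration stabilizes after one improvement step, which is typical for such small MDPs.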
Page 326
... iterations. The on-line policy iteration approach is a counterpart of the online gradient-based optimization approach presented in Section 12.4; the latter applies to parameterized systems and the former to systems within the MDP ...
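For an on-line variant in the MDP setting, a common sketch replaces exact policy evaluation with sample-by-sample updates while acting (nearly) greedily. The Q-learning-style loop below is one hypothetical illustration, not the chapter's algorithm; every number in it is invented:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-state, 2-action MDP (all numbers invented).
P = np.array([[[0.8, 0.2], [0.3, 0.7]],   # P[a, s, s'] transition probs
              [[0.5, 0.5], [0.9, 0.1]]])
g = np.array([[2.0, 1.0],                 # g[a, s]: one-stage cost
              [0.5, 3.0]])
alpha = 0.9

Q = np.zeros((2, 2))  # Q[a, s] estimates, updated online from one trajectory
s = 0
for t in range(20000):
    # Act with the current greedy policy, with occasional exploration
    a = rng.integers(2) if rng.random() < 0.1 else Q[:, s].argmin()
    s_next = rng.choice(2, p=P[a, s])
    # One online step: move Q[a, s] toward the sampled Bellman value
    target = g[a, s] + alpha * Q[:, s_next].min()
    Q[a, s] += (1.0 / (1 + t // 100)) * (target - Q[a, s])
    s = s_next

print(Q.argmin(axis=0))  # greedy policy extracted from the learned Q
```

The decaying step size plays the role of the learning rates in the gradient-based counterpart; here the updates act on a table of Q-values rather than on system parameters.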
Contents
Foreword | 1
Reinforcement Learning and Its Relationship to Supervised Learning | 47
Model-Based Adaptive Critic Designs | 65
Copyright
20 other sections not shown
Common terms and phrases
action network actor adaptive critic designs agent algorithm analysis angle applications approach approximate dynamic programming approximate LP backpropagation behavior Bellman equation BPTT chapter computational constraints control law control problems convergence cost critic network curse of dimensionality defined derivatives DHP neurocontroller direct NDP equation error estimate example Figure formulation function approximation fuzzy goal gradient helicopter Heuristic hierarchical IEEE Trans implemented improve initial input iteration learning algorithms learning rate linear programming load Lyapunov function Machine Learning Markov decision processes methods micro-alternator minimize module neural network node nonlinear operating optimal control optimal policy optimization problem output parameters Pareto optimal performance PI controller power system Proc Q-learning reinforcement learning reward robot Section simulation solve space stability stochastic structure supervised learning task techniques Theorem trajectory transition update Utility function value function variables vector voltage weights Werbos