Handbook of Learning and Approximate Dynamic Programming, by Jennie Si
Page 6
... maximize the bottom line usually do not get funded by industry. The bottom line is different at NSF, but the same principle applies. Strategic thinking is essential in all of these sectors. All of us need to reassess our work ...
Page 18
... maximize the expected value of the sum of future utility over all future time periods:

    MAXIMIZE  ⟨ Σ_{k=0}^{∞} U(t+k) / (1+r)^k ⟩    (1.1)

Already some questions of notation arise here. Here I am proposing that we should use the ...
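The discounted sum of future utility in this snippet can be sketched numerically. This is a minimal illustration, not the book's code: the utility sequence, the discount rate r, and the truncation of the infinite sum to a finite horizon are all assumptions made here for the example.

```python
def discounted_utility(utilities, r):
    """Sum of future utilities discounted by (1+r)^k, in the spirit of
    eq. (1.1). The infinite horizon is truncated to the length of the
    given sequence (an approximation made for this sketch)."""
    return sum(u / (1 + r) ** k for k, u in enumerate(utilities))

# Deterministic check: constant utility 1.0 over 3 periods at r = 0 sums to 3.0.
assert discounted_utility([1.0, 1.0, 1.0], r=0.0) == 3.0
```

With r > 0, later periods count for less: four periods of utility 1.0 at r = 1.0 sum to 1 + 1/2 + 1/4 + 1/8 = 1.875, and the tail of the infinite sum vanishes geometrically, which is what makes truncation reasonable.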
Page 51
... maximize the reward at a single location x*. If the reward R(x) is stochastic, the goal is to maximize expected reward, but the expectation is taken with respect to the randomness in R at the single point x*, and not with respect ...
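The distinction the snippet draws can be sketched with a Monte Carlo estimate: the expectation averages over noisy draws of R at one fixed point, while the point itself is never random. The reward function `noisy` below is a hypothetical stand-in, not anything from the book.

```python
import random

def expected_reward_at(x_star, sample_reward, n=10_000, seed=0):
    """Monte Carlo estimate of E[R(x*)]: average repeated noisy reward
    draws at the single fixed point x*; x itself is held constant."""
    rng = random.Random(seed)
    return sum(sample_reward(x_star, rng) for _ in range(n)) / n

# Hypothetical stochastic reward: mean -x^2 plus zero-mean Gaussian noise.
noisy = lambda x, rng: -x * x + rng.gauss(0.0, 0.1)

est = expected_reward_at(0.5, noisy)
assert abs(est - (-0.25)) < 0.05  # close to the true mean R(0.5) = -0.25
```

Note that averaging over x (e.g. sampling x uniformly) would estimate a different quantity entirely, which is exactly the distinction the passage is making.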
Contents
Foreword | 1 |
Reinforcement Learning and Its Relationship to Supervised Learning | 47 |
Model-Based Adaptive Critic Designs | 65 |
Copyright | |
20 other sections not shown
Common terms and phrases
action network actor adaptive critic designs agent algorithm analysis angle applications approach approximate dynamic programming approximate LP backpropagation behavior Bellman equation BPTT chapter computational constraints control law control problems convergence cost critic network curse of dimensionality defined derivatives DHP neurocontroller direct NDP equation error estimate example Figure formulation function approximation fuzzy goal gradient helicopter Heuristic hierarchical IEEE Trans implemented improve initial input iteration learning algorithms learning rate linear programming load Lyapunov function Machine Learning Markov decision processes methods micro-alternator minimize module neural network node nonlinear operating optimal control optimal policy optimization problem output parameters Pareto optimal performance PI controller power system Proc Q-learning reinforcement learning reward robot Section simulation solve space stability stochastic structure supervised learning task techniques Theorem trajectory transition update Utility function value function variables vector voltage weights Werbos