Reinforcement Learning and Optimal Control

Book Information

Author: Dimitri P. Bertsekas (USA)
List price: 149.00 CNY
Publisher: Tsinghua University Press
Publication date: June 1, 2020
Pages: 373
Binding: Paperback
ISBN: 9787302540328
Editor's Recommendation
"Dimitri P. Bertseka,美国MIT终身教授,美国国家工程院院士,清华大学复杂与网络化系统研究中心客座教授,电气工程与计算机科学领域GUO际作者,著有《非线性规划》《网络优化》《凸优化》等十几本教材和专著。本书的目的是考虑大型且具有挑战性的多阶段决策问题,这些问题原则上可以通过动态规划和优控制来解决,但它们的解决方案在计算上是难以处理的。本书讨论依赖于近似的解决方法,以产生具有足够性能的次优策略。这些方法统称为增强学习,也可以叫做近似动态规划和神经动态规划等。 本书的主题产生于等
Table of Contents
1. Exact Dynamic Programming
1.1. Deterministic Dynamic Programming
1.1.1. Deterministic Problems
1.1.2. The Dynamic Programming Algorithm
1.1.3. Approximation in Value Space
1.2. Stochastic Dynamic Programming
1.3. Examples, Variations, and Simplifications
1.3.1. Deterministic Shortest Path Problems
1.3.2. Discrete Deterministic Optimization
1.3.3. Problems with a Termination State
1.3.4. Forecasts
1.3.5. Problems with Uncontrollable State Components
1.3.6. Partial State Information and Belief States
1.3.7. Linear Quadratic Optimal Control
1.3.8. Systems with Unknown Parameters - Adaptive Control
1.4. Reinforcement Learning and Optimal Control - Some Terminology
1.5. Notes and Sources
2. Approximation in Value Space
2.1. Approximation Approaches in Reinforcement Learning
2.1.1. General Issues of Approximation in Value Space
2.1.2. Off-Line and On-Line Methods
2.1.3. Model-Based Simplification of the Lookahead Minimization
2.1.4. Model-Free Off-Line Q-Factor Approximation
2.1.5. Approximation in Policy Space on Top of Approximation in Value Space
2.1.6. When is Approximation in Value Space Effective?
2.2. Multistep Lookahead
2.2.1. Multistep Lookahead and Rolling Horizon
2.2.2. Multistep Lookahead and Deterministic Problems
2.3. Problem Approximation
2.3.1. Enforced Decomposition
2.3.2. Probabilistic Approximation - Certainty Equivalent Control
2.4. Rollout and the Policy Improvement Principle
2.4.1. On-Line Rollout for Deterministic Discrete Optimization
2.4.2. Stochastic Rollout and Monte Carlo Tree Search
2.4.3. Rollout with an Expert
2.5. On-Line Rollout for Deterministic Infinite-Spaces Problems - Optimization Heuristics
2.5.1. Model Predictive Control
2.5.2. Target Tubes and the Constrained Controllability Condition
2.5.3. Variants of Model Predictive Control
2.6. Notes and Sources
3. Parametric Approximation
3.1. Approximation Architectures
3.1.1. Linear and Nonlinear Feature-Based Architectures
3.1.2. Training of Linear and Nonlinear Architectures
3.1.3. Incremental Gradient and Newton Methods
3.2. Neural Networks
3.2.1. Training of Neural Networks
3.2.2. Multilayer and Deep Neural Networks
3.3. Sequential Dynamic Programming Approximation
3.4. Q-Factor Parametric Approximation
3.5. Parametric Approximation in Policy Space by Classification
3.6. Notes and Sources
4. Infinite Horizon Dynamic Programming
4.1. An Overview of Infinite Horizon Problems
4.2. Stochastic Shortest Path Problems
4.3. Discounted Problems
4.4. Semi-Markov Discounted Problems
4.5. Asynchronous Distributed Value Iteration
4.6. Policy Iteration
4.6.1. Exact Policy Iteration
4.6.2. Optimistic and Multistep Lookahead Policy Iteration
4.6.3. Policy Iteration for Q-factors
4.7. Notes and Sources
4.8. Appendix: Mathematical Analysis
4.8.1. Proofs for Stochastic Shortest Path Problems
4.8.2. Proofs for Discounted Problems
4.8.3. Convergence of Exact and Optimistic Policy Iteration
5. Infinite Horizon Reinforcement Learning
5.1. Approximation in Value Space - Performance Bounds
5.1.1. Limited Lookahead
5.1.2. Rollout and Approximate Policy Improvement
5.1.3. Approximate Policy Iteration
5.2. Fitted Value Iteration
5.3. Simulation-Based Policy Iteration with Parametric Approximation
5.3.1. Self-Learning and Actor-Critic Methods
5.3.2. Model-Based Variant of a Critic-Only Method
5.3.3. Model-Free Variant of a Critic-Only Method
5.3.4. Implementation Issues of Parametric Policy Iteration
5.3.5. Convergence Issues of Parametric Policy Iteration - Oscillations
5.4. Q-Learning
5.4.1. Optimistic Policy Iteration with Parametric Q-Factor Approximation - SARSA and DQN
5.5. Additional Methods - Temporal Differences
……
Synopsis
The main contents of the book are: Chapter 1, exact dynamic programming; Chapter 2, approximation in value space; Chapter 3, parametric approximation; Chapter 4, infinite horizon dynamic programming; Chapter 5, infinite horizon reinforcement learning; Chapter 6, aggregation techniques. Through this book, readers can gain a fairly complete picture of the theoretical frameworks of dynamic programming, approximate dynamic programming, and reinforcement learning, of how the mainstream algorithms work, and of recent developments. It is suitable as a textbook for senior undergraduate or graduate students in artificial intelligence, systems and control science, and related fields, and as a reference for professionals engaged in related research.
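As a concrete illustration of the approximation-in-value-space idea that runs through Chapters 2 and 5, here is a minimal sketch (in Python; not from the book) of a one-step lookahead policy built on a heuristic cost-to-go approximation J_tilde. The dynamics f, stage cost g, and heuristic below are invented placeholders.

# One-step lookahead with an approximate cost-to-go (illustrative assumptions).
def f(x, u):     # assumed dynamics: control u moves the state by u
    return x + u

def g(x, u):     # assumed stage cost: one unit of time plus control effort
    return 1 + 0.1 * u

def J_tilde(x):  # assumed heuristic cost-to-go: distance to goal state 10
    return abs(x - 10)

def one_step_lookahead(x, controls):
    # Suboptimal policy: mu_tilde(x) = argmin_u [ g(x,u) + J_tilde(f(x,u)) ]
    return min(controls, key=lambda u: g(x, u) + J_tilde(f(x, u)))

# Simulate the resulting suboptimal policy from x0 = 0
x = 0
for _ in range(7):
    u = one_step_lookahead(x, [0, 1, 2])
    print(f"x = {x:2d}, chosen u = {u}")
    x = f(x, u)

The quality of the resulting policy depends entirely on how well J_tilde approximates the true cost-to-go; Chapter 3's parametric architectures and the rollout methods of Section 2.4 are systematic ways of constructing such approximations.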
About the Author
Dimitri P. Bertsekas (USA)
Dimitri P. Bertsekas is a tenured professor at MIT, a member of the US National Academy of Engineering, and a visiting professor at the Center for Complex and Networked Systems at Tsinghua University. An internationally renowned author in the fields of electrical engineering and computer science, he has written more than a dozen textbooks and monographs, including Nonlinear Programming, Network Optimization, and Convex Optimization.