Reinforcement learning is a learning problem in which an agent has to learn to behave optimally in its environment. Deep learning methods, on the other hand, are a subclass of representation learning, which focuses on extracting the features needed for a given task (e.g. classification or detection). As such, they serve as powerful function approximators. Combining these two paradigms results in deep reinforcement learning.

This thesis gives an overview of recent advancements in the field. The results are divided into two broad research directions: value-based and policy-based approaches. The thesis presents several algorithms from each direction and reports how they perform. Finally, multiple open research questions are addressed and new research directions are proposed.

Excerpt

Table of Contents

Abstract
Zusammenfassung
List of Abbreviations
List of Figures
1 Introduction
2 Reinforcement Learning
2.1 Markov Decision Process (MDP)
2.2 Value-Based Methods
2.2.1 Dynamic Programming (DP)
2.2.2 Monte Carlo (MC)
2.2.3 Temporal Difference (TD)
2.3 Policy-Based Methods
2.3.1 Policy Iteration
2.3.2 Policy Gradient
3 Deep Learning
3.1 Neural Networks
3.1.1 Convolutional Neural Network (CNN)
3.1.2 Recurrent Neural Network (RNN)
3.2 Deep Reinforcement Learning (DRL)
3.2.1 Deep Q-Network (DQN)
3.2.2 Deep Deterministic Policy Gradient (D-DPG)
3.2.3 Asynchronous Advantage Actor-Critic (A3C)
3.2.4 Trust Region Policy Optimization (TRPO)
3.2.5 Distributional Bellman Equation
4 Applications of DRL
4.1 Game Playing
4.1.1 Self-Play Reinforcement Learning
4.1.2 Monte Carlo Tree Search (MCTS)
4.1.3 Multiplayer Online Battle Arena (MOBA)
4.2 Robotics
4.3 Finance
5 Conclusion

Objectives and Key Themes

This thesis provides a comprehensive overview of recent advancements in deep reinforcement learning (DRL). It explores the integration of deep learning methods with reinforcement learning, highlighting the key algorithms and their performance in various domains. The research delves into both value-based and policy-based approaches, examining their strengths and limitations.

Integration of deep learning and reinforcement learning
Value-based and policy-based DRL algorithms
Applications of DRL in game playing, robotics, and finance
Open research questions and future directions in DRL

Chapter Summaries

Chapter 1 introduces the concept of DRL, highlighting its significance and potential applications. Chapter 2 provides a foundational understanding of reinforcement learning, covering key concepts like Markov Decision Processes (MDPs), value-based methods (Dynamic Programming, Monte Carlo, and Temporal Difference learning), and policy-based methods (policy iteration and policy gradient). Chapter 3 delves into deep learning, focusing on neural networks, including convolutional and recurrent neural networks, and their application in DRL. Key DRL algorithms, such as Deep Q-Networks (DQN), Deep Deterministic Policy Gradient (D-DPG), Asynchronous Advantage Actor-Critic (A3C), Trust Region Policy Optimization (TRPO), and the Distributional Bellman Equation are discussed. Chapter 4 explores practical applications of DRL in game playing, robotics, and finance. Finally, Chapter 5 concludes by summarizing the research findings, highlighting open research questions, and proposing future directions for DRL.

Keywords

Deep reinforcement learning, deep learning, reinforcement learning, neural networks, value-based methods, policy-based methods, game playing, robotics, finance, open research questions, future directions.

Frequently Asked Questions

What is Deep Reinforcement Learning (DRL)?

DRL is the combination of reinforcement learning (where an agent learns by interacting with an environment) and deep learning (using neural networks as function approximators).

What is the difference between value-based and policy-based methods?

Value-based methods (like DQN) learn to estimate the value of states or actions and derive the agent's behavior from those estimates, while policy-based methods (like Policy Gradient) learn the agent's strategy (policy) directly.
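
The contrast can be illustrated with a minimal tabular sketch. It is not taken from the thesis: the state names, the two-action set, the learning rates, and all function names are hypothetical. The first function performs a Q-learning step, where the policy stays implicit in the value estimates; the second performs a REINFORCE-style step, where the parameters of a softmax policy are adjusted directly.

```python
import math
from collections import defaultdict

ACTIONS = ["left", "right"]  # hypothetical action set for illustration


def q_learning_update(q, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """Value-based step: move Q(s, a) toward r + gamma * max_a' Q(s', a')."""
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])


def greedy_action(q, state):
    """The policy is implicit: choose the action with the highest estimate."""
    return max(ACTIONS, key=lambda a: q[(state, a)])


def softmax(prefs):
    """Turn action preferences into a probability distribution."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def policy_gradient_update(prefs, action_index, ret, lr=0.1):
    """Policy-based step (REINFORCE-style) for a softmax policy in one state.

    The gradient of log pi(a) with respect to preference i is
    1[i == a] - pi(i), scaled here by the observed return.
    """
    probs = softmax(prefs)
    return [
        p + lr * ret * ((1.0 if i == action_index else 0.0) - probs[i])
        for i, p in enumerate(prefs)
    ]


# Value-based: one update, then act greedily on the learned estimates.
q = defaultdict(float)
q_learning_update(q, "s0", "right", reward=1.0, next_state="s1")
best = greedy_action(q, "s0")

# Policy-based: one update shifts probability toward the rewarded action.
prefs = policy_gradient_update([0.0, 0.0], action_index=1, ret=1.0)
probs = softmax(prefs)
```

In the value-based half the agent never stores a policy; in the policy-based half it never stores a value estimate. Deep variants replace the table and the preference list with neural networks.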

What is a Deep Q-Network (DQN)?

DQN is a landmark algorithm that uses a deep neural network to approximate the Q-value function, allowing agents to learn to play Atari games directly from raw pixels.
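
As a rough sketch of the regression target a DQN is trained on, the snippet below computes r + gamma * max_a' Q_target(s', a') for a batch of transitions. It assumes the commonly described setup with a separate target network; `target_q` is a hypothetical stand-in for that network (here a plain Python function), and the transitions and the discount factor are made up for illustration.

```python
def dqn_targets(batch, target_q, gamma=0.99):
    """Compute TD targets for a batch of (s, a, r, s', done) transitions.

    target_q(state) is assumed to return a list of Q-values, one per action.
    Terminal transitions do not bootstrap from the next state.
    """
    targets = []
    for _state, _action, reward, next_state, done in batch:
        if done:
            targets.append(reward)
        else:
            targets.append(reward + gamma * max(target_q(next_state)))
    return targets


def toy_target_q(state):
    """Hypothetical stand-in for a target network with two actions."""
    return [0.0, 2.0] if state == "s1" else [0.0, 0.0]


batch = [
    ("s0", 1, 1.0, "s1", False),  # non-terminal: bootstraps from s1
    ("s1", 0, 5.0, "s2", True),   # terminal: target is the reward alone
]
targets = dqn_targets(batch, toy_target_q)
```

In a full implementation the online network's prediction Q(s, a) would be regressed toward these targets, typically on minibatches sampled from a replay buffer.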

What are common applications of DRL?

DRL has been applied to game playing (e.g. AlphaGo), robotics (e.g. motor control), and finance (e.g. algorithmic trading and portfolio optimization).

What is an Actor-Critic algorithm?

It is a hybrid method where the "Actor" proposes actions (policy) and the "Critic" evaluates them (value function) to improve learning stability.
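
A minimal one-step actor-critic update might look as follows. This is a tabular sketch under stated assumptions, not the A3C algorithm covered in the thesis: the actor is a softmax over per-state action preferences, the critic is a table of state values, and the state names, learning rates, and function names are hypothetical.

```python
import math


def softmax(prefs):
    """Turn action preferences into a probability distribution."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def actor_critic_step(prefs, values, state, action, reward, next_state, done,
                      gamma=0.9, actor_lr=0.1, critic_lr=0.5):
    """Apply one actor-critic update in place and return the TD error."""
    # Critic: the TD error measures how much better or worse the outcome
    # was than the current value estimate predicted.
    bootstrap = 0.0 if done else values[next_state]
    td_error = reward + gamma * bootstrap - values[state]
    values[state] += critic_lr * td_error

    # Actor: shift the preferences along grad log pi(action | state),
    # weighted by the critic's TD error.
    probs = softmax(prefs[state])
    for a in range(len(prefs[state])):
        grad = (1.0 if a == action else 0.0) - probs[a]
        prefs[state][a] += actor_lr * td_error * grad
    return td_error


prefs = {"s0": [0.0, 0.0]}          # actor parameters
values = {"s0": 0.0, "s1": 0.0}     # critic estimates
td = actor_critic_step(prefs, values, "s0", action=1, reward=1.0,
                       next_state="s1", done=False)
```

Using the TD error instead of the full return as the weight is what lets the critic reduce the variance of the actor's updates.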

What is a Markov Decision Process (MDP)?

An MDP is a mathematical framework used to model decision-making situations where outcomes are partly random and partly under the control of a decision-maker.
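
To make the definition concrete, the sketch below encodes a small, entirely hypothetical two-state MDP as a nested dictionary and solves it with value iteration, a dynamic-programming method. The state names, actions, probabilities, rewards, and discount factor are invented for illustration.

```python
# transitions[state][action] = list of (probability, next_state, reward)
transitions = {
    "cool": {
        "slow": [(1.0, "cool", 1.0)],
        "fast": [(0.5, "cool", 2.0), (0.5, "hot", 2.0)],
    },
    "hot": {
        "slow": [(1.0, "cool", 0.0)],
        "fast": [(1.0, "hot", -1.0)],
    },
}


def action_value(values, outcomes, gamma):
    """Expected reward plus discounted value of the next state."""
    return sum(p * (r + gamma * values[s2]) for p, s2, r in outcomes)


def value_iteration(transitions, gamma=0.9, tol=1e-8):
    """Iterate the Bellman optimality backup until values stop changing."""
    values = {s: 0.0 for s in transitions}
    while True:
        new_values = {
            s: max(action_value(values, outcomes, gamma)
                   for outcomes in actions.values())
            for s, actions in transitions.items()
        }
        delta = max(abs(new_values[s] - values[s]) for s in transitions)
        values = new_values
        if delta < tol:
            break
    # Read off the greedy policy from the converged values.
    policy = {
        s: max(actions, key=lambda a: action_value(values, actions[a], gamma))
        for s, actions in transitions.items()
    }
    return values, policy


values, policy = value_iteration(transitions)
```

The "partly random" part of the definition lives in the probabilities of each outcome list; the "partly under control" part lives in the choice of action.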

Details

Title: A Review of Recent Advancements in Deep Reinforcement Learning
University: University of Duisburg-Essen
Grade: 1.0
Author: Artur Sahakjan
Year of publication: 2018
Pages: 78
Catalog number: V432230
ISBN (ebook): 9783668765009
ISBN (book): 9783668765016
Language: English
Keywords: machine learning, artificial intelligence
Product safety: GRIN Publishing GmbH

Cite this text: Artur Sahakjan (Author), 2018, A Review of Recent Advancements in Deep Reinforcement Learning, Munich, GRIN Verlag, https://www.grin.com/document/432230