A survey of multi-objective sequential decision-making
- Roijers, Diederik, Vamplew, Peter, Whiteson, Shimon, Dazeley, Richard
- Authors: Roijers, Diederik , Vamplew, Peter , Whiteson, Shimon , Dazeley, Richard
- Date: 2013
- Type: Text , Journal article
- Relation: Journal of Artificial Intelligence Research Vol. 48, no. (2013), p. 67-113
- Full Text:
- Reviewed:
- Description: Sequential decision-making problems with multiple objectives arise naturally in practice and pose unique challenges for research in decision-theoretic planning and learning, which has largely focused on single-objective settings. This article surveys algorithms designed for sequential decision-making problems with multiple objectives. Though there is a growing body of literature on this subject, little of it makes explicit under what circumstances special methods are needed to solve multi-objective problems. Therefore, we identify three distinct scenarios in which converting such a problem to a single-objective one is impossible, infeasible, or undesirable. Furthermore, we propose a taxonomy that classifies multi-objective methods according to the applicable scenario, the nature of the scalarization function (which projects multi-objective values to scalar ones), and the type of policies considered. We show how these factors determine the nature of an optimal solution, which can be a single policy, a convex hull, or a Pareto front. Using this taxonomy, we survey the literature on multi-objective methods for planning and learning. Finally, we discuss key applications of such methods and outline opportunities for future work. © 2013 AI Access Foundation.
- Description: C1
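The survey above organises methods around the scalarization function and the resulting solution concepts (single policy, convex hull, Pareto front). As a minimal illustration of the latter concept, the Python sketch below implements Pareto dominance and a naive non-dominated filter over candidate value vectors; the function names and example values are hypothetical and are not taken from the paper.

```python
import numpy as np

def dominates(v, w):
    """Return True if value vector v Pareto-dominates w: v is at least as
    good in every objective and strictly better in at least one."""
    v, w = np.asarray(v, dtype=float), np.asarray(w, dtype=float)
    return bool(np.all(v >= w) and np.any(v > w))

def pareto_front(values):
    """Return the non-dominated subset of a collection of value vectors."""
    return [v for i, v in enumerate(values)
            if not any(dominates(u, v) for j, u in enumerate(values) if j != i)]

# Hypothetical example: three policies evaluated on two objectives.
candidates = [(1.0, 5.0), (3.0, 3.0), (2.0, 2.0)]   # (2.0, 2.0) is dominated
print(pareto_front(candidates))                      # [(1.0, 5.0), (3.0, 3.0)]
```

A convex hull solution set would instead retain only the value vectors that are optimal for some linear weighting of the objectives, which is in general a subset of the Pareto front.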
Scalar reward is not enough : a response to Silver, Singh, Precup and Sutton (2021)
- Vamplew, Peter, Smith, Benjamin, Källström, Johan, Ramos, Gabriel, Rădulescu, Roxana, Roijers, Diederik, Hayes, Conor, Heintz, Fredrik, Mannion, Patrick, Libin, Pieter, Dazeley, Richard, Foale, Cameron
- Authors: Vamplew, Peter , Smith, Benjamin , Källström, Johan , Ramos, Gabriel , Rădulescu, Roxana , Roijers, Diederik , Hayes, Conor , Heintz, Fredrik , Mannion, Patrick , Libin, Pieter , Dazeley, Richard , Foale, Cameron
- Date: 2022
- Type: Text , Journal article
- Relation: Autonomous Agents and Multi-Agent Systems Vol. 36, no. 2 (2022), p.
- Full Text:
- Reviewed:
- Description: The recent paper “Reward is Enough” by Silver, Singh, Precup and Sutton posits that the concept of reward maximisation is sufficient to underpin all intelligence, both natural and artificial, and provides a suitable basis for the creation of artificial general intelligence. We contest the underlying assumption of Silver et al. that such reward can be scalar-valued. In this paper we explain why scalar rewards are insufficient to account for some aspects of both biological and computational intelligence, and argue in favour of explicitly multi-objective models of reward maximisation. Furthermore, we contend that even if scalar reward functions can trigger intelligent behaviour in specific cases, this type of reward is insufficient for the development of human-aligned artificial general intelligence due to unacceptable risks of unsafe or unethical behaviour. © 2022, The Author(s).
Scalar reward is not enough : JAAMAS Track
- Vamplew, Peter, Smith, Benjamin, Källström, Johan, Ramos, Gabriel, Rădulescu, Roxana, Roijers, Diederik, Hayes, Conor, Heintz, Fredrik, Mannion, Patrick, Libin, Pieter, Dazeley, Richard, Foale, Cameron
- Authors: Vamplew, Peter , Smith, Benjamin , Källström, Johan , Ramos, Gabriel , Rădulescu, Roxana , Roijers, Diederik , Hayes, Conor , Heintz, Fredrik , Mannion, Patrick , Libin, Pieter , Dazeley, Richard , Foale, Cameron
- Date: 2023
- Type: Text , Conference paper
- Relation: 22nd International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2023, London, 29 May to 2 June 2023, Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS, Vol. 2023-May, p. 839-841
- Full Text: false
- Reviewed:
- Description: Silver et al. [14] posit that scalar reward maximisation is sufficient to underpin all intelligence and provides a suitable basis for artificial general intelligence (AGI). This extended abstract summarises the counter-argument from our JAAMAS paper [19]. © 2023 International Foundation for Autonomous Agents and Multiagent Systems (www.ifaamas.org). All rights reserved.
A practical guide to multi-objective reinforcement learning and planning
- Hayes, Conor, Rădulescu, Roxana, Bargiacchi, Eugenio, Källström, Johan, Macfarlane, Matthew, Reymond, Mathieu, Verstraeten, Timothy, Zintgraf, Luisa, Dazeley, Richard, Heintz, Fredrik, Howley, Enda, Irissappane, Athirai, Mannion, Patrick, Nowé, Ann, Ramos, Gabriel, Restelli, Marcello, Vamplew, Peter, Roijers, Diederik
- Authors: Hayes, Conor , Rădulescu, Roxana , Bargiacchi, Eugenio , Källström, Johan , Macfarlane, Matthew , Reymond, Mathieu , Verstraeten, Timothy , Zintgraf, Luisa , Dazeley, Richard , Heintz, Fredrik , Howley, Enda , Irissappane, Athirai , Mannion, Patrick , Nowé, Ann , Ramos, Gabriel , Restelli, Marcello , Vamplew, Peter , Roijers, Diederik
- Date: 2022
- Type: Text , Journal article
- Relation: Autonomous Agents and Multi-Agent Systems Vol. 36, no. 1 (2022), p.
- Full Text:
- Reviewed:
- Description: Real-world sequential decision-making tasks are generally complex, requiring trade-offs between multiple, often conflicting, objectives. Despite this, the majority of research in reinforcement learning and decision-theoretic planning either assumes only a single objective, or that multiple objectives can be adequately handled via a simple linear combination. Such approaches may oversimplify the underlying problem and hence produce suboptimal results. This paper serves as a guide to the application of multi-objective methods to difficult problems, and is aimed at researchers who are already familiar with single-objective reinforcement learning and planning methods who wish to adopt a multi-objective perspective on their research, as well as practitioners who encounter multi-objective decision problems in practice. It identifies the factors that may influence the nature of the desired solution, and illustrates by example how these influence the design of multi-objective decision-making systems for complex problems. © 2022, The Author(s).
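The guide above notes that most work collapses the objectives with "a simple linear combination", which can oversimplify the underlying problem. For concreteness, here is a hedged sketch of that baseline in Python; the function name, weights, and reward values are assumptions chosen purely for illustration.

```python
import numpy as np

def linear_scalarisation(reward_vector, weights):
    """Collapse a vector-valued reward into a scalar via a weighted sum,
    i.e. the 'simple linear combination' baseline."""
    reward_vector = np.asarray(reward_vector, dtype=float)
    weights = np.asarray(weights, dtype=float)
    assert reward_vector.shape == weights.shape
    return float(weights @ reward_vector)

# Hypothetical example: trade off throughput against energy cost.
r = [4.0, -1.5]   # two-objective reward (throughput, energy cost)
w = [0.7, 0.3]    # user-chosen trade-off weights
print(linear_scalarisation(r, w))   # 2.35
```

Any fixed weight vector commits to one trade-off in advance; the multi-objective methods covered in the guide avoid this by reasoning over sets of policies or over the user's utility function directly.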
A brief guide to multi-objective reinforcement learning and planning : JAAMAS track
- Hayes, Conor, Bargiacchi, Eugenio, Källström, Johan, Macfarlane, Matthew, Reymond, Mathieu, Verstraeten, Timothy, Zintgraf, Luisa, Dazeley, Richard, Heintz, Fredrik, Howley, Enda, Irissappane, Athirai, Mannion, Patrick, Nowé, Ann, Ramos, Gabriel, Restelli, Marcello, Vamplew, Peter, Roijers, Diederik
- Authors: Hayes, Conor , Bargiacchi, Eugenio , Källström, Johan , Macfarlane, Matthew , Reymond, Mathieu , Verstraeten, Timothy , Zintgraf, Luisa , Dazeley, Richard , Heintz, Fredrik , Howley, Enda , Irissappane, Athirai , Mannion, Patrick , Nowé, Ann , Ramos, Gabriel , Restelli, Marcello , Vamplew, Peter , Roijers, Diederik
- Date: 2023
- Type: Text , Conference paper
- Relation: 22nd International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2023, London, 29 May to 2 June 2023, Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS Vol. 2023-May, p. 1988-1990
- Full Text:
- Reviewed:
- Description: Real-world sequential decision-making tasks are usually complex, and require trade-offs between multiple - often conflicting - objectives. However, the majority of research in reinforcement learning (RL) and decision-theoretic planning assumes a single objective, or that multiple objectives can be handled via a predefined weighted sum over the objectives. Such approaches may oversimplify the underlying problem, and produce suboptimal results. This extended abstract outlines the limitations of using a semi-blind iterative process to solve multi-objective decision making problems. Our extended paper [4] serves as a guide for the application of explicitly multi-objective methods to difficult problems. © 2023 International Foundation for Autonomous Agents and Multiagent Systems (www.ifaamas.org). All rights reserved.
Utility-based reinforcement learning : unifying single-objective and multi-objective reinforcement learning
- Vamplew, Peter, Foale, Cameron, Hayes, Conor, Mannion, Patrick, Howley, Enda, Dazeley, Richard, Johnson, Scott, Källström, Johan, Ramos, Gabriel, Rădulescu, Roxana, Röpke, Willem, Roijers, Diederik
- Authors: Vamplew, Peter , Foale, Cameron , Hayes, Conor , Mannion, Patrick , Howley, Enda , Dazeley, Richard , Johnson, Scott , Källström, Johan , Ramos, Gabriel , Rădulescu, Roxana , Röpke, Willem , Roijers, Diederik
- Date: 2024
- Type: Text , Conference paper
- Relation: 23rd International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2024, Auckland, 6-10 May 2024, Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS Vol. 2024-May, p. 2717-2721
- Full Text:
- Reviewed:
- Description: Research in multi-objective reinforcement learning (MORL) has introduced the utility-based paradigm, which makes use of both environmental rewards and a function that defines the utility derived by the user from those rewards. In this paper we extend this paradigm to the context of single-objective reinforcement learning (RL), and outline multiple potential benefits including the ability to perform multi-policy learning across tasks relating to uncertain objectives, risk-aware RL, discounting, and safe RL. We also examine the algorithmic implications of adopting a utility-based approach. © 2024 International Foundation for Autonomous Agents and Multiagent Systems.
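The utility-based paradigm described above combines the environment's vector-valued rewards with a user-supplied utility function. The sketch below is a minimal, hypothetical illustration: it accumulates a discounted vector return over one episode and then applies a nonlinear utility to that return; the function names and the example utility are assumptions, not code from the paper.

```python
import numpy as np

def episode_utility(vector_rewards, utility_fn, gamma=1.0):
    """Accumulate a discounted vector return over one episode, then apply
    the user's utility function to that realised return."""
    ret = np.zeros(len(vector_rewards[0]))
    discount = 1.0
    for r in vector_rewards:
        ret += discount * np.asarray(r, dtype=float)
        discount *= gamma
    return utility_fn(ret)

# Hypothetical nonlinear utility: heavily penalise a negative safety return.
utility = lambda ret: ret[0] + min(0.0, 10.0 * ret[1])

episode = [(1.0, 0.0), (2.0, -0.5), (1.0, 0.0)]   # (performance, safety) rewards
print(episode_utility(episode, utility))           # 4.0 - 5.0 = -1.0
```

Because the utility here is applied to the realised return rather than to each immediate reward, it can express nonlinear preferences (such as hard penalties on a safety objective) that a fixed linear scalarisation cannot encode.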