- Title
- Scalar reward is not enough : a response to Silver, Singh, Precup and Sutton (2021)
- Creator
- Vamplew, Peter; Smith, Benjamin; Källström, Johan; Ramos, Gabriel; Rădulescu, Roxana; Roijers, Diederik; Hayes, Conor; Heintz, Fredrik; Mannion, Patrick; Libin, Pieter; Dazeley, Richard; Foale, Cameron
- Date
- 2022
- Type
- Text; Journal article
- Identifier
- http://researchonline.federation.edu.au/vital/access/HandleResolver/1959.17/186897
- Identifier
- vital:16985
- Identifier
- https://doi.org/10.1007/s10458-022-09575-5
- Identifier
- ISSN: 1387-2532
- Abstract
- The recent paper “Reward is Enough” by Silver, Singh, Precup and Sutton posits that the concept of reward maximisation is sufficient to underpin all intelligence, both natural and artificial, and provides a suitable basis for the creation of artificial general intelligence. We contest the underlying assumption of Silver et al. that such reward can be scalar-valued. In this paper we explain why scalar rewards are insufficient to account for some aspects of both biological and computational intelligence, and argue in favour of explicitly multi-objective models of reward maximisation. Furthermore, we contend that even if scalar reward functions can trigger intelligent behaviour in specific cases, this type of reward is insufficient for the development of human-aligned artificial general intelligence due to unacceptable risks of unsafe or unethical behaviour. © 2022, The Author(s).
- Publisher
- Springer
- Relation
- Autonomous Agents and Multi-Agent Systems Vol. 36, no. 2 (2022), p.
- Rights
- All metadata describing materials held in, or linked to, the repository is freely available under a CC0 licence
- Rights
- http://creativecommons.org/licenses/by/4.0/
- Rights
- Copyright © 2022, The Author(s)
- Rights
- Open Access
- Subject
- 4602 Artificial intelligence; Artificial general intelligence; Multi-objective decision making; Multi-objective reinforcement learning; Reinforcement learning; Safe and ethical AI; Scalar rewards; Vector rewards
- Full Text
- Reviewed
- Funder
- This research was supported by funding from the Flemish Government under the “Onderzoeksprogramma Artificiële Intelligentie (AI) Vlaanderen” program, and by the National Cancer Institute of the U.S. National Institutes of Health under Award Number 1R01CA240452-01A1. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health or of other funders. Pieter J.K. Libin acknowledges support from the Research Foundation Flanders (FWO, fwo.be) (postdoctoral fellowship 1242021N). Johan Källström and Fredrik Heintz were partially supported by the Swedish Governmental Agency for Innovation Systems (Grant NFFP7/2017-04885), and the Wallenberg Artificial Intelligence, Autonomous Systems and Software Program (WASP) funded by the Knut and Alice Wallenberg Foundation. Conor F. Hayes is funded by the National University of Ireland Galway Hardiman Scholarship. Gabriel Ramos was partially supported by FAPERGS (Grant 19/2551-0001277-2) and FAPESP (Grant 2020/05165-1). Open Access funding enabled and organized by CAUL and its Member Institutions
File | Description | Size | Format |
---|---|---|---|
SOURCE1 | Published version | 684 KB | Adobe Acrobat PDF |