Environment design. (a) The two-dimensional gridworld environment used in Experiment 1. (b) To study the properties of the optimal reward, we made several modifications to the gridworld environment. Top row: In the one-time learning environment, the agent could choose to remain at the food location indefinitely after reaching it. In the lifetime learning environment, the agent was teleported to a random location in the gridworld as soon as it reached the food state. Middle row: In the stationary environment, the food remained in the same location throughout the agent's lifetime. In the non-stationary environment, the food changed its location during the agent's lifetime. Bottom row: We used a 7 × 7 gridworld to simulate a dense reward setting. To simulate a sparse reward setting, we increased the size of the gridworld to 13 × 13. Credit: PLOS Computational Biology (2022). DOI: 10.1371/journal.pcbi.1010316
A trio of researchers, two with Princeton University and the other with the Max Planck Institute for Biological Cybernetics, has developed a reinforcement learning-based simulation suggesting that the human desire to always want more may have evolved as a way to speed up learning. In their paper, published in the open-access journal PLOS Computational Biology, Rachit Dubey, Thomas Griffiths and Peter Dayan describe the factors that went into their simulations.
Researchers studying human behavior have often been puzzled by people's seemingly contradictory desires. Many people have an unceasing desire for more of certain things, even though they know that satisfying those desires may not bring the happiness they expect. Many people want more and more money, for example, on the assumption that more money would make life easier and should therefore make them happier. But a host of studies has shown that making more money rarely makes people happier (with the exception of those starting from a very low income level). In this new effort, the researchers sought to better understand why people might have evolved this way. To that end, they built a simulation that mimics the way humans respond emotionally to stimuli, such as achieving goals. To better understand why people might feel the way they do, they also added checkpoints that could serve as a happiness barometer.
The simulation was based on reinforcement learning, in which an agent (a person or a machine) learns to keep doing things that yield a positive reward and to stop doing things that yield no reward or a negative one. The researchers also built in simulated emotional reactions reflecting two well-known effects that dampen happiness: habituation, whereby people become less happy with something new over time as they get used to it, and comparison, whereby people become less happy when they see that someone else has more of something they want.
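The habituation and comparison effects described above can be illustrated with a toy sketch. It is a hypothetical model, not the equations from Dubey, Griffiths and Dayan's paper: the class `SubjectiveReward`, its method `feel`, and the parameters `habituation_weight`, `comparison_weight` and `adaptation_rate` are all invented for illustration. Habituation is modeled here as a running baseline of the agent's own past rewards, and comparison as the reward a hypothetical peer received; both are subtracted from the objective reward to give a "subjective" reward. The sketch only shows how the two effects lower the reward an agent feels; it does not reproduce the paper's finding that they speed up learning.

```python
from dataclasses import dataclass


@dataclass
class SubjectiveReward:
    """Toy model of habituation and comparison (hypothetical; not the paper's equations).

    The objective reward is reduced by two reference points:
    - a running baseline of the agent's own past rewards (habituation), and
    - the reward a hypothetical peer received (comparison).
    """
    habituation_weight: float = 0.5  # hypothetical weight on the agent's own baseline
    comparison_weight: float = 0.5   # hypothetical weight on the peer's reward
    adaptation_rate: float = 0.1     # hypothetical step size for updating the baseline
    baseline: float = 0.0            # what the agent has "grown used to" so far

    def feel(self, objective: float, peer_reward: float = 0.0) -> float:
        # Felt reward = objective reward minus the two weighted reference points.
        subjective = (objective
                      - self.habituation_weight * self.baseline
                      - self.comparison_weight * peer_reward)
        # Habituation: move the baseline toward the reward just received,
        # so the same reward feels smaller the next time it arrives.
        self.baseline += self.adaptation_rate * (objective - self.baseline)
        return subjective


if __name__ == "__main__":
    agent = SubjectiveReward()
    # The same objective reward (1.0) feels a little smaller each time it repeats.
    print([agent.feel(1.0) for _ in range(3)])

    # Two fresh agents: the same reward feels worse when a peer got more.
    alone = SubjectiveReward().feel(1.0, peer_reward=0.0)
    envious = SubjectiveReward().feel(1.0, peer_reward=2.0)
    print(alone, envious)  # 1.0 0.0
```

Because the baseline drifts toward whatever the agent has just received, a reward that never grows feels steadily smaller, which is one simple way to capture "always wanting more". In a reinforcement learning agent built along these lines, a value like `subjective` could be fed to the learning update in place of the environment's raw reward.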
In running the simulation, the researchers found that the agent achieved its goals faster when habituation and comparison came into play, suggesting that such emotional reactions might also contribute to faster learning in humans. They also found that the agent ended up less "happy" when it had many achievable options to choose from than when it had only a few.
The researchers suggest that people are prone to being trapped in an endless cycle of always wanting more because, overall, it helps humans learn faster.
Rachit Dubey et al, The pursuit of happiness: A reinforcement learning perspective on habituation and comparisons, PLOS Computational Biology (2022). DOI: 10.1371/journal.pcbi.1010316
© 2022 Science X Network
Citation: Reinforcement learning–based simulations show human desire to always want more may speed up learning (2022, August 5) retrieved 6 August 2022 from https://phys.org/news/2022-08-learningbased-simulations-human-desire.html