Lirio’s AI Research team recently developed a novel adaptive stochastic gradient-free (ASGF) approach for solving some of the most difficult optimization challenges in machine learning. This innovative optimization algorithm, which is simple to implement and does not require careful fine-tuning, offers significant improvements when compared to existing state-of-the-art approaches. The technical details of this method are described in the arXiv submission, an open archive of scholarly articles relating to mathematics, computer science and related fields.
The primary objective behind the development of the new ASGF technique is to accelerate Lirio’s performance when solving real-world reinforcement learning (RL) tasks. RL is one of the most challenging areas of machine learning (ML) and artificial intelligence (AI) because it involves sequential decision-making in which each action taken by the agent (the computer program) affects the environment, thus altering the setting in which each subsequent decision is made. Deploying a learning agent in real-world environments is particularly challenging because most of these environments are noisy, non-stationary, and only partially observable.
A Brief Overview
For those without a strong intuitive understanding of ML, it’s useful to think of it as having two primary components: representation and optimization.
- Representation: Determining how to put information about a task into a form that the machine can use, so the problem is effectively represented to a computer.
- Optimization: Our best guess at a good decision policy (which defines the actions the agent takes after observing the environment), given our representation and the feedback our system has seen to date. In other words, based on what we’ve observed so far, what policy should our agent follow that would likely maximize the rewards (or minimize some loss) that our system would generate in the future?
Optimization is a core component of fitting a model to data in many mathematical/computational tasks, but it is particularly challenging in real-world RL. We have to deal with tremendous amounts of uncertainty, highly non-convex loss functions, restrictions in how we can sample the space, countless hidden confounding factors, and an ever-changing environment (that continues to change based on our agent’s actions in ways we try to learn to anticipate). Exploring possible actions to see their effects has consequences, and therefore we need to be able to take advantage of the data we already have to the greatest degree possible.
Real-World Application of Reinforcement Learning
Representation of the problem matters greatly, and in novel applications of RL, developing effective representations is a difficult task. The problem is further exacerbated when we struggle to make use of the data through effective machine learning optimization algorithms.
Most optimization algorithms used in RL have been developed with a focus on environments that can be reset, like video games or robotic simulations. When resetting your environment is not an option, as in our behavior change applications, you can’t learn a successful model by taking a complex representational form and exploring countless parameter settings on large graphics processing unit (GPU) clusters while resetting and trying again and again.
In other words, we can’t simply build a policy by exploring as much of our state space as necessary, following different decision paths from the same starting point to obtain a policy that comes close to generating a maximum reward. What happens when sampling our space is costly and has real, lasting consequences?
At Lirio, we need to make optimal decisions based on the best ML principles for ensuring generalization performance. This allows our agent to make choices that we have some confidence in, even when it encounters states it has never seen before. In this context, the optimization algorithms typically used in RL are not well-suited to handle the difficulties presented by the real-world RL tasks that we are addressing.
The Answer: ASGF
Our ASGF method tackles such challenges by combining a unique set of capabilities that result in:
- an extremely efficient training procedure
- massive scalability and sample-efficiency
- intelligent trade-offs between exploitation and exploration
- adaptive selection of all algorithm parameters
- accelerated convergence towards an optimal solution
We will be posting additional blog posts offering more insights into the ASGF algorithm and its significance in the near future. Stay tuned.
Other readers viewed:
Want to learn more about how Lirio’s behavioral engagement solution utilizes behavioral science and machine learning to help organizations motivate the people they serve to achieve better outcomes?