Online and Reinforcement Learning
By Aida Rahmattalabi
The 2019 INFORMS Annual Meeting came to an end with a series of exciting sessions on Wednesday. In particular, the session “Online and Reinforcement Learning” hosted five interesting talks on challenging machine learning problems.
If you are a machine learning researcher, you must have faced questions such as “Which learning model is most appropriate?” or “How should I set the hyper-parameters?” In her talk, Madeleine Udell proposed a methodology to automate these tasks. AutoML, as she named it, is particularly useful because it enables the widespread use of machine learning by non-experts. In the presentation, Udell shared experimental results showing that their approach outperforms competing approaches on a test bed of supervised learning problems. The second talk, given by Ian Kash from the University of Illinois at Chicago, concerned reinforcement learning and provided theoretical guarantees for a broader class of problems within this space.
Multi-armed bandit problems are characterized by a set of arms (decisions) whose utilities are uncertain. In her talk, Gauri Joshi introduced a novel extension of this problem in which the utilities of different actions, while uncertain, are correlated via some hidden parameters. She proposed an approach that exploits this structure to obtain high-quality policies. The multi-armed bandit problem is also relevant in the context of causal inference, where pulling any arm (making any decision) corresponds to an intervention. In his talk, Elias Bareinboim described a procedure that uses partially specified causal knowledge to identify the optimal decisions in the structural bandit setting. Finally, Theja Tulabandhula discussed a variant of stochastic multi-armed bandits with impairment. The impairment effect is the phenomenon whereby a decision maker receives utility for an action only if they have “played” it at least a few times in the recent past. Tulabandhula and his coauthors provided two efficient algorithms for different models of impairment.
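For readers unfamiliar with the basic setting the talks build on, here is a minimal sketch of a classic bandit baseline, an epsilon-greedy policy on Bernoulli arms. This is not any speaker's method; the arm probabilities, round count, and epsilon value are made-up illustrative choices.

```python
import random

def epsilon_greedy_bandit(true_means, n_rounds=5000, epsilon=0.1, seed=0):
    """Simulate an epsilon-greedy policy on a Bernoulli multi-armed bandit.

    true_means: hypothetical arm reward probabilities, unknown to the
    learner, which must estimate them from observed rewards.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms        # times each arm has been pulled
    estimates = [0.0] * n_arms   # running average reward per arm
    total_reward = 0
    for _ in range(n_rounds):
        if rng.random() < epsilon:
            # Explore: pull a uniformly random arm.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pull the arm with the highest estimated utility.
            arm = max(range(n_arms), key=lambda a: estimates[a])
        reward = 1 if rng.random() < true_means[arm] else 0
        counts[arm] += 1
        # Incremental update of the running average for the pulled arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward
    return estimates, counts, total_reward

estimates, counts, total = epsilon_greedy_bandit([0.2, 0.5, 0.8])
```

Over enough rounds, the policy concentrates its pulls on the highest-mean arm; the extensions in the talks above change what the learner can assume about the arms (correlation through hidden parameters, causal structure, or an impairment requirement on recent plays).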