Voulez-vous effacer les recherches récentes ?

Toutes les recherches récentes seront supprimées

Dynamic resource allocation as an estimation problem (Aurélien Garivier-IMT/Université Paul Sabatier)

il y a 5 ans187 views

Dynamic resource allocation as an estimation problem
An agent sequentially chooses actions in a set of possible options. Each action leads to a stochastic reward, whose distribution is unknown. What dynamic allocation rule should he choose so as to maximize his cumulated reward?
The study of "bandit problems" (the word refers to the paradigmatic situation of a gambler facing a row of slot-machines and deciding which one to choose in order to maximize his rewards), dating back to the 1930s, was originally motivated by medical trials. In has recently raised a renewed interest because of computer-driven applications, from computer experiments to recommender systems and Big Data.
One possible solution, called UCB, relies on upper confidence bounds of the expected rewards associated to all actions. In this presentation, I shall explain why fine confidence bounds are required in order to obtain efficient allocation rules, and I shall present the statistical challenges involved in their construction.

Signaler cette vidéo

Quel est le problème ?

Intégrer la vidéo

Dynamic resource allocation as an estimation problem (Aurélien Garivier-IMT/Université Paul Sabatier)
Lecture auto
<iframe frameborder="0" width="480" height="270" src="//www.dailymotion.com/embed/video/xz1wpe" allowfullscreen allow="autoplay"></iframe>
Intégrer la vidéo à votre site avec le code d'intégration ci-dessus