Online Learning Workshop

On the occasion of my PhD defense, I’m organizing a workshop on Online Learning at Télécom, starting at 10am and featuring two members of my PhD committee.

9.30am – 10am : Welcome coffee and croissants

10am to 11am : Csaba Szepesvári (University of Alberta & Google DeepMind)

Adaptive (Non-)Convex Optimization: Optimism, Composite Objectives, and Variational Bounds

11am to 12pm : Wouter M. Koolen (CWI Amsterdam)

The Design of Adaptive Online Learning Methods

Registration is free and anyone is welcome to attend. Just come to Télécom and show an ID at the reception desk.

Address : 46 rue Barrault, 75013 Paris

PhD defense

Title: « Statistical Models of User Behavior Under Bandit Feedback »

I will defend my PhD on…

October 20th at 3pm

… in B310 at Télécom ParisTech (46 rue Barrault, 75013 Paris). It is a public event: you are all welcome to attend the defense and share a drink with us afterwards.


Meetup talks in January-February

I have been invited to give two talks at Parisian Meetups on Machine Learning:

  • January 30th : Afterwork MVA (ML masters alumni meetup). I’ll be talking about Applications of Bandit Algorithms in Online Advertising. I will start with a short introduction to Reinforcement Learning in general and then focus on recent advances in the bandit literature (see below).
  • February 1st : RecSysFR Meetup. I’ll be talking about « Sequential Learning in the Position-Based Model ». This presentation will be fairly similar to the previous one, with more attention given to the Position-Based Model, which is a well-known click model in Online Advertising. I’ll try to show how important it is to have a clear understanding of the feedback model underlying the sequential learning model being considered.

The talks will be based on my NIPS paper (with P. Lagrée and O. Cappé) as well as on my recently accepted AISTATS 2017 paper (with S. Katariya, B. Kveton, C. Szepesvári and Z. Wen).

The slides are available here: PBM

SMILE Seminar: ICML debriefing

On October 15th, I presented the paper « Optimal Regret Analysis on Thompson Sampling for the Multi-Armed Bandit with Multiple Plays » by Komiyama, Honda and Nakagawa. This was a blackboard presentation, but you can find my notes here.

Basically, this paper studies the performance of Thompson Sampling in a multiple-play setting: at each round, the learner pulls L < K of the K arms and observes each of the L rewards individually. The authors show that their algorithm is asymptotically optimal.
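To make the setting concrete, here is a minimal sketch of Thompson Sampling with multiple plays. It is not the authors' code: it assumes Bernoulli rewards and a uniform Beta(1, 1) prior on each arm, and the function name `mp_thompson_sampling` and its parameters are made up for illustration. At each round it draws one posterior sample per arm, plays the L arms with the largest samples, and updates each played arm with its own observed reward.

```python
import random


def mp_thompson_sampling(means, n_plays, horizon, seed=0):
    """Bernoulli Thompson Sampling with multiple plays (illustrative sketch).

    means:   true success probabilities of the K arms (hidden from the learner)
    n_plays: number L of arms pulled at each round, with 0 < L < K
    horizon: number of rounds
    Returns (number of pulls per arm, cumulative expected regret).
    """
    rng = random.Random(seed)
    n_arms = len(means)
    if not 0 < n_plays < n_arms:
        raise ValueError("need 0 < L < K")

    # Beta(1 + successes, 1 + failures) posterior for each arm
    # (assumption: uniform prior, Bernoulli rewards).
    successes = [0] * n_arms
    failures = [0] * n_arms
    pulls = [0] * n_arms

    # Expected reward of the best fixed set of L arms, used to measure regret.
    best_value = sum(sorted(means, reverse=True)[:n_plays])
    regret = 0.0

    for _ in range(horizon):
        # One posterior sample per arm.
        samples = [
            rng.betavariate(successes[i] + 1, failures[i] + 1)
            for i in range(n_arms)
        ]
        # Play the L arms with the largest sampled means.
        chosen = sorted(range(n_arms), key=lambda i: samples[i], reverse=True)[:n_plays]

        # Each played arm returns its own reward, observed separately.
        for i in chosen:
            reward = 1 if rng.random() < means[i] else 0
            successes[i] += reward
            failures[i] += 1 - reward
            pulls[i] += 1

        regret += best_value - sum(means[i] for i in chosen)

    return pulls, regret


# Example run: K = 5 arms, L = 2 plays per round (arbitrary illustrative values).
pulls, regret = mp_thompson_sampling([0.9, 0.8, 0.3, 0.2, 0.1], n_plays=2, horizon=2000)
```

In a run like this, most pulls should end up on the two best arms, and the cumulative regret should grow much more slowly than the number of rounds.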

Introduction to Machine Learning

Today I am publishing a course, written in French, that I prepared for the Big Data specialized master's program at Télécom ParisTech.


The course opens with a brief introduction defining the main notions used in statistical learning, then develops basic algorithms for supervised and unsupervised learning.

Course given on January 8th, 2015.