Buy 4 REINFORCE Samples, Get a Baseline for Free!

Abstract

We construct REINFORCE estimators based on multiple samples with and without replacement and obtain a baseline for free!
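
As a rough illustration of the with-replacement case, the sketch below estimates the gradient of an expected reward under a softmax distribution, using the mean reward of the other samples as each sample's baseline (a leave-one-out baseline). The toy setup and the names `softmax`, `reinforce_loo_gradient`, and `exact_gradient` are assumptions made for this example, not code from the talk, and the without-replacement estimator is not covered.

```python
import numpy as np


def softmax(theta):
    """Numerically stable softmax over a 1-D array of logits."""
    e = np.exp(theta - theta.max())
    return e / e.sum()


def reinforce_loo_gradient(theta, f, k, rng):
    """Estimate d/dtheta E_{x ~ softmax(theta)}[f[x]] from k >= 2 samples.

    Samples are drawn with replacement. Each sample's baseline is the
    mean reward of the other k - 1 samples, so no separate baseline
    has to be learned or evaluated.
    """
    p = softmax(theta)
    n = len(theta)
    xs = rng.choice(n, size=k, p=p)          # k samples, with replacement
    fs = f[xs].astype(float)                 # rewards of the samples
    baselines = (fs.sum() - fs) / (k - 1)    # leave-one-out means
    scores = np.eye(n)[xs] - p               # grad of log softmax, shape (k, n)
    return ((fs - baselines)[:, None] * scores).mean(axis=0)


def exact_gradient(theta, f):
    """Closed-form gradient of sum_i softmax(theta)_i * f_i, for comparison."""
    p = softmax(theta)
    return p * (f - p @ f)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    theta = np.array([0.1, -0.3, 0.5])
    f = np.array([1.0, 2.0, 4.0])
    estimate = np.mean(
        [reinforce_loo_gradient(theta, f, 4, rng) for _ in range(20000)], axis=0
    )
    print("averaged estimate:", estimate)
    print("exact gradient:   ", exact_gradient(theta, f))
```

Because each baseline is computed only from the other samples, it is independent of the sample it is applied to, so the estimator stays unbiased while the baseline costs nothing beyond the samples already drawn.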

Date
Location
New Orleans, USA
Wouter Kool
Machine Learning & Optimization

Scientist & engineer with a PhD in machine learning and a passion for (combinatorial) optimization.