We construct REINFORCE estimators from multiple samples, drawn either with or without replacement, and obtain a baseline for free!
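To make the "baseline for free" idea concrete, here is a minimal sketch of the with-replacement case only. It assumes a categorical distribution parameterized by logits and a leave-one-out baseline, where each sample is baselined by the mean objective value of the other samples. The function names (`softmax`, `exact_gradient`, `reinforce_loo`) and the toy objective `f` are illustrative assumptions, not taken from the original post.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    z = np.exp(logits - np.max(logits))
    return z / z.sum()


def exact_gradient(logits, f):
    """Exact gradient of E_{x ~ p}[f(x)] with respect to the logits."""
    p = softmax(logits)
    return p * (f - p @ f)


def reinforce_loo(logits, f, k, rng):
    """REINFORCE estimate from k >= 2 samples drawn with replacement.

    Each sample uses the mean of the other k - 1 objective values as its
    baseline. The baseline is independent of the sample it is applied to,
    so the estimator stays unbiased.
    """
    p = softmax(logits)
    x = rng.choice(len(p), size=k, replace=True, p=p)
    fx = f[x]
    baseline = (fx.sum() - fx) / (k - 1)   # leave-one-out mean per sample
    score = np.eye(len(p))[x] - p          # grad of log p(x_i) w.r.t. logits
    return ((fx - baseline)[:, None] * score).mean(axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    logits = np.array([0.5, -1.0, 0.0, 2.0, 1.0])  # hypothetical parameters
    f = np.array([1.0, 3.0, -2.0, 0.5, 4.0])       # hypothetical objective
    estimates = [reinforce_loo(logits, f, 4, rng) for _ in range(5000)]
    print("exact:   ", exact_gradient(logits, f))
    print("averaged:", np.mean(estimates, axis=0))
```

Averaging many such estimates should approach the exact gradient, which is one way to check unbiasedness numerically. The without-replacement case needs additional correction terms and is not covered by this sketch.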
Scientist & engineer with a PhD in machine learning and a passion for (combinatorial) optimization.