Self-Tuning Networks: Bilevel Optimization of Hyperparameters using Structured Best-Response Functions (part I, part II)
Matthew MacKay, Paul Vicol, Jon Lorraine, David Duvenaud, Roger Grosse
https://arxiv.org/abs/1903.03088

One approach to choosing hyper-parameters is to apply gradient descent in the hyper-parameter space. For each setting of the hyper-parameters, you run the inner optimization to convergence, compute the resulting loss, and then backprop through these steps to…
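
As a rough illustration of gradient descent in hyper-parameter space, here is a minimal sketch on a toy problem. Everything in it is an assumption made for illustration and none of it comes from the paper: a scalar weight `w`, a quadratic training loss with an L2 penalty `lam` as the single hyper-parameter, a quadratic validation loss, and the constants `a`, `b`, `lr`, `steps`, and `hyper_lr`. It shows the unrolled approach described above, not the Self-Tuning Networks method. To stay dependency-free it tracks the derivative of `w` with respect to `lam` by hand through every inner update; an autodiff library would instead backprop through the same steps and, for a scalar hyper-parameter, arrive at the same derivative.

```python
def inner_train(lam, a=2.0, lr=0.1, steps=200):
    """Gradient descent on L_train(w) = 0.5*(w - a)**2 + 0.5*lam*w**2.

    Returns the final weight w and dw/dlam, the sensitivity of that
    weight to the hyper-parameter, accumulated through every update.
    """
    w, dw_dlam = 0.0, 0.0
    for _ in range(steps):
        grad = (w - a) + lam * w                  # dL_train/dw
        dgrad_dlam = (1.0 + lam) * dw_dlam + w    # chain rule on grad
        w = w - lr * grad
        dw_dlam = dw_dlam - lr * dgrad_dlam
    return w, dw_dlam


def val_loss_and_hypergrad(lam, b=1.0):
    """Validation loss 0.5*(w - b)**2 and its derivative w.r.t. lam."""
    w, dw_dlam = inner_train(lam)
    loss = 0.5 * (w - b) ** 2
    return loss, (w - b) * dw_dlam


def tune(lam=0.0, hyper_lr=0.5, outer_steps=100):
    """Outer loop: plain gradient descent on the hyper-parameter."""
    for _ in range(outer_steps):
        _, hypergrad = val_loss_and_hypergrad(lam)
        lam -= hyper_lr * hypergrad
    return lam


if __name__ == "__main__":
    print(tune())
```

In this toy setting the converged weight is `a / (1 + lam)`, so the validation loss is minimized where that equals `b`, i.e. at `lam = a / b - 1`. The sketch also makes the cost of the approach visible: every outer step reruns the whole inner optimization.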