Yaroslav BulatovIntroduction to residual correctionBetter formated version (LaTex) of this post on Ghost7 min read·Oct 14, 2023----
Yaroslav BulatovGradient descent under harmonic eigenvalue decayBetter formatted version on ghost (due to Latex support)4 min read·Feb 16, 2023----
Yaroslav BulatovCritical batch-size and effective dimension in Ordinary Least SquaresNote: a better formatted version (due to lack of LaTeX support on medium) is here5 min read·Jan 30, 2023----
Yaroslav Bulatovoptimal learning rate for Gradient Descent on a high-dimensional quadraticA better formatted version of this article is on Ghost, which has proper Latex support…2 min read·Dec 27, 2021----
Yaroslav BulatovHow many matmuls are needed to compute Hessian-vector products?Suppose you have a simple composition of d dense functions. Computing Jacobian needs d matrix multiplications. What about computing Hessian…2 min read·Dec 15, 2021----
Yaroslav BulatovHow to do matrix derivativesSuppose you have the following scalar function of matrix variable W.3 min read·Jul 9, 2021--1--1
Yaroslav BulatovUsing “Evolved Notation” to derive the Hessian of cross-entropy lossI was recently reminded of a lesson learned at Stephen Boyd’s Convex Optimization class at Stanford a few years ago, back when Google was…3 min read·Aug 30, 2019----
Yaroslav BulatovICLR Optimization papers IIISelf-Tuning Networks: Bilevel Optimization of Hyperparameters using Structured Best-Response Functions4 min read·Jun 25, 2019--1--1