I've proven the strangest result of my career.
The classic idea that gradient descent's rate is best with a constant stepsize of 1/L is wrong. The idea that we need stepsizes in (0, 2/L) for convergence is wrong.
Periodic long steps are better, provably.
The new strangest results of my career (with Kevin Shu and Alex Wang).
Gradient descent can accelerate (in big-O!) by just periodically taking longer steps.
No momentum needed to beat O(1/T) in smooth convex opt!
Paper: [1/3]
Perfect shelf for my collection of induced matrix norm unit balls. The individual stand colors denote whether the norm can be computed in polynomial time, is NP-hard, or is an open question.
Details:
We can do better than this state-of-the-art constant stepsize guarantee with a tiny adjustment: just alternate between a long step (longer than we can guarantee descent for) and a short step.
Just alternating 2.9, 1.5, 2.9, 1.5, ... is provably 10% faster than the prior best known rate!!
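A minimal sketch of the schedule on a toy quadratic of my own choosing (the paper's certified rates come from SDP proofs, not this demo). It also shows why the short steps matter: the long step 2.9/L alone would diverge.

```python
import numpy as np

def gradient_descent(grad, x0, L, pattern, iters):
    """Plain gradient descent x_{k+1} = x_k - (h_k/L) grad(x_k),
    cycling through the given stepsize pattern h_k."""
    x = np.array(x0, dtype=float)
    for k in range(iters):
        x = x - (pattern[k % len(pattern)] / L) * grad(x)
    return x

# Toy L-smooth convex objective f(x) = 0.5 x^T A x with L = lambda_max(A) = 1.
A = np.diag([1.0, 0.1, 0.01])
f = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x
x0 = np.ones(3)

x_cycle = gradient_descent(grad, x0, 1.0, [2.9, 1.5], 200)  # alternating pattern
x_long = gradient_descent(grad, x0, 1.0, [2.9], 200)        # long steps only

assert f(x_cycle) < 1e-3   # the 2.9, 1.5, ... cycle still converges
assert f(x_long) > 1e6     # a constant 2.9/L (> 2/L) blows up
```

The short 1.5 steps periodically cancel the overshoot that a 2.9 step causes along the large-curvature directions, which is exactly what makes the pattern stable.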
A nice mathematical puzzle🧐
If you take a 4-norm ball and cut it carefully, you will find a two-norm ball. 3D printed visual evidence below.
The puzzle: Why does this happen and how much more generally does it happen?
(This question was first posed to me by Pablo Parrilo)
I believe every optimization method should be understood both in primal and dual terms.
I was annoyed when teaching basic subgradient methods that I did not have a clear dual story to tell my students.
Cue a paper giving this missing equivalent dual story
This effect even manifests in actual numerics. In some sampled least squares problems, we see these faster rates from our theory occur perfectly
Thanks for reading! I'm still trying to get my head fully around this long-step acceleration, so I hope I've written clearly enough here
My 3D printer has been working overtime this summer printing every norm ball I can think of. At the moment I have 70 interesting norm balls.
Below are perhaps the most common norms: p-norm balls with dual pairs (1,inf), (4/3,4), (3/2,3), (2,2)
Details:
Cycling through a pattern longer than length two gives increasingly big payoffs!!
A cycle of length 127, periodically taking a long step of length 370, gets a rate almost 3x faster than prior best!!
The proofs use neat computer-assisted semidefinite programming; see the paper :)
In minimax optimization, first-order methods can locally converge to limit points, cycle, or (if very lucky) globally converge (plots below show these behaviors for the proximal point method).
Some of my work on these behaviors just appeared in Math Programming ().
Since tweeting about faster gradient descent via long steps, I learned of an exceptional prior work needing credit:
Jason Altschuler's Master's thesis (advised by Pablo Parrilo) broke ground here 5 years ago!!
For smooth strongly convex optimization, he gave optimal faster
How to (optimally) design a fair/robust syllabus for optimization classes:
Below is the polyhedron of all rubrics I find reasonable, which I then maximize for each student. As a result, evaluation focuses on their individual areas of excellence.
Details:
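A tiny sketch of the idea with a hypothetical three-weight rubric polytope (my numbers, not the actual syllabus): since a grade is linear in the rubric weights, maximizing it over the polytope just means checking the polytope's vertices.

```python
from itertools import permutations

# Hypothetical polytope of reasonable rubrics: weights over (homework,
# midterm, final) summing to 1, each between 0.2 and 0.5.  Its vertices
# are exactly the permutations of (0.5, 0.3, 0.2).
vertices = set(permutations((0.5, 0.3, 0.2)))

def grade(scores):
    """Maximize the student's weighted score over all reasonable rubrics."""
    return max(sum(w * s for w, s in zip(v, scores)) for v in vertices)

# Students with complementary strengths each get graded on their strength:
homework_star = grade((95, 70, 70))   # best rubric puts weight 0.5 on homework
final_star = grade((70, 70, 95))      # best rubric puts weight 0.5 on the final
print(homework_star, final_star)      # ~82.5 each, vs ~78.3 under equal weights
```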
In preparation for teaching the simplex method, I needed a Klee-Minty Cube showing exponentially many pivots all improving the LP objective value.
The path shown below is entirely increasing in the direction c=(4,10,25) and visits all 2^3 corners.
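For a sense of the construction, here is a standard epsilon-perturbed Klee-Minty cube (my guess at a representative variant; the printed model's c=(4,10,25) may come from a different scaling). Sorting the 2^3 corners by objective value gives a strictly increasing path through all of them, with each pivot flipping exactly one bound:

```python
from itertools import product

# Epsilon-perturbed Klee-Minty cube, maximizing x3:
#   0 <= x1 <= 1,  eps*x1 <= x2 <= 1 - eps*x1,  eps*x2 <= x3 <= 1 - eps*x2.
# Each vertex picks the lower (0) or upper (1) bound at every level.
eps = 0.1

def vertex(bits):
    x1 = float(bits[0])
    x2 = eps * x1 if bits[1] == 0 else 1 - eps * x1
    x3 = eps * x2 if bits[2] == 0 else 1 - eps * x2
    return (x1, x2, x3)

corners = {bits: vertex(bits) for bits in product((0, 1), repeat=3)}
path = sorted(corners, key=lambda b: corners[b][2])  # order by objective x3
values = [corners[b][2] for b in path]

# Strictly increasing, and every consecutive pair differs in one bound,
# so the simplex method can pivot through all 2^3 corners.
assert all(v1 < v2 for v1, v2 in zip(values, values[1:]))
assert all(sum(a != b for a, b in zip(u, v)) == 1 for u, v in zip(path, path[1:]))
print(path)
```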
I had the privilege to talk about my recent work improving gradient descent's theory at MIT last week.
If you'd like to see my first go at presenting this work,
Thank you
@shuvo_das_gupta
for organizing (and for your numerical work that motivated mine)!
Prof. Ben Grimmer (
@prof_grimmer
) gave a virtual talk at MIT Operations Research Center last Friday about his recent work on faster gradient descent via long steps, which was featured in
@QuantaMagazine
. The YouTube link to his talk is here: !
By far my biggest 3D printing project: 49 operator norm balls!
Below are induced l_p -> l_q matrix norm balls with 7 values of p, q. The base color indicates whether computing the norm is polynomial (green), NP-Hard (red/orange) or open (white).
Details:
My adviser Jim Renegar showed me the amazing unifying power of viewing optimization through convex cones: LPs via the nonnegative orthant, second-order & semidefinite programming via their namesake cones, and NP-hardness via copositive cones
Below are the models I use to teach this
Details
Doing something new today: an art piece (my first) is on exhibition at the Bridges conference
It's a shelf with 49 induced matrix norm balls, showing their varied computational complexity
Details:
Full gallery:
Recent methods for nonconvex, nonsmooth Lipschitz minimization attain "Goldstein" stationarity: an average of nearby gradients is small
In a short(!) paper
@jiazhichao4
and I extend this to Lipschitz constraints: averaging nearby gradients satisfies KKT!
@octonion
I had my computer brute force search through a lot of possible patterns h. Most failed. May just be my proof technique being brittle but the patterns I give in the paper would fail to have a descent lemma if you permute the order at all. They seem very special.
The journal version of one of my PhD results just appeared! If you prove a guarantee for your minimization algorithm under strong convexity, a guarantee under just convexity (often) follows right away (no changing or perturbing the method needed).
🧵1/2
@loiseau_jc
I'm no expert there, but I think Yes.
The main previous result about long step gradient descent was from Young in 1953, who showed taking stepsizes based on the Chebyshev polynomial roots optimally accelerated convergence on quadratics. His argument was exactly about this
Just finished printing my students' final project in my "Freshman Experience in Applied Math" course! They wrapped a hexagonal grid's shadow perfectly onto a box
@JohnsHopkinsAMS
@HopkinsEngineer
@JohnsHopkins
I'll be talking this afternoon about Provably Faster Gradient Descent via Long Steps
@MOPTA_ORMS
.
(If you aren't here or can't make it, a previous recording is online at )
How hard is optimizing over a smooth and/or strongly convex set? If you only know how to check set feasibility and compute normal vectors, you get the same optimal rates as minimizing strongly convex/smooth functions
My work on this just appeared on arXiv
Often seeking sparse solutions to optimization problems leads to adding nonconvex regularization. The family photo below gives some intuition for this effect from Smoothly Clipped Absolute Deviation (SCAD) regularization constraint sets {x | s(x)<p} for varied p
A nontrivial way to build norms in d dimensions: given any vector, compute a k<d dimension vector out of its top k magnitude entries and then use any k dimensional norm.
Below are the 3d norms given by all the 2d p-norms this way. (As p->infty, we just get the infty norm ball)
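A quick numerical sanity check of the construction, spot-checking the norm axioms at random points (my own choice of k=2 and p=1.5; evidence, not a proof):

```python
import numpy as np

def topk_pnorm(x, k=2, p=1.5):
    """'Top-k' norm from the construction above: apply the k-dimensional
    p-norm to the k largest-magnitude entries of x."""
    top = np.sort(np.abs(np.asarray(x, dtype=float)))[-k:]
    return float(np.sum(top ** p) ** (1 / p))

rng = np.random.default_rng(0)
for _ in range(1000):
    x, y = rng.normal(size=3), rng.normal(size=3)
    # Triangle inequality and positive homogeneity at random samples:
    assert topk_pnorm(x + y) <= topk_pnorm(x) + topk_pnorm(y) + 1e-12
    assert abs(topk_pnorm(2.5 * x) - 2.5 * topk_pnorm(x)) < 1e-9
print("norm axioms hold on all samples")
```

One way to see why this works: the top-k p-norm is the max of ||x_S||_p over all k-element index sets S, and a max of seminorms is again a (semi)norm.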
A while back, I 3D printed some convex cones as teaching props
( )
I'm looking for more fun ones to add to my collection, seeking recommendations :)
My current "todo" list:
completely positive, exponential,
and sum-of-squares cones (and their duals)
Just posted my newest work: How hard is it to minimize a sum of varied smooth and nonsmooth terms?
Surprisingly, the optimal rate is the sum of each term's individual-setting rates!! Moreover, it is attained universally by an algorithm of Nesterov.
This area is rapidly evolving!
Jason Altschuler and Pablo Parrilo posted long-step acceleration results for strongly convex problems just last Friday!
We are cracking this problem one small (or long) step at a time :)
[3/3]
If you thought p-norm balls were cool, wait til you see their matrix version...
Below are unit Schatten p-norm balls over 2x2 symmetric matrices. Common norms: p=2 is Frobenius, p=inf is Operator Norm, p=1 is Nuclear Norm capturing low-rankness
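These special cases are easy to check numerically (a small sketch comparing against NumPy's built-in matrix norms):

```python
import numpy as np

def schatten(A, p):
    """Schatten p-norm: the vector p-norm of the singular values of A."""
    s = np.linalg.svd(A, compute_uv=False)
    return float(np.linalg.norm(s, ord=p))

A = np.array([[2.0, 1.0], [1.0, -1.0]])  # a sample 2x2 symmetric matrix

# p=2 is Frobenius, p=inf is the operator (spectral) norm, p=1 is nuclear:
assert np.isclose(schatten(A, 2), np.linalg.norm(A, 'fro'))
assert np.isclose(schatten(A, np.inf), np.linalg.norm(A, 2))
assert np.isclose(schatten(A, 1), np.linalg.norm(A, 'nuc'))
print(schatten(A, 1), schatten(A, 2), schatten(A, np.inf))
```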
Details:
Just finished printing & distributing several Klee-Minty Cubes throughout my department for optimization classes (and fun) :)
If you'd like one, most university libraries can print them with my files below:
In preparation for teaching the simplex method, I needed a Klee-Minty Cube showing exponentially many pivots all improving the LP objective value.
The path shown below is entirely increasing in the direction c=(4,10,25) and visits all 2^3 corners.
On my way to give a seminar tomorrow at
@TippieAnalytics
. Will be talking about new projection-free "Radial" methods.
Example constraint set props are packed and ready to go :)
(Plenty of polyhedrons, spectrahedrons, norm balls, and sparsity inducing regularizers)
The key is stepsizes MUCH larger than 2/L, the classic limit!
Ours are built up from exponentially long patterns with exponentially large average stepsizes and a neat fractal shape.
(See the paper for their proper construction.) [2/3]
A two part series of papers I wrote towards the end of my PhD just appeared in Math Programming :)
My first ideas towards these papers were in Spring 2017, so they've been >6 years in the making.
Part I:
Part II:
Math Programming just announced Meritorious Service Awards for "timely, judicious, and considerate reviews". I wanted to thank the many judicious and considerate referees I have had. Some day I hope to have a timely one :)
(disclosure I am on their list)
Have you ever wondered what the Max-Cut Polytope, its semidefinite outer approx, and its trigonometric inner approx look like? For 3x3 matrices, it lives in 3D and is shown below. For 4x4 matrices, it lives in 6D and some slices are shown below.
Details:
Musing on Dual Norms: the most classic pair (IMHO) is L_1 and L_inf.
2D does not do duality justice, as they look like rotations of each other. Taking a dual interchanges the roles of points and faces. Below L_1 has 8 faces and 6 points aligned with L_inf's 8 points and 6 faces.
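The vertex/face swap also shows up computationally: the dual (inf) norm is the support function of the L_1 ball, and the max is always attained at one of the L_1 ball's 6 vertices. A quick check:

```python
import numpy as np

# The 6 vertices of the 3D L_1 ball are +/- the standard basis vectors.
vertices_l1 = [s * e for s in (1.0, -1.0) for e in np.eye(3)]

rng = np.random.default_rng(1)
for _ in range(100):
    y = rng.normal(size=3)
    # ||y||_inf = max { <x, y> : ||x||_1 <= 1 }, attained at a vertex:
    support = max(v @ y for v in vertices_l1)
    assert np.isclose(support, np.linalg.norm(y, np.inf))
print("support function over the L_1 ball matches the L_inf norm")
```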
As further reference objects littering my office...
I've 3D printed several common nonconvex regularizers:
q-norm (q<1) and SCAD regularization induce sparsity,
Schatten q-norm regularization inducing low-rankness.
Documented at:
Are you tired of your board games being stored unevenly due to their varied shapes?
Just build yourself an oppositely uneven shelf to get nice level tops when stored in exactly the right order...
@WhoAreYouMate
I have mixed intuitions (but that means it's a really good research question).
Once in the neighborhood of a local min, I'd expect this speeds up convergence. Global bounds less clear.
But... accelerated momentum methods don't generalize well, so there is also precedent to fail.
Deepmath2023 will be at JHU! Please consider submitting an abstract (deadline extended to 6/15)! . We encourage submissions from diverse disciplines including Statistics, Physics, Computer science, Neuroscience, Mathematics, Psychology, and Engineering.
Printing a new family of norm balls: Function Norms!
These ones are strange; the first one just finished!
Below is the p=4 function-norm unit ball: ||f||_4^4 = \int_{-1}^1 |f(x)|^4 dx <= 1
(Okay, functions give too big of a space so I am just looking at quadratics in the Legendre polynomial basis)
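A numerical sketch of evaluating such a function norm (my own trapezoid-rule discretization, checked against a closed form):

```python
import numpy as np
from numpy.polynomial import legendre

def function_p_norm(coefs, p=4, grid=100_001):
    """||f||_p = (integral_{-1}^{1} |f(x)|^p dx)^(1/p), with f given by its
    Legendre coefficients, via a fine trapezoid rule."""
    x = np.linspace(-1.0, 1.0, grid)
    fx = np.abs(legendre.legval(x, coefs)) ** p
    integral = np.sum((fx[1:] + fx[:-1]) / 2 * np.diff(x))
    return float(integral ** (1 / p))

# Sanity check against a closed form: f(x) = x has integral of x^4 = 2/5.
assert np.isclose(function_p_norm([0.0, 1.0]), (2 / 5) ** 0.25)
# A quadratic in the Legendre basis, like the printed ball's coordinates:
print(function_p_norm([0.3, -0.5, 0.8]))
```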
@chanwoopark20
I don't have a good intuition, my main source of inspiration was the BnB-PEP paper which also had these outlier long steps.
@ErnestRyu
,
@shuvo_das_gupta
's paper was a huge inspiration for my work. I love it.
My theory results couldn't match their numerics, though. The rates I
@roydanroy
@bremen79
Just doing deterministic schedules repeating the same pattern. I think the proof would naturally extend to backtracking ideas of adaptivity (check if sufficient descent came from doing the pattern, else cut the estimated smoothness constant in half).
Other adaptive ideas would
Bumping this old tweet since it's timely...
Consider grading your next optimization class via linear programming :)
It can be more flexible and forgiving than a fixed rubric. Over two years, I've yet to have a student complain (they like having their score maximized)
How to (optimally) design a fair/robust syllabus for optimization classes:
Below is the polyhedron of all rubrics I find reasonable, which I then maximize for each student. As a result, evaluation focuses on their individual areas of excellence.
Details:
If you are in Baltimore or Vancouver, this week I am giving seminars at Johns Hopkins (Thursday) and UBC (Friday) on my recent work developing new scalable projection-free optimization methods.
(If you aren't and still want to listen in, DM me and I can share zoom links.)
To give a good primal-dual story, we didn't just want primal and dual algorithm descriptions, but also primal and dual convergence rates. We can measure those as the iterates converging down to optimal and the model converging up to optimal.
I'll be giving a talk on "Radial Duality: Scalable Projection-Free Optimization" at
@iccopt2022
this afternoon. Rauch 201 at 3:50pm
#iccopt2022
#optimization
PS: Some 3D printed references will be used as figures.
@j_bertolotti
@adad8m
That's delightful!!
You might want to select p values nonuniformly to capture the duality equally. For each p>2, you could include its dual 1/(1-1/p) among the p<2 values.
The one line description:
Constrained optimization problems have another dual, given by an unconstrained gauge optimization problem.
Based on this "radial duality", we get radial optimization algorithms that avoid expensive projections and may outscale IPM and ADMM-style methods
Just wrapped up a great semester's "Freshman Experience in Applied Math" course with some fantastic Hopkins students, below showing off their creation.
Their writeup:
A 3D model viewer and their design files:
And the grand finale: the convergence theory is (mostly) symmetric, as it should be! Both primal and dual convergence for the subgradient method happen at the same rate!!!!
(See the paper for the more general non-Lipschitz, stochastic, proximal, functional constrained results).
Dyck paths are non-decreasing lattice walks from (1,1) to (k,k), a combinatorial classic.
Andrew Daw
@USCMarshall
recently told me about a nice polytope indexed in one of his papers ()
He'll get the 3D print below at
@INFORMS2023
:)
Just printed another norm ball for my collection. Guess a formula for what norm generates this shape
Hint(?): this is called a cuboctahedron, whose dual/polar is a rhombic dodecahedron.
Latest Hopkins Engineering Magazine has an article on visualizations of difficult scientific concepts. It's a nice read; a few of my 3D printed operator norm balls appear at the end
When managing risk one often quantifies how bad the worst events are. Conditional Value at Risk (CVaR) measures the expected loss in the worst q% of events. This can be viewed as the 1-norm of the q% largest-magnitude entries.
This is a norm!! CVaR Unit Ball is below in 3D with q=66%
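A minimal sketch of this norm (with k counting the worst events directly rather than as a percentage):

```python
import numpy as np

def cvar_norm(x, k):
    """CVaR-style norm: the 1-norm of the k largest-magnitude entries of x
    (k plays the role of the worst q% of events)."""
    return float(np.sort(np.abs(x))[-k:].sum())

# k=2 of 3 entries matches the q=66% ball described above.
x = np.array([3.0, -1.0, 2.0])
assert cvar_norm(x, 2) == 5.0               # |3| + |2|
assert cvar_norm(x, 3) == np.abs(x).sum()   # k=n recovers the 1-norm
assert cvar_norm(x, 1) == np.abs(x).max()   # k=1 recovers the inf-norm
print(cvar_norm(x, 2))
```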
@amitra_90
@konstmish
Momentum methods still give better rates given perfect first-order oracles (in the regular convex optimization world at least).
Noisy/inexact/stochastic settings are exactly where gradient descent can pull ahead of Nesterov's accelerated method, so you are exactly pointing to a
@SAnsumali
I don't know what the best pattern is, but I was able to prove patterns of increasing length (up to t=127) get increasing gains. The alternating example is just showing gains in the simplest case of cycling through t=2 sizes.
The best one I found (surely not universally best) is
@GuilleAngeris
I would love one! I conjecture in the paper that a cyclic pattern of length t should be able to speed things up by at least log(t), based on my computer certificates up to solving 129x129 SDPs
Any path I see to prove that would likely need to do away with the computer assistance
@McFunkypants
@QuantaMagazine
@parshallison
It is fantastic! Much, much prettier than the plot from my paper (but still evoking a connection, for those off the deep end reading the study, with the same smooth curve over stairstep curve)
@XiongZeng111
Absolutely! Cutting perpendicular to any of the standard basis directions (including horizontally) gives a 4-norm one dimension lower. So the cut of a 4-norm ball that generates 2-norms must be different...
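One candidate for such a special cut (my guess, possibly not the printed one) is the plane x1+x2+x3=0, where the identity x1^4+x2^4+x3^4 = (x1^2+x2^2+x3^2)^2/2 holds, making the 4-norm slice perfectly round. A numerical check:

```python
import numpy as np

rng = np.random.default_rng(2)
for _ in range(1000):
    a, b = rng.normal(size=2)
    x = np.array([a, b, -a - b])      # a point on the plane x1+x2+x3 = 0
    quartic = np.sum(x ** 4)
    quadratic = np.sum(x ** 2)
    # On this plane, the quartic is exactly half the squared quadratic:
    assert np.isclose(quartic, quadratic ** 2 / 2)
print("on the plane, the 4-norm level sets are round (2-norm) level sets")
```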
@thserra
It makes for a great lecture one, combining syllabus review with a natural example of optimization (that the students deeply care about).
It's also small enough to explicitly do the dual. My example grading in that writeup sows the seeds for attacking the dual grading program
How much longer does it take to 3D print a model after doubling its size in each dimension?
(hint: printers infill interiors, as shown in the cube print below, so the whole volume need not be printed)
@colin_fraser
An intuition is hard; I needed a computer-aided proof technique, so some mystery remains.
If pressed to give a gut feeling, one is that the "bad case" for a long step is if the objective looked like a "U" or check-mark shape where we overshoot. If that was the case, we
Designing new algorithms minimizing problems reformulated in terms of gauges squared, we get some promising numerical results compared to standard solvers!
(Admittedly, these preliminary plots are for somewhat simple synthetic problems, realistic validation is still needed.)
The subgradient method is a classic; here we are thinking of strongly convex minimization.
Repeatedly linearize (below) your objective function and then move downhill.
(The paper above also allows for proximal terms and functional constraints via switching, but let's keep it simple)
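A minimal sketch of the primal method on a toy strongly convex problem (my example, with the classic 2/(mu(k+1)) stepsizes):

```python
import numpy as np

# Toy strongly convex nonsmooth objective f(x) = ||x||_1 + (mu/2)||x||^2,
# minimized at x = 0 with optimal value 0.
mu = 1.0
f = lambda x: np.abs(x).sum() + 0.5 * mu * x @ x
subgrad = lambda x: np.sign(x) + mu * x   # one valid subgradient of f

x = np.array([3.0, -2.0])
best = f(x)
for k in range(1, 2001):
    x = x - (2 / (mu * (k + 1))) * subgrad(x)   # move downhill along -g_k
    best = min(best, f(x))
print(best)   # approaches the optimal value 0
```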
@ben_golub
Wlog x is the origin. Then the boundary point you are looking at is b(v) = v/MinkowskiGauge(v).
The Minkowski gauge is 1/R-Lipschitz continuous where R is the distance to the set's boundary. Your mapping b will inherit this except near the origin.
The dual perspective on this follows from Dual Averaging (Nesterov 2005), below specialized and extended for strongly convex settings.
At each iteration take a weighted average of our lower bounds to model the objective, minimize the model (plus regularization).
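A toy sketch of the dual view (my own simple unweighted instantiation with a sqrt(k) quadratic regularizer, so the model minimizer has a closed form):

```python
import numpy as np

# Dual-averaging sketch: keep a running sum of subgradient lower bounds,
# then minimize the linear model plus (beta_k/2)||x||^2, which gives the
# closed-form model minimizer x_{k+1} = -g_sum / beta_k.
f = lambda x: np.abs(x).sum()    # toy objective, minimized at 0
subgrad = lambda x: np.sign(x)

x = np.array([2.0, -1.5])
g_sum = np.zeros(2)
for k in range(1, 1001):
    g_sum += subgrad(x)          # accumulate the lower-bound slopes
    beta = np.sqrt(k)            # standard sqrt(k) regularization schedule
    x = -g_sum / beta            # minimize the regularized model
print(f(x))
```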
@ErnestRyu
@shuvo_das_gupta
If I may advertise your result :P ,
your solver is amazing and the software under the hood is fantastically useful and well documented! It's so good I put it in the acknowledgements at the end of my paper!!
Anyone here wanting to dive into PEP proofs should read yours!
@sjaak_blom
Just gradient descent, x_{k+1}=x_k - (h_k/L)\nabla f(x_k), as Cauchy did 150+ years ago.
No modern bells or whistles like stochastic variants; those would be interesting to see
@damekdavis
I'm thinking it is arbitrarily slow (under one interpretation).
Pick any sequence of pos integers {a_i}. Let x_0=1 with x_{k+1}=x_k/2 whenever k\in {a_i}, else keep it constant x_{k+1}=x_k.
Picking a_i growing arbitrarily fast makes |x_i-x_{i+1}|'s convergence arbitrarily slow.
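The construction in code (with a hypothetical choice of {a_i}):

```python
# Halve x only at a sparse set of indices {a_i}; the faster a_i grows,
# the slower the step sizes |x_k - x_{k+1}| decay along the sequence.
def sequence(halving_times, n):
    x, xs = 1.0, [1.0]
    for k in range(n):
        x = x / 2 if k in halving_times else x
        xs.append(x)
    return xs

xs = sequence({10, 100, 1000}, 2000)   # a_i growing geometrically
diffs = [abs(a - b) for a, b in zip(xs, xs[1:])]
# The differences are 0 except at the a_i, where x halves:
print([d for d in diffs if d > 0])     # [0.5, 0.25, 0.125]
```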
@netw0rkf10w
@konstmish
There are several papers going back to Young in 1953 that get long step speed ups for quadratic minimization. My writeup is missing this most recent one, so the next arxiv post will add it in :) . Thank you!!
The proof tools from that line don't seem to generalize to general
It's not eight times, scaling with volume, as most of the interior is not printed (that is, the n^3 coefficient is small)
It's primarily four times, scaling with surface area, as the thick boundary dominates the print time
If you are printing fractals, the answer is more fun...
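Back-of-the-envelope check with a hollow cube (made-up dimensions):

```python
# Print time roughly tracks deposited material: a thick shell plus sparse
# infill.  Doubling the size roughly quadruples the shell volume (surface
# area scaling), not 8x (solid volume scaling).
def shell_volume(side, wall):
    """Material in a hollow cube of the given side length and wall thickness."""
    return side ** 3 - (side - 2 * wall) ** 3

small = shell_volume(50.0, 1.2)    # 50mm cube, 1.2mm walls (sample numbers)
big = shell_volume(100.0, 1.2)     # the same model doubled in each dimension
print(big / small)                 # ~4.1, close to the surface-area factor of 4
```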
@sp_monte_carlo
@typedfemale
It's a lot like Chebyshev methods; I expect similar brittleness.
I really wanted to use the spectral perspective analysis from that line of work, but since smooth convex functions can have changing Hessians, I couldn't find a footing. Ended up having an SDP do the hard spectral work.
The results work for nonsmooth, smooth, or Hölder smooth problems and under any Hölder growth (generalizing strong convexity);
Using them on bundle methods and dual averaging, where generalizing existing proofs gets tricky, gave new guarantees without substantial pain :)
🧵2/2
@realShengchao
@gabrielpeyre
The best theorem to my knowledge is the LD^2/2T rate bound for constant stepsize one. Of course that theorem doesn't prove it's actually the best. Taylor et al conjectured in 2017 what the optimal choice between one and two would be if T is given ahead of time. I don't know of a
@timoberthold
Please steal freely (I'll also make/post a typo free copy this afternoon)!
Eventually I'll have some 3D printed and will share the source so any can get a physical copy (from their university library)
@bremen79
@damekdavis
The method takes as input the target accuracy eps, you may call that a parameter.
It also has a constant under the hood that is adapting from continual linesearching, you may call that a parameter.
I'm not inclined to call either of those parameters, so my mental lexicon says Yes
The gauge of a set is the function whose level sets are scaled copies of the set. Evaluating a set's gauge and its gradient is often cheap (comparable to checking membership and computing a normal vector).
We show the gauge squared is smooth/strongly convex when the set is!
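A small sketch with an ellipsoid {x : x^T A x <= 1} (my example of a smooth, strongly convex set), whose gauge is sqrt(x^T A x):

```python
import numpy as np

A = np.array([[2.0, 0.5], [0.5, 1.0]])   # positive definite
gauge = lambda x: np.sqrt(x @ A @ x)     # gauge of the ellipsoid x^T A x <= 1

x = np.array([1.0, 2.0])
# Positive homogeneity: level sets of the gauge are scaled copies of the set.
assert np.isclose(gauge(3.0 * x), 3.0 * gauge(x))

# gauge^2 = x^T A x has constant Hessian 2A, so it is L-smooth and
# mu-strongly convex with mu, L twice the extreme eigenvalues of A.
mu, L = 2 * np.linalg.eigvalsh(A)
print(mu, L)
```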