Ben Grimmer

@prof_grimmer

2,750
Followers
452
Following
87
Media
284
Statuses

Assistant Professor @JohnsHopkinsAMS , Optimization, PhD @Cornell_ORIE Mostly here to share pretty maths/3D prints, sometimes sharing my research

Baltimore, MD
Joined January 2015
@prof_grimmer
Ben Grimmer
1 year
I've proven the strangest result of my career... The classic idea that gradient descent's rate is best with constant stepsizes 1/L is wrong. The idea that we need stepsizes in (0,2/L) for convergence is wrong. Periodic long steps are better, provably.
Tweet media one
71
604
4K
@prof_grimmer
Ben Grimmer
1 year
The new strangest results of my career (with Kevin Shu and Alex Wang). Gradient descent can accelerate (in big-O!) by just periodically taking longer steps. No momentum needed to beat O(1/T) in smooth convex opt! Paper: [1/3]
Tweet media one
11
93
708
@prof_grimmer
Ben Grimmer
2 years
Perfect shelf for my collection of induced matrix norm unit balls. The individual stand colors denote whether the norm can be computed in polynomial time, is NP-hard, or is an open question. Details:
Tweet media one
4
38
322
@prof_grimmer
Ben Grimmer
1 year
We can do better than this state-of-the-art constant stepsize guarantee with a tiny adjustment: just alternate between a long step (longer than we can guarantee descent for) and a short step. Just alternating 2.9, 1.5, 2.9, 1.5, ... is provably 10% faster than the prior best known!! (A sketch is below.)
Tweet media one
3
13
255
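A minimal sketch of that alternation (my own toy quadratic and plain Python loop, not the paper's certified analysis; only the [2.9, 1.5] pattern comes from the tweet, everything else is illustrative):

import numpy as np

# Gradient descent with a periodic stepsize pattern on f(x) = 0.5 x^T A x,
# which is smooth convex with L = lambda_max(A). Illustrative only.
A = np.diag([1.0, 0.1, 0.01])
L = np.max(np.linalg.eigvalsh(A))
grad = lambda x: A @ x

def gd(pattern, iters=300):
    x = np.ones(3)
    for k in range(iters):
        h = pattern[k % len(pattern)]      # cycle through the stepsize pattern
        x = x - (h / L) * grad(x)          # x_{k+1} = x_k - (h_k/L) grad f(x_k)
    return 0.5 * x @ A @ x                 # final objective value

print("constant 1/L :", gd([1.0]))
print("alternating  :", gd([2.9, 1.5]))    # long step / short step

On this small example the alternating schedule drives the slow eigenmode down noticeably faster than the constant 1/L schedule, matching the flavor of the claim above.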
@prof_grimmer
Ben Grimmer
1 year
A nice mathematical puzzle🧐 If you take a 4-norm ball and cut it carefully, you will find a two-norm ball. 3D printed visual evidence below. The puzzle: Why does this happen and how much more generally does it happen? (This question was first posed to me by Pablo Parrilo)
Tweet media one
9
23
204
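For anyone wanting to poke at the puzzle numerically before thinking about the "why": a minimal check (the plane x+y+z=0 is one cut that exhibits the effect; the random sampling is mine). Restricted to that plane, the 4-norm turns out to be a constant multiple of the 2-norm, so the slice of the 4-norm ball is a round disk.

import numpy as np

# Sample points in the plane x + y + z = 0 and compare their 4-norm and 2-norm.
rng = np.random.default_rng(0)
for _ in range(5):
    x, y = rng.normal(size=2)
    p = np.array([x, y, -(x + y)])          # a random point in the cutting plane
    print(np.linalg.norm(p, 4) / np.linalg.norm(p, 2))   # always 2**(-1/4) ~= 0.8409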
@prof_grimmer
Ben Grimmer
1 year
I believe every optimization method should be understood both in primal and dual terms. I was annoyed when teaching basic subgradient methods that I did not have a clear dual story to tell my students. Cue a paper giving this missing equivalent dual story
2
38
203
@prof_grimmer
Ben Grimmer
1 year
This effect even manifests in actual numerics. In some sampled least squares problems, we see these faster rates from our theory occur perfectly. Thanks for reading! I'm still trying to get my head fully around this long-step acceleration, so I hope I've written clearly enough here
Tweet media one
5
8
204
@prof_grimmer
Ben Grimmer
2 years
My 3D printer has been working overtime this summer printing every norm ball I can think of. At the moment I have 70 interesting norm balls. Below are perhaps the most common norms: p-norm balls with dual pairs (1,inf), (4/3,4), (3/2,3), (2,2) Details:
Tweet media one
2
29
186
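The listed pairs are Hölder conjugates (1/p + 1/q = 1). A small sanity check of the duality for one pair, using the standard dual-norm identity max over ||y||_p <= 1 of <x, y> equals ||x||_q (the random test vector is mine):

import numpy as np

# Check the (4/3, 4) dual pair: the best y on the 4/3-norm sphere recovers the 4-norm of x.
p, q = 4/3, 4
x = np.random.default_rng(1).normal(size=3)
y = np.sign(x) * np.abs(x) ** (q - 1)       # maximizing direction for <x, y>
y /= np.linalg.norm(y, p)                   # rescale onto the p-norm unit sphere
print(x @ y, np.linalg.norm(x, q))          # the two printed numbers agree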
@prof_grimmer
Ben Grimmer
1 year
Cycling through a pattern longer than length two gives increasingly big payoffs!! A cycle of length 127, periodically taking a long step of length 370, gets a rate almost 3x faster than the prior best!! The proofs use neat computer-assisted semidefinite programming; see the paper :)
Tweet media one
5
8
191
@prof_grimmer
Ben Grimmer
2 years
In minimax optimization, first-order methods can locally converge to limit points, cycle, or (if very lucky) globally converge (plots below show these for the proximal point method). Some of my work on these behaviors just appeared in Math Programming ().
Tweet media one
1
14
84
@prof_grimmer
Ben Grimmer
1 year
Since tweeting about faster gradient descent via long steps, I learned of an exceptional prior work needing credit: Jason Altschuler's Master's thesis (advised by Pablo Parrilo) broke ground here 5 years ago!! For smooth strongly convex optimization, he gave optimal faster
1
8
82
@prof_grimmer
Ben Grimmer
2 years
How to (optimally) design a fair/robust syllabus for optimization classes: Below is the polyhedron of all rubrics I find reasonable, which I then maximize for each student. As a result, evaluation focuses on their individual areas of excellence. Details:
Tweet media one
6
8
70
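A toy sketch of the scheme (the scores and weight ranges below are hypothetical, not the actual rubric polyhedron): each student's grade is the maximum of their weighted score over the set of admissible rubric weightings, computed with an LP solver.

import numpy as np
from scipy.optimize import linprog

# Toy grading LP (made-up numbers): choose rubric weights, within instructor-chosen
# bounds and summing to 1, that maximize this particular student's score.
scores = np.array([78.0, 92.0, 61.0])           # homework, exams, project
bounds = [(0.2, 0.6), (0.2, 0.6), (0.1, 0.4)]   # admissible weight ranges
res = linprog(c=-scores,                        # linprog minimizes, so negate the scores
              A_eq=np.ones((1, 3)), b_eq=[1.0],
              bounds=bounds)
print("best weights:", res.x, "grade:", -res.fun)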
@prof_grimmer
Ben Grimmer
2 years
In preparation for teaching the simplex method, I needed a Klee-Minty Cube showing exponentially many pivots all improving the LP objective value. The path shown below is entirely increasing in the direction c=(4,10,25) and visits all 2^3 corners.
Tweet media one
3
5
62
@prof_grimmer
Ben Grimmer
1 year
I had the privilege to talk about my recent work improving gradient descent's theory at MIT last week. If you'd like to watch my first go at presenting this work, the recording is linked in the quoted tweet below. Thank you @shuvo_das_gupta for organizing (and for your numerical work that motivated mine)!
@shuvo_das_gupta
Shuvomoy Das Gupta
1 year
Prof. Ben Grimmer ( @prof_grimmer ) gave a virtual talk at MIT Operations Research Center last Friday about his recent work on faster gradient descent via long steps, which was featured in @QuantaMagazine . The YouTube link to his talk is here: !
0
8
53
1
5
51
@prof_grimmer
Ben Grimmer
2 years
Recently discovered a record of my earliest endeavors in optimization, circa 2003
Tweet media one
4
0
48
@prof_grimmer
Ben Grimmer
2 years
By far my biggest 3D printing project: 49 operator norm balls! Below are induced l_p -> l_q matrix norm balls with 7 values of p, q. The base color indicates whether computing the norm is polynomial (green), NP-Hard (red/orange) or open (white). Details:
Tweet media one
1
5
45
@prof_grimmer
Ben Grimmer
2 years
My adviser Jim Renegar showed me the amazing unifying power of viewing optimization as convex cones: LPs as the Nonnegative Orthant, Second-Order & Semidefinite Prog namesakes, and NP-Hardness as Copositives. Below are the models I use to teach this. Details
Tweet media one
1
5
45
@prof_grimmer
Ben Grimmer
1 year
Doing something new today, an art piece (my first) is on exhibition at the Bridges conference. It's a shelf with 49 induced matrix norm balls, showing their difficult computational complexities. Details: Full gallery:
Tweet media one
0
3
45
@prof_grimmer
Ben Grimmer
1 year
Recent methods for nonconvex, nonsmooth Lipschitz minimization attain "Goldstein" stationarity: an average of nearby gradients is small In a short(!) paper @jiazhichao4 and I extend this to Lipschitz constraints: averaging nearby gradients satisfies KKT!
0
4
43
@prof_grimmer
Ben Grimmer
1 year
@octonion I had my computer brute force search through a lot of possible patterns h. Most failed. May just be my proof technique being brittle but the patterns I give in the paper would fail to have a descent lemma if you permute the order at all. They seem very special.
1
1
40
@prof_grimmer
Ben Grimmer
2 years
The journal version of one of my PhD results just appeared! If you prove a guarantee for your minimization algorithm under strong convexity, a guarantee under just convexity (often) follows right away (no changing or perturbing the method needed). 🧵1/2
2
2
36
@prof_grimmer
Ben Grimmer
1 year
@loiseau_jc I'm no expert there, but I think Yes. The main previous result about long step gradient descent was from Young in 1953, who showed taking stepsizes based on the Chebyshev polynomial roots optimally accelerated convergence on quadratics. His argument was exactly about this
1
0
35
@prof_grimmer
Ben Grimmer
2 years
Just finished printing my students' final project in my "Freshman Experience in Applied Math" course! They wrapped a hexagonal grid's shadow perfectly onto a box @JohnsHopkinsAMS @HopkinsEngineer @JohnsHopkins
Tweet media one
2
5
32
@prof_grimmer
Ben Grimmer
1 year
I'll be talking this afternoon about Provably Faster Gradient Descent via Long Steps @MOPTA_ORMS . (If you aren't here or can't make it, a previous recording is online at )
1
8
33
@prof_grimmer
Ben Grimmer
2 years
How hard is optimizing over a smooth and/or strongly convex set? If you only know how to check set feasibility and compute normal vectors, you get the same optimal rates as minimizing strongly convex/smooth functions. My work on this just appeared on arXiv
1
4
31
@prof_grimmer
Ben Grimmer
11 months
DeepMath2023 is going on right now at Hopkins! Live stream for the rest of the world below:
0
6
31
@prof_grimmer
Ben Grimmer
2 years
Often seeking sparse solutions to optimization problems leads to adding nonconvex regularization. The family photo below gives some intuition for this effect from Smoothly Clipped Absolute Deviation (SCAD) regularization constraint sets {x | s(x)<p} for varied p
Tweet media one
3
2
30
@prof_grimmer
Ben Grimmer
2 years
A nontrivial way to build norms in d dimensions: given any vector, compute a k<d dimensional vector out of its top k magnitude entries and then use any k-dimensional norm. Below are the 3d norms given by all the 2d p-norms this way. (As p->infty, we just get the infty norm ball)
Tweet media one
1
1
28
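A minimal sketch of the construction (the function name and test vector are mine): keep the k largest-magnitude entries and evaluate a k-dimensional p-norm on them.

import numpy as np

# Norm built from the top-k magnitude entries: equivalently, the max of ||x_S||_p
# over all index sets S of size k, which is why it is still a norm.
def top_k_p_norm(x, k, p):
    top = np.sort(np.abs(x))[-k:]            # k largest magnitudes
    return np.linalg.norm(top, p)

x = np.array([0.3, -2.0, 1.1])
print(top_k_p_norm(x, k=2, p=1.5))           # a 3d norm built from a 2d p-norm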
@prof_grimmer
Ben Grimmer
1 year
A while back, I 3D printed some convex cones as teaching props ( ) I'm looking for more fun ones to add to my collection, seeking recommendations :) My current "todo" list: completely positive, exponential, and sum-of-squares cones (and their duals)
Tweet media one
1
0
30
@prof_grimmer
Ben Grimmer
2 years
Just posted my newest work: How hard is it to minimize a sum of varied smooth and nonsmooth terms? Surprisingly, the optimal rate is the sum of the optimal rates for each term's individual setting!! Moreover, it is attained universally by an algorithm of Nesterov.
1
3
29
@prof_grimmer
Ben Grimmer
1 year
Previous works had it go brrrrrr but I proved it's better to go BrrRrrRRRrrrrRrrr
1
2
28
@prof_grimmer
Ben Grimmer
1 year
This area is rapidly evolving! Jason Altschuler and Pablo Parrilo posted long-step acceleration results for strongly convex problems just last Friday! We are cracking this problem one small (or long) step at a time :) [3/3]
Tweet media one
2
1
28
@prof_grimmer
Ben Grimmer
2 years
If you thought p-norm balls were cool, wait til you see their matrix version... Below are unit Schatten p-norm balls over 2x2 symmetric matrices. Common norms: p=2 is Frobenius, p=inf is Operator Norm, p=1 is Nuclear Norm capturing low-rankness Details:
Tweet media one
Tweet media two
Tweet media three
Tweet media four
2
2
26
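For reference, a Schatten p-norm is just the p-norm of the singular values; a minimal sketch (the test matrix is mine):

import numpy as np

# Schatten p-norm = p-norm of the singular values. For 2x2 symmetric matrices
# the singular values are simply the absolute eigenvalues.
def schatten(X, p):
    return np.linalg.norm(np.linalg.svd(X, compute_uv=False), p)

X = np.array([[1.0, 0.5], [0.5, -2.0]])
print(schatten(X, 2),                        # p=2: Frobenius norm
      schatten(X, np.inf),                   # p=inf: operator (spectral) norm
      schatten(X, 1))                        # p=1: nuclear norm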
@prof_grimmer
Ben Grimmer
2 years
Just finished printing & distributing several Klee-Minty Cubes throughout my department for optimization classes (and fun) :) If you'd like one, most university libraries can print them with my files below:
Tweet media one
@prof_grimmer
Ben Grimmer
2 years
In preparation for teaching the simplex method, I needed a Klee-Minty Cube showing exponentially many pivots all improving the LP objective value. The path shown below is entirely increasing in the direction c=(4,10,25) and visits all 2^3 corners.
Tweet media one
3
5
62
1
3
26
@prof_grimmer
Ben Grimmer
1 year
On my way to give a seminar tomorrow at @TippieAnalytics . Will be talking about new projection-free "Radial" methods. Example constraint set props are packed and ready to go :) (Plenty of polyhedrons, spectrahedrons, norm balls, and sparsity inducing regularizers)
Tweet media one
1
0
23
@prof_grimmer
Ben Grimmer
1 year
The key is stepsizes MUCH larger than 2/L, the classic limit! Ours are built up from exponentially long patterns with exponentially large average stepsizes and a neat fractal shape. (See the paper for their proper construction.) [2/3]
Tweet media one
3
1
24
@prof_grimmer
Ben Grimmer
1 year
A two part series of papers I wrote towards the end of my PhD just appeared in Math Programming :) My first ideas towards these papers were in Spring 2017, so they've been >6 years in the making. Part I: Part II:
2
2
24
@prof_grimmer
Ben Grimmer
2 years
Math Programming just announced Meritorious Service Awards for "timely, judicious, and considerate reviews". I wanted to thank the many judicious and considerate referees I have had. Some day I hope to have a timely one :) (disclosure I am on their list)
0
1
24
@prof_grimmer
Ben Grimmer
2 years
Packed all the important things for my talk @INFORMS2022 tomorrow :)
Tweet media one
Tweet media two
1
1
22
@prof_grimmer
Ben Grimmer
1 year
@konstmish Unreasonable is the perfect word. I needed a computer assisted proof technique since it went beyond my ability to reason xD
1
0
19
@prof_grimmer
Ben Grimmer
2 years
Have you ever wondered what the Max-Cut Polytope, its semidefinite outer approx, and its trigonometric inner approx look like? For 3x3 matrices, it lives in 3D and is shown below. For 4x4 matrices, it lives in 6D and some slices are shown below. Details:
Tweet media one
Tweet media two
Tweet media three
2
2
20
@prof_grimmer
Ben Grimmer
2 years
Musing on Dual Norms: the most classic pair (IMHO) is L_1 and L_inf. 2D does not do duality justice, as they look like rotations of each other. Taking a dual interchanges the roles of points and faces. Below L_1 has 8 faces and 6 points aligned with L_inf's 8 points and 6 faces.
Tweet media one
1
0
20
@prof_grimmer
Ben Grimmer
2 years
As further reference objects littering my office... I've 3D printed several common nonconvex regularizers: q-norm (q<1) and SCAD regularization induce sparsity, and Schatten q-norm regularization induces low-rankness. Documented at:
Tweet media one
Tweet media two
Tweet media three
0
2
16
@prof_grimmer
Ben Grimmer
2 years
Are you tired of your board games being stored unevenly due to their varied shapes? Just build yourself an oppositely uneven shelf to get nice level tops when stored in exactly the right order...
Tweet media one
1
1
16
@prof_grimmer
Ben Grimmer
1 year
@WhoAreYouMate I have mixed intuitions (but that means it's a really good research question). Once in the neighborhood of a local min, I'd expect this speeds up convergence. Global bounds are less clear. But... accelerated momentum methods don't generalize well, so there is also precedent to fail.
0
0
17
@prof_grimmer
Ben Grimmer
2 years
Saturday's Project #lego #botanicalart
Tweet media one
1
0
15
@prof_grimmer
Ben Grimmer
1 year
Deepmath2023 will be at JHU! Please consider submitting an abstract (deadline extended to 6/15)! . We encourage submissions from diverse disciplines including Statistics, Physics, Computer science, Neuroscience, Mathematics, Psychology, and Engineering.
2
4
15
@prof_grimmer
Ben Grimmer
11 months
Printing a new family of norm balls: Function Norms! These ones are strange; the first one just finished! Below is the p=4 function-norm unit ball ||f||_4^4 = \int_{-1}^1 |f(x)|^4 dx <= 1 (Okay, functions give too big of a space, so I am just looking at quadratics in the Legendre polynomial basis)
Tweet media one
Tweet media two
1
0
15
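A sketch of evaluating such a function norm numerically (the Legendre coefficients below are illustrative, not the printed ball's):

import numpy as np
from numpy.polynomial import legendre
from scipy.integrate import quad

# ||f||_4 = (int_{-1}^{1} |f(x)|^4 dx)^(1/4) for a quadratic written in the
# Legendre basis f = c0*P0 + c1*P1 + c2*P2.
c = [0.5, -0.3, 0.8]
f = lambda x: legendre.legval(x, c)
norm4 = quad(lambda x: abs(f(x)) ** 4, -1, 1)[0] ** 0.25
print(norm4)   # f lies inside the unit ball exactly when this value is <= 1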
@prof_grimmer
Ben Grimmer
1 year
@chanwoopark20 I don't have a good intuition, my main source of inspiration was the BnB-PEP paper which also had these outlier long steps. @ErnestRyu , @shuvo_das_gupta 's paper was a huge inspiration for my work. I love it. My theory results couldn't match their numerics, though. The rates I
1
0
14
@prof_grimmer
Ben Grimmer
1 year
@roydanroy @bremen79 Just doing deterministic schedules repeating the same pattern. I think the proof would naturally extend to backtracking ideas of adaptivity (check if sufficient descent came from doing the pattern, else cut the estimated smoothness constant in half). Other adaptive ideas would
0
0
14
@prof_grimmer
Ben Grimmer
2 years
The 2norm ball (brought from 2d up into 3d) has a delightful property: It rolls!!
2
1
11
@prof_grimmer
Ben Grimmer
1 year
Bumping this old tweet since it's timely... Consider grading your next optimization class via linear programming :) It can be more flexible and forgiving than a fixed rubric. Over two years, I've yet to have a student complain (they like having their score maximized)
@prof_grimmer
Ben Grimmer
2 years
How to (optimally) design a fair/robust syllabus for optimization classes: Below is the polyhedron of all rubrics I find reasonable, which I then maximize for each student. As a result, evaluation focuses on their individual areas of excellence. Details:
Tweet media one
6
8
70
2
0
10
@prof_grimmer
Ben Grimmer
2 years
If you are in Baltimore or Vancouver, this week I am giving seminars at Johns Hopkins (Thursday) and UBC (Friday) on my recent work developing new scalable projection-free optimization methods. (If you aren't and still want to listen in, DM me and I can share zoom links.)
1
0
10
@prof_grimmer
Ben Grimmer
2 years
In preparation for the Fall, printing a full sized 9"x9" grading polyhedron. Hopefully it's big enough to be seen from the back of the room
0
0
10
@prof_grimmer
Ben Grimmer
1 year
To give a good primal-dual story, we didn't just want primal and dual algorithm descriptions, but also primal and dual convergence rates. We can measure those as the iterates converging down to optimal and the model converging up to optimal.
Tweet media one
1
0
10
@prof_grimmer
Ben Grimmer
2 years
I'll be giving a talk on "Radial Duality: Scalable Projection-Free Optimization" at @iccopt2022 this afternoon. Rauch 201 at 3:50pm #iccopt2022 #optimization PS: Some 3D printed references will be used as figures.
0
0
8
@prof_grimmer
Ben Grimmer
1 year
@j_bertolotti @adad8m That's delightful!! You might want to select p values nonuniformly to capture the duality equally. For each p>2, you could include 1/(1-1/p) as its dual norm on the p<2 side.
0
0
10
@prof_grimmer
Ben Grimmer
1 year
The one line description: Constrained optimization problems have another dual, given by an unconstrained gauge optimization problem. Based on this "radial duality", we get radial optimization algorithms that avoid expensive projections and may outscale IPM and ADMM-style methods
Tweet media one
0
1
8
@prof_grimmer
Ben Grimmer
2 years
Just wrapped up a great semester's "Freshman Experience in Applied Math" course with some fantastic Hopkins students, below showing off their creation. Their writeup: A 3D model viewer and their design files:
Tweet media one
0
1
10
@prof_grimmer
Ben Grimmer
1 year
And the grand finale, the convergence theory is (mostly) symmetric as it should be! Both primal and dual convergence for the subgradient method happens at the same rate!!!! (See the paper for the more general nonLipschitz, stochastic, proximal, functional constrained results).
Tweet media one
0
1
9
@prof_grimmer
Ben Grimmer
1 year
Dyck paths are non-decreasing lattice walks from (1,1) to (k,k), a combinatorial classic. Andrew Daw @USCMarshall recently told me about a nice polytope they index in one of his papers () He'll get the 3D print below at @INFORMS2023 :)
Tweet media one
Tweet media two
Tweet media three
0
2
8
@prof_grimmer
Ben Grimmer
2 years
Just printed another norm ball for my collection. Guess a formula for what norm generates this shape Hint(?): this is called a cuboctahedron, whose dual/polar is a rhombic dodecahedron.
Tweet media one
3
0
8
@prof_grimmer
Ben Grimmer
10 months
Latest Hopkins Engineering Magazine has an article on visualizations of difficult scientific concepts. It's a nice read; a few of my 3D printed operator norm balls appear at the end
0
0
10
@prof_grimmer
Ben Grimmer
2 years
When managing risk one often quantifies how bad the worst events are. Conditional Value at Risk (CVaR) measures the expected loss in the worst q% of events. This can be viewed as the 1-norm of the q% largest-magnitude entries. This is a norm!! The CVaR unit ball is below in 3D with q=66%
Tweet media one
1
0
7
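A minimal sketch of this CVaR-style norm (whether the kept entries are summed or averaged only rescales the norm; the test vector is mine):

import numpy as np

# CVaR-style norm: the 1-norm of the q% largest-magnitude entries of x.
def cvar_norm(x, q):
    k = max(1, int(np.ceil(q * len(x))))     # number of worst (largest) entries kept
    return np.sort(np.abs(x))[-k:].sum()

x = np.array([1.0, -3.0, 0.5])
print(cvar_norm(x, q=0.66))                  # top 2 of 3 entries: 3.0 + 1.0 = 4.0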
@prof_grimmer
Ben Grimmer
1 year
@amitra_90 @konstmish Momentum methods still give better rates given perfect first-order oracles (in the regular convex optimization world at least). Noisy/inexact/stochastic settings are exactly where gradient descent can pull ahead of Nesterov's accelerated method, so you are exactly pointing to a
1
0
7
@prof_grimmer
Ben Grimmer
1 year
@SAnsumali I don't know what the best pattern is, but I was able to prove patterns of increasing length (up to t=127) get increasing gains. The alternating example is just showing gains in the simplest case of cycling through t=2 sizes. The best one I found (surely not universally best) is
2
0
7
@prof_grimmer
Ben Grimmer
2 years
@GregTrayling @ProfKinyon I've got a baby version that prints in a few hours (all that plastic in the big shelf took ~50 days)
Tweet media one
0
0
7
@prof_grimmer
Ben Grimmer
2 years
The dual norms of these look even more strange...
Tweet media one
1
0
6
@prof_grimmer
Ben Grimmer
1 year
@RahulAPanicker That's a good path! There are very nice ways to write 2-norms and 4-norms as SDPs...
1
0
6
@prof_grimmer
Ben Grimmer
1 year
@aryehazan It works exactly for 4
0
0
6
@prof_grimmer
Ben Grimmer
1 year
@GuilleAngeris I would love one! I conjecture in the paper that a cyclic pattern of length t should be able to speed things up by at least log(t), based on my computer certificates up to solving 129x129 SDPs. Any path I see to prove that would likely need to do away with the computer assistance
1
0
6
@prof_grimmer
Ben Grimmer
1 year
@McFunkypants @QuantaMagazine @parshallison It is fantastic! Much much more pretty than the plot from my paper (but still evoking a connection, for those off the deep end, reading the study, with the same smooth curve over stairstep curve)
Tweet media one
0
0
5
@prof_grimmer
Ben Grimmer
1 year
@XiongZeng111 Absolutely! Cutting perpendicular to any of the standard basis directions (including horizontally) gives a 4-norm one dimension lower. So the cut of a 4-norm that generates 2-norms must be different...
0
0
6
@prof_grimmer
Ben Grimmer
2 years
@thserra It makes for a great lecture one, combining syllabus review with a natural example of optimization (that the students deeply care about). It's also small enough to explicitly do the dual. My example grading in that writeup sows the seeds for attacking the dual grading program
1
0
5
@prof_grimmer
Ben Grimmer
2 years
How much longer does it take to 3D print a model after doubling its size in each dimension? (hint: printers infill interiors, as shown in the cube print below, so the whole volume need not be printed)
Tweet media one
2
0
5
@prof_grimmer
Ben Grimmer
1 year
@colin_fraser An intuition is hard, I needed a computer-aided proof technique, so there is some mystery resulting. If pressed to give a gut-feeling, one is that the "bad case" for a long step is if the objective looked like "U" or check mark shape where we overshoot. If that was the case, we
1
0
5
@prof_grimmer
Ben Grimmer
2 years
Designing new algorithms that minimize problems reformulated in terms of gauges squared, we get some promising numerical results compared to standard solvers! (Admittedly, these preliminary plots are for somewhat simple synthetic problems; realistic validation is still needed.)
Tweet media one
0
0
5
@prof_grimmer
Ben Grimmer
1 year
The subgradient method is a classic; here we are thinking strongly convex minimization. Repeatedly linearize (below) your objective function and then move downhill. (The paper above also allows for proximal terms and functional constraints via switching, but let's keep it simple)
Tweet media one
1
0
5
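A minimal sketch of the primal story on a toy strongly convex nonsmooth problem (the objective and stepsizes are my own choices, not the paper's general setting):

import numpy as np

# Subgradient method on the mu-strongly convex objective f(x) = ||x||_1 + (mu/2)||x||^2.
mu = 1.0
f = lambda x: np.abs(x).sum() + 0.5 * mu * x @ x
subgrad = lambda x: np.sign(x) + mu * x            # one valid subgradient of f at x

x = np.array([3.0, -2.0, 1.0])
for k in range(200):
    x = x - (2.0 / (mu * (k + 2))) * subgrad(x)    # classic strongly convex stepsize
print(f(x))                                        # approaches min f = 0 at x = 0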
@prof_grimmer
Ben Grimmer
2 years
@ben_golub Wlog x is the origin. Then the boundary point you are looking at is b(v) = v/MinkowskiGauge(v). The Minkowski gauge is 1/R-Lipschitz continuous, where R is the distance to the set's boundary. Your mapping b will inherit this except near the origin.
0
0
5
@prof_grimmer
Ben Grimmer
1 year
The dual perspective on this follows from Dual Averaging (Nesterov 2005), below specialized and extended for strongly convex settings. At each iteration, take a weighted average of our lower bounds to model the objective, then minimize the model (plus regularization).
Tweet media one
1
0
4
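A sketch of the dual side on the same toy objective used in the subgradient sketch above (my own simplified instantiation, not the paper's exact scheme): each subgradient plus strong convexity gives a quadratic lower bound on f, averaging those bounds gives a model, and the model's minimum value is a valid lower bound on f* that climbs toward optimal while the iterates fall toward optimal.

import numpy as np

# Each subgradient gives f(y) >= f(x_i) + <g_i, y - x_i> + (mu/2)||y - x_i||^2.
# Average these quadratic lower bounds; the minimum of the averaged model bounds f* from below.
mu = 1.0
f = lambda x: np.abs(x).sum() + 0.5 * mu * x @ x          # f* = 0 at x = 0
subgrad = lambda x: np.sign(x) + mu * x

x = np.array([3.0, -2.0, 1.0])
b_sum, c_sum = np.zeros(3), 0.0
for k in range(1, 51):
    g = subgrad(x)
    b_sum += g - mu * x                                    # linear part of the new bound
    c_sum += f(x) - g @ x + 0.5 * mu * x @ x               # constant part of the new bound
    b, c = b_sum / k, c_sum / k                            # averaged model (mu/2)||y||^2 + b.y + c
    dual_bound = c - (b @ b) / (2 * mu)                    # minimum value of the averaged model
    x = -b / mu                                            # next iterate = model minimizer
print(f(x), dual_bound)                                    # primal falls to 0, dual bound rises to 0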
@prof_grimmer
Ben Grimmer
1 year
@ErnestRyu @shuvo_das_gupta If I may advertise your result :P , your solver is amazing and the software under the hood is fantastically useful and well documented! It's so good I put it in the acknowledgements at the end of my paper!! Anyone here wanting to dive into PEP proofs should read yours!
0
0
4
@prof_grimmer
Ben Grimmer
1 year
@sjaak_blom Just gradient descent, x_{k+1}=x_k - (h_k/L)\nabla f(x_k), like Cauchy has done for 150+ years. No modern bells or whistles like those stochastic variants, those would be interesting to see
1
0
3
@prof_grimmer
Ben Grimmer
2 years
@damekdavis I'm thinking it is arbitrarily slow (under one interpretation). Pick any sequence of pos integers {a_i}. Let x_0=1 with x_{k+1}=x_k/2 whenever k\in {a_i}, else keep it constant x_{k+1}=x_k. Picking a_i growing arbitrarily fast makes |x_i-x_{i+1}|'s convergence arbitrarily slow.
0
0
4
@prof_grimmer
Ben Grimmer
1 year
@netw0rkf10w @konstmish There are several papers going back to Young in 1953 that get long step speed ups for quadratic minimization. My writeup is missing this most recent one, so the next arXiv post will add it in :) Thank you!! The proof tools from that line don't seem to generalize to general
0
0
3
@prof_grimmer
Ben Grimmer
2 years
It's not eight times, scaling with volume, as most of the interior is not printed (that is, the n^3 coefficient is small). It's primarily four times, scaling with surface area, as the thick boundary dominates the print time. If you are printing fractals, the answer is more fun...
1
0
4
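A back-of-envelope version of that answer (made-up constants): model print time as shell time plus infill time; doubling every dimension multiplies the shell (surface area) by 4 and the sparse infill (volume) by 8, so when the shell dominates the total lands near 4x.

# Toy print-time model (illustrative constants): shell scales with area, infill with volume.
shell, infill = 10.0, 1.0                  # hours for the original print
small = shell + infill
big = 4 * shell + 8 * infill               # double each dimension: area x4, volume x8
print(big / small)                         # ~4.4, close to the "four times" answer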
@prof_grimmer
Ben Grimmer
1 year
@sp_monte_carlo @typedfemale It's a lot like Chebyshev methods, I expect similar brittleness. I really wanted to use the spectral perspective analysis from that line of work, but since smooth convex functions can have changing Hessians, I couldn't find a footing. Ended up having an SDP do the hard spectral work.
1
0
4
@prof_grimmer
Ben Grimmer
2 years
The results work for nonsmooth, smooth, or Hölder smooth problems and under any Hölder growth (generalizing strong convexity); using them on bundle methods and dual averaging, where generalizing existing proofs gets tricky, gave new guarantees without substantial pain :) 🧵2/2
0
0
4
@prof_grimmer
Ben Grimmer
2 years
Hello World! Made this account to share pretty maths I have been/am 3D printing. Will have some pretty objects to share soon!
Tweet media one
0
0
4
@prof_grimmer
Ben Grimmer
2 years
Pop Quiz: Name that Spectrahedron!
Tweet media one
1
0
4
@prof_grimmer
Ben Grimmer
1 year
@adad8m you don't even need c there :)
1
0
4
@prof_grimmer
Ben Grimmer
1 year
@realShengchao @gabrielpeyre The best theorem to my knowledge is the LD^2/2T rate bound for constant stepsize one. Of course that theorem doesn't prove it's actually the best. Taylor et al conjectured in 2017 what the optimal choice between one and two would be if T is given ahead of time. I don't know of a
0
0
2
@prof_grimmer
Ben Grimmer
2 years
@timoberthold Please steal freely (I'll also make/post a typo free copy this afternoon)! Eventually I'll have some 3D printed and will share the source so any can get a physical copy (from their university library)
0
0
3
@prof_grimmer
Ben Grimmer
1 year
@xray_matt I mean the second after my conjecture, but I agree it's ambiguous as written. Parentheses will appear in the next version :)
0
0
2
@prof_grimmer
Ben Grimmer
1 year
@damekdavis This looks very cool! Long steps are in fashion!
1
0
3
@prof_grimmer
Ben Grimmer
1 year
@gabrielpeyre If you'd like to 3D print a set of these
0
0
3
@prof_grimmer
Ben Grimmer
11 months
@bremen79 @damekdavis The method takes as input the target accuracy eps, you may call that a parameter. It also has a constant under the hood that is adapting from continual linesearching, you may call that a parameter. I'm not inclined to call either of those parameters, so my mental lexicon says Yes
1
0
3
@prof_grimmer
Ben Grimmer
2 years
The gauge of a set is the function whose level sets are scaled copies of the set. Evaluating a set's gauge and its gradient is often cheap (comparable to checking membership and computing a normal vector). We show the gauge squared is smooth/strongly convex when the set is!
Tweet media one
1
0
3
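A minimal sketch of the gauge in the simplest case (the Euclidean ball of radius r, my choice of example): gauge(x) = ||x||/r, so gauge(x)^2 = ||x||^2/r^2 is smooth and strongly convex, matching the claim above.

import numpy as np

# Gauge (Minkowski functional) of the radius-r Euclidean ball: gauge(x) = ||x|| / r.
# Its square, ||x||^2 / r^2, is smooth and 2/r^2-strongly convex.
r = 2.0
gauge = lambda x: np.linalg.norm(x) / r
x = np.array([1.0, 2.0, 2.0])
print(gauge(x), gauge(x) <= 1)   # 1.5, False: x lies outside the ball
print(2 * x / r**2)              # gradient of gauge(x)^2, as cheap as a normal vector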