Source: https://www.quantamagazine.org/risky-giant-steps-can-solve-optimization-problems-faster-20230811/#comments

“It turns out that we did not have full understanding” of the theory behind gradient descent, said Shuvomoy Das Gupta, an optimization researcher at the Massachusetts Institute of Technology. Now, he said, we’re “closer to understanding what gradient descent is doing.”

The technique itself is deceptively simple. It uses something called a cost function, which looks like a smooth, curved line meandering up and down across a graph. For any point on that line, the height represents cost in some way: how much time, energy or error the operation will incur when tuned to a specific setting. The higher the point, the farther from ideal the system is. Naturally, you want to find the lowest point on this line, where the cost is smallest.

Gradient descent algorithms feel their way to the bottom by picking a point and calculating the slope (or gradient) of the curve around it, then moving in the direction where the curve drops most steeply. Imagine this as feeling your way down a mountain in the dark. You may not know exactly where to move, how long you’ll have to hike or how close to sea level you will ultimately get, but if you head down the sharpest descent, you should eventually arrive at the lowest point in the area.

Unlike the metaphorical mountaineer, optimization researchers can program their gradient descent algorithms to take steps of any size. Giant leaps are tempting but also risky, as they could overshoot the answer. Instead, the field’s conventional wisdom for decades has been to take baby steps.
In gradient descent equations, this means a step size no bigger than 2, though no one could prove that smaller step sizes were always better.

With advances in computer-aided proof techniques, optimization theorists have begun testing more extreme techniques. In one study, first posted in 2022 and recently published in Mathematical Programming, Das Gupta and others tasked a computer with finding the best step lengths for an algorithm restricted to running only 50 steps, a sort of meta-optimization problem, since it was trying to optimize optimization. They found that the optimal 50 steps varied significantly in length, with one step in the middle of the sequence reaching a length of nearly 37, far above the typical cap of 2.

The findings suggested that optimization researchers had missed something. Intrigued, Grimmer sought to turn Das Gupta’s numerical results into a more general theorem. To get past an arbitrary cap of 50 steps, Grimmer explored what the optimal step lengths would be for a sequence that could repeat, getting closer to the optimal answer with each repetition. He ran the computer through millions of permutations of step length sequences, helping to find those that converged on the answer the fastest.

Grimmer found that the fastest sequences always had one thing in common: The middle step was always a big one. Its size depended on the number of steps in the repeating sequence. For a three-step sequence, the big step had length 4.9. For a 15-step sequence, the algorithm recommended one step of length 29.7. And for a 127-step sequence, the longest one tested, the big central leap was a whopping 370. At first that sounds like an absurdly large number, Grimmer said, but there were enough total steps to make up for that giant leap, so even if you blew past the bottom, you could still make it back quickly.
His paper showed that this sequence can arrive at the optimal point nearly three times faster than it would by taking constant baby steps. “Sometimes, you should really overcommit,” he said.

This cyclical approach represents a different way of thinking of gradient descent, said Aymeric Dieuleveut, an optimization researcher at the École Polytechnique in Palaiseau, France. “This intuition, that I should think not step by step, but as a number of steps consecutively, I think this is something that many people ignore,” he said. “It’s not the way it’s taught.” (Grimmer notes that this reframing was also proposed for a similar class of problems in a 2018 master’s thesis by Jason Altschuler, an optimization researcher now at the University of Pennsylvania.)

However, while these insights may change how researchers think about gradient descent, they likely won’t change how the technique is currently used. Grimmer’s paper focused only on smooth functions, which have no sharp kinks, and convex functions, which are shaped like a bowl and only have one optimal value at the bottom. These kinds of functions are fundamental to theory but less relevant in practice; the optimization programs machine learning researchers use are usually much more complicated. These require versions of gradient descent that have “so many bells and whistles, and so many nuances,” Grimmer said.
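
As a rough illustration of the cyclic idea described above, the Python sketch below runs gradient descent on a one-dimensional Huber-style cost, which is smooth and convex like the functions Grimmer studied. It compares 30 constant steps of length 1 with ten repetitions of a three-step cycle whose middle step is the 4.9 mentioned in the text. The cost function, its width `DELTA`, the starting point and the two flanking steps of 1.5 are assumptions chosen for illustration and do not come from the article. Step lengths are measured relative to the smoothness of the cost, which is 1 here.

```python
# Illustrative sketch only: gradient descent with constant steps versus a
# repeating three-step pattern that contains one long step.

DELTA = 0.01  # hypothetical width of the curved region around the minimum


def huber(x: float) -> float:
    """Smooth convex cost with smoothness constant 1, minimized at x = 0."""
    if abs(x) <= DELTA:
        return 0.5 * x * x
    return DELTA * (abs(x) - 0.5 * DELTA)


def huber_grad(x: float) -> float:
    """Slope of the cost at x (x itself near the minimum, +/-DELTA elsewhere)."""
    if abs(x) <= DELTA:
        return x
    return DELTA if x > 0 else -DELTA


def gradient_descent(x0: float, steps: list[float]) -> float:
    """Apply x <- x - h * slope(x) once for each step length h in steps."""
    x = x0
    for h in steps:
        x -= h * huber_grad(x)
    return x


n_cycles = 10
baby_steps = [1.0] * (3 * n_cycles)
# The big step 4.9 is from the article; the 1.5 flanking steps are assumed.
long_step_pattern = [1.5, 4.9, 1.5] * n_cycles

x_baby = gradient_descent(1.0, baby_steps)
x_long = gradient_descent(1.0, long_step_pattern)

print(f"constant steps:  x = {x_baby:.4f}, cost = {huber(x_baby):.6f}")
print(f"long-step cycle: x = {x_long:.4f}, cost = {huber(x_long):.6f}")
```

On this particular cost and step budget, the long-step cycle ends closer to the minimum. That is a single example, not evidence for the general claim: Grimmer's result is a worst-case guarantee, and on other functions or budgets the constant schedule can come out ahead.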

Risky Giant Steps Can Solve Optimization Problems Faster
2023-08-14 21:58:41