Below you see a bunch of points,
and some lines jumping around.
The computer is trying to fit a straight line
to the points as closely as possible.
To do so, it’s using a technique called simulated annealing.
In this post I show you
what simulated annealing is,
what problems it can solve,
and how to implement it.
If you read the Wikipedia page on simulated annealing,
you’ll find a bunch of physics mumbo-jumbo,
claiming that the algorithm is a simulation of how metals behave as they cool.
In reality,
“simulated annealing” is just a variation of “trial-and-error”,
a technique you learned in school!
In trial-and-error,
you repeatedly generate guesses
until you find the solution,
or until you find a guess that’s good enough.
Here’s one trial-and-error algorithm in JavaScript:
let guess = rand();
while (loss(guess) > 0)
  guess = rand();
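To make that snippet runnable, here’s a toy setup (the secret number, and the `loss` and `rand` definitions, are made up for illustration): we guess an integer from 0 to 99, and the loss is the distance to a secret number.

```javascript
// Toy problem (made up for illustration): guess a secret integer in [0, 100).
const secret = 42;
const loss = (guess) => Math.abs(guess - secret);
const rand = () => Math.floor(Math.random() * 100);

let guess = rand();
while (loss(guess) > 0)
  guess = rand();
// The loop only exits when loss(guess) === 0, i.e. guess === secret.
```

On average this takes about 100 guesses, because each guess is independent of all the previous ones.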
Simulated annealing improves on trial-and-error by:

1. Generating the next guess by mutating a previous one.
2. Jumping around more at the beginning, and less as you come to the end.
Click the “Run it again” button above,
and you can see these two ideas in action.
The red line is the best guess so far;
the blue line is the current guess.
The blue lines are mutations of the red one,
and gradually, the mutations become less radical.
Here’s the algorithm in JavaScript:
// I chose zero as my initial guess
// (but you could start anywhere).
let best = [0, 0, ...];
for (let temp = 100; temp > 0.1; temp *= 0.95) {
  // Mutation. I've chosen to mutate more with temp.
  current = best + rand() * temp;
  if (loss(current) < loss(best)) {
    // Could accept this probabilistically with temp.
    best = current;
  } else {
    // Could probabilistically move anyway.
  }
}
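To see the loop run end to end, here’s a self-contained sketch of fitting a line y = m·x + b to some points. The data points, the mean-squared-error loss, and the per-parameter mutation are my own illustrative choices, not necessarily what the demo above uses.

```javascript
// Made-up points lying roughly on the line y = 2x + 1.
const points = [[0, 1.1], [1, 2.9], [2, 5.2], [3, 6.8], [4, 9.1]];

// Mean squared error of the line y = m*x + b over the points.
const loss = ([m, b]) =>
  points.reduce((sum, [x, y]) => sum + (m * x + b - y) ** 2, 0) / points.length;

// Uniform random number in [-1, 1].
const rand = () => Math.random() * 2 - 1;

let best = [0, 0];
for (let temp = 100; temp > 0.1; temp *= 0.95) {
  // Mutate each parameter, more when the temperature is high.
  const current = best.map((v) => v + rand() * temp);
  if (loss(current) < loss(best)) best = current;
}
// best should drift toward the true slope and intercept (about 2 and 1).
```

This variant only ever accepts improvements; the probabilistic acceptance mentioned in the comments is discussed below.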
The key idea in annealing is
“jumping around more at the beginning”.
This idea helps if similar guesses tend to have similar error.
This is a property of most functions we want to optimize!
This property holds, for example, for all continuous functions.
(One thing that annealing shouldn’t help with
is breaking a cryptographic hash function,
where “a miss is as good as a mile”.)
If similar guesses tend to have similar error,
we wish to seek out areas of generally low error.
Near the beginning,
we have no reason to think that we are in such an area,
so we should jump around more.
There are at least two ways to implement this
“jumping around more near the beginning”:
Be more likely to accept the new guess even if it’s worse.
This will cause you to jump around more
because your travel will be a random walk.
This technique seems to be the traditional annealing algorithm.
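The standard way to implement that probabilistic acceptance is the Metropolis criterion, sketched here (the function name is my own):

```javascript
// Always accept an improvement; accept a worse guess with probability
// exp(-(loss gap) / temp), so worse guesses are accepted more often
// when the temperature is high and the gap is small.
function accept(lossCurrent, lossBest, temp) {
  if (lossCurrent < lossBest) return true;
  return Math.random() < Math.exp((lossBest - lossCurrent) / temp);
}
```

Swapping this in for the plain `loss(current) < loss(best)` test in the loop above gives you the traditional annealing algorithm.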
Apply more mutation to the previous guess.
This will cause you to jump around more
because your jumps can skip directly over barriers.
This technique is apparently called “quantum annealing”
(warning: another Wikipedia page of excessive physics).
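This second variant is exactly what the mutation line in the code above already does; isolated as a helper (a sketch, with my own naming), it looks like this:

```javascript
// Mutation whose size scales with the temperature: early on (high temp),
// a guess can leap far, potentially straight over barriers in the loss
// landscape; late (low temp), it only nudges the previous guess.
const mutate = (guess, temp) =>
  guess.map((v) => v + (Math.random() * 2 - 1) * temp);
```

At temperature zero the mutation is a no-op, and each parameter moves by at most `temp`.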
In the algorithm above,
temp is the “temperature”,
nomenclature from the traditional “annealing” analogy.
A better name might be jumpiness:
it controls how much to jump around.
As such, it gradually decreases with time.
Just like physical temperature,
I decided to reduce temp using exponential decay:
temp *= 0.95.
But many variations exist,
and you shouldn’t get hung up on this choice.
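For illustration, here are a few common cooling schedules (the names and constants are my own choices), each mapping a step count t to a temperature:

```javascript
// Each schedule starts at 100 and decays toward zero as t grows.
const exponential = (t) => 100 * 0.95 ** t;  // the one used above
const linear      = (t) => Math.max(100 - t, 0);  // straight line to zero
const inverse     = (t) => 100 / (1 + t);    // slow "1/t" cooling
```

Any decreasing schedule gives you the “jump around more at the beginning” behavior; they differ only in how quickly the jumpiness dies down.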