Thanks to danske for this:

First
off we need to expand the explanation of the Taylor
series.

There are more terms to the Taylor series than
in: (x(t0 + h) = x(t0) + h x'(t0) + h^2/2 x''(t0) + O(h^3)). It
continues off in an infinite series:

x(t0 +
h) = x(t0) + x'(t0)h + x''(t0)/2 h^2 + x'''(t0)/6 h^3 +
x''''(t0)/24 h^4 + ... + x('n)(t0)/n! h^n + error

Err...I'm
going to write this quickly, so it may not be comprehensive or
appropriately split-up for those two pages.

If
we're integrating over time, as in a simulation, then we need to
be able to calculate the derivative over time for each variable
we're attempting to integrate for. For position this derivative
is velocity, for velocity this derivative is acceleration. If we
could directly calculate the derivative for acceleration than we
could use this as well to predict future acceleration. In a
physical simulation acceleration (from forces) typically depend
on the positions and velocities of all the objects in the
system, so we don't have an equation for how acceleration
(forces) are changing over time. If we did that would be like
having an equation which would tell us that two objects will
collide in the simulation before we actually get to that time.
This is, however, what we will attempt to numerically
emulate.

Rewriting the Taylor series for
physical quantities (and in the order usually used in physics
texts) we have:

x(t0 + h) = 0.5a(t0)h^2 +
v(t0)h + x(t0)

(the equation for position given
constant acceleration)

v(t0 + h) = a(t0)h + v(t0)

(the
equation for velocity given constant acceleration)

if
we had the derivative of acceleration then we could instead
have:

x(t0 + h) = 1/6 * a'(t0)h^3 +
0.5a(t0)h^2 + v(t0)h + x(t0)

v(t0 + h) = 0.5a'(t0)h^2 +
a(t0)h + v(t0)

a(t0 + h) = a'(t0)h + a(t0)

We
can estimate a' using our simulation, however. We can calculate
the forces in the simulation given positions and velocities off
the objects. If only we knew the positions and velocities of the
objects in the future we could know the forces in the future.
Wait a second, computing positions and velocities of the future
is what this simulation is doing. We can use it!

How
does this work? We can use our current forces and accelerations
to provide us with a prediction of the positions and velocities
of all the objects (h) in the future, calculate the forces and
accelerations in our predictions, and then use those future
accelerations to estimate (a') at (t0). We can then use that
estimated (a') to add another term to our Taylor series,
improving our estimation of future positions and accelerations,
repeating the process until the difference between our
prediction and corrected prediction is small.

Here's
what it looks like mathematically:

a(t0 + h)
= a'(t0)h + a(t0)

a'(t0) = (a(t0 + h) -
a(t0))/h

(you'll recognize that as the definition of
the derivative if h -> 0)

Instead of using
the "actual" value of (a(t0 + h)) we use our predicted
value of the future.

a'(t0) = (ap(t0 + h) -
a(t0))/h

If we put that back into our Taylor
series (omitting some of the algebra) and calling our new values
"corrected", we have, :

xc(t0 + h)
= 0.25*(ap(t0 + h) + a(t0))h^2 + v(t0)h + x(t0)

vc(t0 +
h) = 0.5*(ap(t0 + h) + a(t0))h + v(t0)

So
instead of using simply (a(t0)) in the Taylor series without
(a'(t0)), we replace it with the average of the current
acceleration and our predicted accelerations in the future
((a(t0) + ap(t0))/2).

In terms of the
simulation what we're doing is generating the world one time
step in the future as if all the forces were constant during
that time step, evaluating the forces in the future, and then
using those future values to compute new average forces to use
over the timestep. If the forces aren't constant over time then
our new average forces will result in a more accurate simulation
than pretending the forces are constant during each
timestep.

As a concrete example let us
consider a spring being compressed over time. At (t0) it is
producing some force, at (t0 + h) it is producing a greater
force. If we integrate position over time with the initial force
then we've underestimated the force of the spring during that
time step. By "peeking" ahead we see that its force is
increasing and that we should use a greater value as an average
force over the time step. Doing so gives us a more accurate
simulation of the forces from the spring.

In
another example which may be more intuitive, consider a car
driving off a cliff sometime in the middle of the time step. At
(t0) it is still on the ground. At (t0 + h) we find it hanging
in mid air. Somewhere in the middle of the timestep the force
from the ground when from the weight of the car to zero.
Obviously the car has been experiencing an average acceleration
somewhere between zero and -9.8m/s^2 over the timestep and our
use of a zero acceleration is inaccurate. So we recalculate the
future position and velocity of the car at (t0 + h) using an
average acceleration of -4.9m/s^2 instead of zero. This
simulates the car having driven off the cliff exactly half-way
through the time step. Unless we want to instead decrease the
time step to more accurately determine when the car went off the
cliff then this result of simulating it going off half-way
between when we last saw it on the ground and when we first
found it hanging in mid-air, this is better than simulating it
having just driven off the cliff an infinitesimal amount of time
before we first found it hanging in the air. On average it will
have driven off in the middle of the timestep, which is what
using the average acceleration of -4.9m/s^2 over the timestep
gives us.

In coding this Modifier Euler
method we do need to maintain at least two whole world states,
the initial state at (t0) and our predicted state at (t0 + h),
in order to compute the average forces to use in our corrected
computations. The corrected values (using the computed averages)
can then be stored in the memory allocated for the initial state
(since we're done with it unless we want to re-correct with
another estimate of average forces using the forces we find in
our corrected world), so memory usage only doubles (or maybe as
bad as triples depending on how we're forced to compute averages
of the data). Since we're computing a future world at least
twice for each time step the simulation rate will drop, but the
accuracy improves more than if we halved the time step. If we're
having problems with accuracy, or just want more of it, and can
afford the memory usage and the added complexity of computing
the average forces, then this may be faster than decreasing the
timestep to achieve the same accuracy.

In
summary, what this method does is guess and check the world
state in the future. We take our initial guess at the future
assuming all the forces are constant over the time step, use the
future forces in our first guess of the future to provide us
with a better guess at average forces over the time step, and
use those averages to make a better guess at the future.

If
code is your thing, here's a snippet computing a one-dimension
mass on a spring:

double x = 10.0;

double
v = 0.0;

double k = 1000.0;

double t =
0;

double t_step = 0.001;

double t_max =
10.0;

while ( t < t_max ) {

double a = ((x
* -k) / m);

double xp = (x + v * t_step + 0.5 * a *
t_step * t_step);

double vp = (v + a * t_step);

double
ap = ((xp * -k) / m);

x = x + v * t_step + 0.25*(a +
ap) * t_step * t_step;

v = v + 0.5*(a + ap) *
t_step;

t += t_step;

}