Prerequisites
If you are not familiar with matrix algebra you may like to look at the following pages first:
In the applications section it is important to realise that there are different standards and conventions, so it is a good idea to familiarise yourself with the standards used on this site:
Matrix differentiation
To differentiate a matrix with respect to a variable, say 'x', we individually differentiate each element with respect to 'x'. So if:
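$$[f(x)] = \begin{bmatrix} f_{11}(x) & f_{12}(x) \\ f_{21}(x) & f_{22}(x) \end{bmatrix}$$
then:
$$\left[\frac{d f(x)}{dx}\right] = \begin{bmatrix} \frac{d f_{11}(x)}{dx} & \frac{d f_{12}(x)}{dx} \\ \frac{d f_{21}(x)}{dx} & \frac{d f_{22}(x)}{dx} \end{bmatrix}$$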
So to give a more specific example if:
[f(x)]= 

then:
[ d f(x) / dx]= 

So this is quite simple: provided that we can differentiate the elements of a matrix, we can differentiate the whole matrix.
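As an illustration of element-by-element differentiation, here is a minimal sketch in plain Python. The matrix function `f` and its entries are made up for this example and are not taken from the page. Each entry is differentiated by hand in `df_analytic`, and a central-difference estimate in `df_numeric` is used as an independent check.

```python
import math

def f(x):
    """Hypothetical example matrix: every element is an ordinary function of x."""
    return [[x ** 2, math.sin(x)],
            [math.exp(x), 3.0 * x]]

def df_analytic(x):
    """Derivative of f, obtained by differentiating each element separately."""
    return [[2.0 * x, math.cos(x)],
            [math.exp(x), 3.0]]

def df_numeric(func, x, h=1e-6):
    """Central-difference estimate of d[func(x)]/dx, element by element."""
    plus, minus = func(x + h), func(x - h)
    return [[(p - m) / (2.0 * h) for p, m in zip(row_p, row_m)]
            for row_p, row_m in zip(plus, minus)]

x0 = 0.5
analytic = df_analytic(x0)
numeric = df_numeric(f, x0)

# Largest disagreement between the hand-derived and numerical derivatives.
max_error = max(abs(a - n)
                for row_a, row_n in zip(analytic, numeric)
                for a, n in zip(row_a, row_n))
print(max_error)
```

The two results should agree to within the accuracy of the finite-difference step, which is the sense in which differentiating the matrix reduces to differentiating its elements.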
Matrix Differentiation with respect to another Matrix
Since one square matrix can be divided by another square matrix with nonzero determinant (that is, multiplied by its inverse), which is not possible for vectors, we can define differentiation with respect to another matrix.
What are the rules of such differentiation? What applications does it have?
Could we use it in this situation? Imagine that two matrices represent the angular positions of two gears: can we differentiate one with respect to the other to get the ratio of the gears?
Jacobian matrix
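There are more complicated types of differentiation, for instance the Jacobian, which contains every combination of the partial derivatives of the elements f1, f2 ... fm of a vector function with respect to the elements x1, x2 ... xn of the vector it depends on.
$$J(x_1, x_2, \dots, x_n) = \begin{bmatrix} \frac{\partial f_1}{\partial x_1} & \cdots & \frac{\partial f_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial f_m}{\partial x_1} & \cdots & \frac{\partial f_m}{\partial x_n} \end{bmatrix}$$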
This is related to vector calculus such as grad, div and curl.
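To make the Jacobian concrete, the following sketch estimates it numerically for the polar-to-Cartesian map (r, theta) -> (r cos theta, r sin theta). This map is chosen here only as a familiar example and does not come from the page. The helper `jacobian_numeric` is a hypothetical name; it fills entry `J[i][j]` with a central-difference estimate of the partial derivative of output i with respect to input j.

```python
import math

def polar_to_cartesian(v):
    """Example vector function: (r, theta) -> (x, y)."""
    r, theta = v
    return [r * math.cos(theta), r * math.sin(theta)]

def jacobian_numeric(func, x, h=1e-6):
    """J[i][j] approximates d f_i / d x_j using central differences."""
    m = len(func(x))
    n = len(x)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        x_plus, x_minus = list(x), list(x)
        x_plus[j] += h
        x_minus[j] -= h
        f_plus, f_minus = func(x_plus), func(x_minus)
        for i in range(m):
            J[i][j] = (f_plus[i] - f_minus[i]) / (2.0 * h)
    return J

r, theta = 2.0, math.pi / 6
J = jacobian_numeric(polar_to_cartesian, [r, theta])

# Hand-derived Jacobian of the same map, for comparison.
J_exact = [[math.cos(theta), -r * math.sin(theta)],
           [math.sin(theta),  r * math.cos(theta)]]
print(J)
print(J_exact)
```

For this particular map the determinant of the Jacobian is r, which is the familiar area factor when changing from Cartesian to polar coordinates.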
Applications
So far we have considered only the simple element-by-element differentiation of a matrix described at the top of this page. However, when we start to look at applications it becomes less simple.
For instance, with linear movement we just use v = dx/dt and treat velocity v as being the same thing as dx/dt, but with rotation the equation is more complicated:
[~w]*[R(t)] = [d R(t) / dt]
This is also discussed on the angular velocity page. In particular, Mark Ioffe was kind enough to send me a proof of this, see this page.
What I want to do is understand the deeper reasons for this extra complexity. I think this involves these factors:
 These are time-varying quantities. Of course v = dx/dt works for time-varying quantities, but at least if we have constant velocity (and so constant linear momentum) then dx/dt will be constant. If [R(t)] represents the orientation of an object rotating at a constant angular velocity (and constant angular momentum) then [d R(t) / dt] will still vary with time, but [~w] = [d R(t) / dt]*[R(t)]^-1 will not vary with time and is therefore a better representation of angular velocity.
 I think differentiation is related to the addition operation, but rotations are combined using matrix multiplication, not addition. When I say "differentiation is related to the addition operation" I mean: when we add a small increment to time we get a small increment to distance, and differentiation is the limit as these increments tend to zero. So is there a mathematical theory that relates small incremental multiplications to conventional differentiation?
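One possible way to connect small incremental multiplications with ordinary differentiation, offered only as a sketch and not as a full answer to the question, is to write the rotation over a short time step δt as multiplication by a matrix close to the identity. Assuming that this increment is [I] + [~w] δt to first order in δt, the equation [~w]*[R(t)] = [d R(t) / dt] follows from the usual limit:

```latex
\begin{aligned}
[R(t+\delta t)] &\approx \left([I] + [\tilde{\omega}]\,\delta t\right)[R(t)] \\
\frac{[R(t+\delta t)] - [R(t)]}{\delta t} &\approx [\tilde{\omega}]\,[R(t)] \\
\left[\frac{dR(t)}{dt}\right] &= [\tilde{\omega}]\,[R(t)] \qquad (\delta t \to 0)
\end{aligned}
```

Here [I] is the identity matrix and [\tilde{\omega}] stands for [~w]. The multiplication by a near-identity matrix plays the role that adding a small increment plays in v = dx/dt.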
Two Dimensional Case
Rotation matrix is:
$$[R] = \begin{bmatrix} \cos(a) & -\sin(a) \\ \sin(a) & \cos(a) \end{bmatrix}$$
We want the object to be rotating at a constant angular velocity of 'w'. Therefore we replace the angle 'a' with w*t as follows:
Rotation matrix is:
$$[R] = \begin{bmatrix} \cos(w t) & -\sin(w t) \\ \sin(w t) & \cos(w t) \end{bmatrix}$$
So if we differentiate this with respect to time we get:
$$\frac{d[R]}{dt} = \begin{bmatrix} -w \sin(w t) & -w \cos(w t) \\ w \cos(w t) & -w \sin(w t) \end{bmatrix}$$
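So, as already described, this is a time varying quantity even though it is rotating at a constant angular velocity. However we can factor this matrix as follows:
$$\begin{bmatrix} -w \sin(w t) & -w \cos(w t) \\ w \cos(w t) & -w \sin(w t) \end{bmatrix} = \begin{bmatrix} 0 & -w \\ w & 0 \end{bmatrix} * \begin{bmatrix} \cos(w t) & -\sin(w t) \\ \sin(w t) & \cos(w t) \end{bmatrix}$$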
so if we let
$$[\tilde{w}] = \begin{bmatrix} 0 & -w \\ w & 0 \end{bmatrix}$$
Then we get:
[d R(t) / dt] = [~w]*[R(t)]
So we now have a non-time-varying matrix to represent angular velocity.
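The two dimensional result can be checked numerically. The sketch below, in plain Python with hypothetical helper names, builds the rotation matrix and its time derivative for an assumed constant angular velocity w = 2.0, then forms [d R(t) / dt]*[R(t)]^-1 at several times. It relies on the fact that the inverse of a rotation matrix is its transpose.

```python
import math

def rotation(w, t):
    """2D rotation matrix for angle a = w*t."""
    a = w * t
    return [[math.cos(a), -math.sin(a)],
            [math.sin(a),  math.cos(a)]]

def rotation_dot(w, t):
    """Element-by-element time derivative of rotation(w, t)."""
    a = w * t
    return [[-w * math.sin(a), -w * math.cos(a)],
            [ w * math.cos(a), -w * math.sin(a)]]

def matmul(A, B):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

w = 2.0  # assumed constant angular velocity for this example

def omega_tilde(t):
    # For a rotation matrix the inverse equals the transpose.
    return matmul(rotation_dot(w, t), transpose(rotation(w, t)))

# Evaluate at several different times; the result should not depend on t.
samples = [omega_tilde(t) for t in (0.0, 0.3, 1.7)]
for s in samples:
    print(s)
```

Each printed matrix should be close to [[0, -w], [w, 0]] regardless of the time chosen, even though the derivative matrix itself changes with t.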
Three Dimensional Case
If
p_{a}(t) = [R(t)] p_{l}
where:
 p_{a}(t) = a point represented by a vector in absolute coordinates which is moving (i.e. is a function of time)
 [R(t)] = a rotation matrix which is a function of time.
 p_{l} = a fixed point represented by a vector in the object's local coordinates.
In other words, if we take a fixed point on an object and transform it by multiplying it by a rotation matrix which is a function of time, then we get a vector which rotates as defined by the matrix.
If we want to get the velocity of this vector then we need to differentiate the matrix to give:
v_{a}(t) = [d R(t) / dt] p_{l}
where:
 v_{a}(t) = the velocity of the point in the first equation
 [d R(t) / dt] = the matrix from the first equation which has now been differentiated.
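As a sketch of the three dimensional case, the example below assumes a rotation about the z axis at a constant angular velocity, with made-up values for the local point and the time. It computes the velocity of the point by multiplying the differentiated rotation matrix by the fixed local vector, and compares that with a finite-difference estimate taken directly from the moving point. The names `rot_z`, `rot_z_dot` and `apply` are hypothetical helpers for this illustration.

```python
import math

def rot_z(w, t):
    """Rotation about the z axis by angle w*t."""
    c, s = math.cos(w * t), math.sin(w * t)
    return [[c, -s, 0.0],
            [s,  c, 0.0],
            [0.0, 0.0, 1.0]]

def rot_z_dot(w, t):
    """Element-by-element time derivative of rot_z(w, t)."""
    c, s = math.cos(w * t), math.sin(w * t)
    return [[-w * s, -w * c, 0.0],
            [ w * c, -w * s, 0.0],
            [0.0, 0.0, 0.0]]

def apply(M, v):
    """Multiply a 3x3 matrix by a 3-element vector."""
    return [sum(M[i][k] * v[k] for k in range(3)) for i in range(3)]

w, t, h = 1.5, 0.4, 1e-6      # assumed example values
p_local = [1.0, 2.0, 3.0]     # fixed point in the object's local coordinates

# Velocity from the differentiated matrix.
velocity = apply(rot_z_dot(w, t), p_local)

# Independent check: central difference of the moving point itself.
p_plus = apply(rot_z(w, t + h), p_local)
p_minus = apply(rot_z(w, t - h), p_local)
velocity_fd = [(a - b) / (2.0 * h) for a, b in zip(p_plus, p_minus)]

print(velocity)
print(velocity_fd)
```

Because the local point is fixed, all of the time dependence sits in the matrix, so differentiating the matrix alone gives the velocity. The z component is zero here because the assumed rotation is about the z axis.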