Lecture Notes 7, Wiener Processes and Itô's Lemma
K Foster, CCNY, Spring 2010

Learning Outcomes (not directly from CFA exam for this one)
Students will be able to:
§ see how we use Itô's Lemma and the underlying stochastic calculus to set up the Black-Scholes-Merton option pricing result;
Basic Calculus
Recall from basic calculus the definition of the derivative: examine the limit,
lim_{h→0} [f(x+h) − f(x)]/h,
and if both the left-hand-side (lhs) and right-hand-side (rhs) limits are equal, then we call this limit the derivative,
f'(x) = lim_{h→0} [f(x+h) − f(x)]/h.
Rewrite this to show that, for very small values of h,
f(x+h) ≈ f(x) + h f'(x), or, in a switch of notation, replacing h with Δx,
f(x+Δx) ≈ f(x) + Δx f'(x).
This we can interpret along the lines of the Taylor Theorem: for small values of Δx, the expression f(x) + Δx f'(x) is a good approximation to f(x+Δx).
The Taylor Theorem says that for a differentiable function, we can approximate local changes with derivative(s),
f(x+Δx) = f(x) + Δx f'(x)/1! + Δx^{2} f''(x)/2! + Δx^{3} f'''(x)/3! + ...,
which relates to the interpretation of the derivative above as basically stating that (since of course 1! = 1) the first two terms are a 'pretty good' approximation to the value of the function at x+Δx. We can usually think of the Δx^{2} and Δx^{3} and higher-order terms as going fast towards zero, so fast that they are of negligible importance.
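To see that decay of the higher-order terms concretely, here is a quick numerical check (a sketch in Python; the choice of f(x) = e^x and the particular step sizes are just for illustration):

```python
import math

def taylor_error(f, fprime, x, dx):
    """Error of the first-order Taylor approximation f(x+dx) ≈ f(x) + dx*f'(x)."""
    return abs(f(x + dx) - (f(x) + dx * fprime(x)))

# As dx shrinks by a factor of 10, the error shrinks by roughly 100:
# the neglected terms are of order dx^2 and higher.
for dx in (0.1, 0.01, 0.001):
    print(dx, taylor_error(math.exp, math.exp, 1.0, dx))
```

The error falling like Δx^{2} is exactly the "negligible importance" of the higher-order terms.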
Recall/Learn from Advanced Calc
If we have a function of two variables, say a production function where output is made by two inputs, capital and labor, then we would write this as
Y = F(K, L)
and then define the marginal productivity of each input separately, as the amount by which output would increase if one input were increased but the other input held constant. This is notated mathematically as MP_{L} = ∂Y/∂L and MP_{K} = ∂Y/∂K. If we had some polynomial function to represent output, say Y = K^{2}L^{2} (a funny-looking production function, I know, but the math works out easy), then ∂Y/∂L = 2K^{2}L and ∂Y/∂K = 2KL^{2}.
The total differential, dY, is defined as equal to the sum of the partials,
dY = MP_{L} dL + MP_{K} dK = (∂Y/∂L) dL + (∂Y/∂K) dK.
This has an easy economic interpretation: if L goes up by a little bit, ΔL, how much does output change? By MP_{L}·ΔL. If K goes up by ΔK, then output rises by MP_{K}·ΔK. If both L and K change by small amounts then we get the total differential above.
Now since we are suppressing the functional notation, we could write the total differential, more cumbersomely but equivalently, as
dY = [∂F(K,L)/∂L] dL + [∂F(K,L)/∂K] dK.
This is useful because, to find the second derivative, we need to find how the partial derivative with respect to L changes as K changes (i.e. how the marginal productivity of labor is affected by having more capital) as well as vice versa (how the marginal productivity of capital changes with having more labor, i.e. how the partial derivative with respect to K changes with L). So it is natural to consider the second partial derivative just like a regular second derivative.
To put this into a bit more generality, given G(x,y) then
dG = (∂G/∂x) dx + (∂G/∂y) dy.
What if, now, we have a chain of functions, so the function, G( ), is a function of time, t, and another function, X(t)? In finance it is natural to think of G( ) as a payoff function of a derivative that depends on the time (whether it is at expiration) and the stock price (itself a function of time). But for now just allow G(t,X(t)) a general existence. Then we can express the change in the value of the function as
dG = (∂G/∂t) dt + (∂G/∂X) dX,
where since X is itself a function of time we can write
dG/dt = ∂G/∂t + (∂G/∂X)(dX/dt), or, in full functional notation,
dG(t,X(t))/dt = ∂G(t,X(t))/∂t + [∂G(t,X(t))/∂X]·X'(t),
which makes clearer that we first take the partial with respect to the first argument, then take the partial with respect to the second argument (which is itself a function so we use the Chain Rule).
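The Chain Rule computation can be checked numerically. This Python sketch uses invented illustrative functions, G(t,x) = t·x and X(t) = t², and compares the chain-rule derivative with a direct finite-difference derivative of the composite function:

```python
def G(t, x):   # an illustrative function of two arguments
    return t * x

def X(t):      # the inner function: x is itself a function of time
    return t ** 2

def dG_chain(t):
    """dG/dt via the chain rule: ∂G/∂t + (∂G/∂x)·X'(t)."""
    dG_dt = X(t)   # ∂G/∂t of t*x is x, evaluated at x = X(t)
    dG_dx = t      # ∂G/∂x of t*x is t
    dX_dt = 2 * t  # X'(t)
    return dG_dt + dG_dx * dX_dt

def dG_direct(t, h=1e-6):
    """dG/dt by a central finite difference of the composite t -> G(t, X(t))."""
    return (G(t + h, X(t + h)) - G(t - h, X(t - h))) / (2 * h)

print(dG_chain(2.0), dG_direct(2.0))  # both are 3t^2 = 12 at t = 2
```

Here G(t, X(t)) = t³, so both routes give 3t²; the point is only that the two-argument bookkeeping and the direct derivative agree.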
So we've got some fearsomelooking math notation but we haven't done anything more sophisticated than using the Chain Rule.
Next we review some basic statistics that you learned.
Basic Stats
Recall the definition of the sample variance, that
s^{2} = [1/(n−1)] Σ_{i=1}^{n} (x_{i} − x̄)^{2},
which estimates the true population variance, which is
σ^{2} = E[(X − µ)^{2}].
And note that if the mean is zero, µ=0, then the variance is equal to the expected value of the squared random variable;
σ^{2} = E[X^{2}].
In statistics it is often convenient to use a normal distribution, the bell-shaped distribution that arises in many circumstances. It is useful because the (properly scaled) mean of independent random draws from many other statistical distributions will tend toward a normal distribution; this is the Central Limit Theorem.
Some basic facts and notation: a normal distribution with mean µ and standard deviation σ is denoted N(µ,σ). (The variance is the square of the standard deviation, σ^{2}.) The Standard Normal distribution is when µ=0 and σ=1; its probability density function (pdf) is denoted pdf_{N}(x); the cumulative distribution function (CDF) is cdf_{N}(x) or sometimes Nor(x).
[Figure: graphs of the standard normal pdf (the height at any point) and of the CDF.]
One of the basic properties of the normal distribution is that, if X is distributed normally with mean µ and standard deviation σ, then Y = A + bX is also distributed normally, with mean (A + bµ) and standard deviation bσ. We will use this particularly when we "standardize" a sample: by subtracting its mean and dividing by its standard deviation, the result should be distributed with mean zero and standard deviation 1.
Oppositely, if we are creating random variables with a given mean and standard deviation, we can take random numbers with a N(0,1) distribution, multiply by the desired standard deviation, and add the desired mean, to get normal random numbers with any mean or standard deviation. In Excel, you can create normally distributed random numbers by using the RAND() function to generate uniform random numbers on [0,1], then NORMSINV(RAND()) to convert them into standard normal draws. In Matlab, of course, we can use x = random('Normal', mu, sigma).
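The same inverse-CDF recipe can be sketched in Python with only the standard library; NormalDist.inv_cdf plays the role of Excel's NORMSINV, and the mean of 5.0 and standard deviation of 2.0 are arbitrary examples:

```python
import random
from statistics import NormalDist, mean, stdev

std_normal = NormalDist(0, 1)

def normal_draws(n, mu, sigma, seed=42):
    """Draw n N(mu, sigma) variates: uniform draw -> inverse CDF -> scale and shift."""
    rng = random.Random(seed)
    return [mu + sigma * std_normal.inv_cdf(rng.random()) for _ in range(n)]

draws = normal_draws(100_000, mu=5.0, sigma=2.0)
print(round(mean(draws), 2), round(stdev(draws), 2))  # close to 5.0 and 2.0
```

The scale-and-shift step is exactly the Y = A + bX property from above, run in reverse.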
Now for the Main Act
I'm going to do some (more) rather sloppy math here, in the interests of communicating the concept rather than all of the details. We'll go through the argument step by step.
In simple modeling of stock prices we had a return in two parts: a general drift and an error. We can represent the return on a security as
Δx = aΔt + bΔW,
where the term, aΔt, is the drift and bΔW is an error term. Focus on the error, ΔW.
For a first approximation we might use the normal distribution to model this error. We want it to have a zero mean or else there would be arbitrage opportunities. Then we want to think of how our uncertainty about a stock value changes with the time horizon; it seems reasonable that as a future date gets farther off, our uncertainty would increase. The range of variation that I expect a stock to be in, after one year, is much larger than the range I'd expect after just a day.
This is equivalent to assuming that ΔW is normally distributed with mean zero and variance equal to the distance between time increments, so
ΔW = ε√Δt, where ε ~ N(0,1).
The W notation is used since these are called Wiener processes: each path is continuous everywhere but nowhere differentiable. It is commonly used in many physical sciences.
Side Note: The basic property, that the distribution is normal whatever the time interval, is what makes the normal distribution (and related functions, called Lévy distributions) special. Most distributions would not have this property, so daily changes could have different distributions than weekly, monthly, quarterly, yearly, or whatever!
We can keep doing this on and on, finding that the variance at a point halfway there is half of the original variance, like Zeno's famous paradox of Achilles and the tortoise.
With some mathematical hocus-pocus we can prove that this converges to a Wiener process. A simulated path could look something like this:
[Figure: a simulated path of a Wiener process.]
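We can at least simulate the construction (a Python sketch; the horizon T = 1, the step counts, and the seeds are arbitrary choices): a random walk whose increments are N(0, √Δt) traces out an approximation to a Wiener path, and no matter how finely we chop the interval, the endpoint W(T) keeps the same N(0, √T) distribution.

```python
import random
from statistics import pvariance

def wiener_path(T, n_steps, seed=0):
    """Random-walk approximation of a Wiener path on [0, T]:
    increments are independent draws from N(0, sqrt(dt))."""
    rng = random.Random(seed)
    dt = T / n_steps
    w, path = 0.0, [0.0]
    for _ in range(n_steps):
        w += rng.gauss(0.0, dt ** 0.5)
        path.append(w)
    return path

# Halving dt halves the variance of each increment (Zeno again), but the
# endpoint's variance stays equal to T at every level of refinement.
endpoints = [wiener_path(1.0, 50, seed=s)[-1] for s in range(2000)]
print(pvariance(endpoints))  # close to T = 1
```

Plotting one of these paths for a large n_steps gives the familiar jagged Brownian-motion picture.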
Recall from calculus the idea that some functions are not differentiable in places: they take a turn that is so sharp that, if we were to approximate the slope of the function coming at it from right or left, we would get very different answers. The function f(x) = |x| is an example: at zero the left-hand derivative is −1; the right-hand derivative is +1. It is not differentiable at zero; it turns so sharply that it cannot be well approximated by local values. But it is continuous: a function can be continuous even if it is not differentiable.
Now suppose I had a function that was everywhere continuous but nowhere differentiable: at every point it turns so sharply as to be unpredictable given past values. Various such functions have been derived by mathematicians, who call such a function a Wiener process (it generates Brownian motion). (When Einstein visited CCNY in 1905 he discussed his paper using Brownian motion to explain the movements of tiny particles in water, which are randomly bumped around by water molecules.) This function has many interesting properties, including an important link with the Normal distribution. The Normal distribution gives just the right degree of variation to allow continuity; other distributions would not be continuous or would have infinite variance.
Note also that a Wiener process has a geometric form that is independent of scale or orientation: a Wiener process showing each day in the year cannot be distinguished from a Wiener process showing each minute in another time frame. As we noted above, price changes for any time interval are normal, whether the interval is minutely, daily, yearly, or whatever, because normal variables added together are still normal. These are fractals, curious beasts described by mathematicians such as Mandelbrot. (You can read Mandelbrot's 1963 paper in the Journal of Business, which you can download from JStor; he argues that Wiener processes are unrealistic for modeling financial returns and proposes further generalizations.)
These Wiener processes have some odd properties. Return to the notion of approximating a smooth process by a Taylor expansion,
ΔG = G'(x)Δx + G''(x)Δx^{2}/2! + ...,
or,
ΔG ≈ G'(x)Δx,
where we can approximate the change in G as its first derivative times the size of the step, Δx. And the existence of the derivative means that, in a sense, the second-order and higher terms are very small: small enough that the first derivative is the limit as the size of the step goes to zero.
As we noted, Wiener processes are not differentiable. This is largely because of the second-order term: the random variation is "too big". But we can try to figure out a way to take account of the second-order term.
We argued intuitively that the variance of the Wiener process is Var(ΔW) = Δt, that the variation is proportional to the time step. In our recollection of basic statistics we said that the variance of any mean-zero random variable is E[X^{2}]; this implies that E[ΔW^{2}] = Δt. Looking at the equation above, showing the return as Δx = aΔt + bΔW, and being a bit more careful, we can start from ΔW = ε√Δt and then just idly find ΔW^{2} = ε^{2}Δt. Again the variance of ΔW is of order Δt: the variance of the Wiener process is given as the size of the time-step. So
Δx^{2} = (aΔt + bΔW)^{2} = a^{2}Δt^{2} + 2abΔtΔW + b^{2}ΔW^{2}.
The a^{2}Δt^{2} term drops out as Δt gets small. The middle term does, however, also drop out, since Δt and ΔW are each very small, and multiplying two very small things (which are uncorrelated) by each other will give an even smaller result. The ΔW^{2} term doesn't drop the same way, because ΔW is correlated with itself (its variance): E[ΔW^{2}] = Δt, which is of the same order as Δt itself. So Δx^{2} ≈ b^{2}Δt.
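These orders of magnitude can be checked by simulation (a Python sketch; Δt = 0.01 and the sample size are arbitrary choices): the average of ΔW² sits right at Δt, while the cross term Δt·ΔW and the Δt² term are far smaller.

```python
import random
from statistics import mean

rng = random.Random(7)
dt = 0.01
dW = [rng.gauss(0.0, dt ** 0.5) for _ in range(200_000)]

mean_dW2 = mean(w * w for w in dW)     # E[dW^2] = dt: same order as dt itself
mean_dt_dW = mean(dt * w for w in dW)  # E[dt*dW] = 0, fluctuations of order dt^1.5
print(mean_dW2, mean_dt_dW, dt ** 2)   # dt^2 = 0.0001 is an order smaller than dt
```

Only the ΔW² term survives at order Δt, which is why it refuses to disappear from the Taylor expansion.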
So returning to the Taylor expansion,
ΔG = G'(x)Δx + G''(x)Δx^{2}/2,
or, switching from ordinary- to partial-derivative notation,
ΔG = (∂G/∂x)Δx + (1/2)(∂^{2}G/∂x^{2})Δx^{2}.
Now substitute our basic definition of Δx = aΔt + bΔW, as well as our deduction above that Δx^{2} ≈ b^{2}Δt, so
ΔG = (∂G/∂x)(aΔt + bΔW) + (1/2)(∂^{2}G/∂x^{2})b^{2}Δt.
This is a genuinely weird result. Stop a moment and consider it: the variance, b^{2}, of the underlying process suddenly emerges to have a direct influence on the level of G (since it multiplies Δt).
This is the beginning of the larger result, known as Itô's Lemma.
We might wish to analyze a more general function, say G(t, x(t)). This is a good representation of the payoff to a derivative, since it depends on t (time to expiration) as well as the value of the underlying security, x(t). We'll move to a continuous-time representation of x and write dx = a dt + b dW (with "d" instead of Δ).
In this case the first-order partial derivatives are ∂G/∂t and ∂G/∂x. We can omit the second-order terms from the first argument since these are of order dt^{2}. But the second-order term from the second argument is (1/2)(∂^{2}G/∂x^{2})dx^{2} where, as we previously found, dx^{2} ≈ b^{2}dt, so the second-order term is (1/2)(∂^{2}G/∂x^{2})b^{2}dt. So the total derivative of G, dG, is
dG = (∂G/∂t)dt + (∂G/∂x)dx + (1/2)(∂^{2}G/∂x^{2})b^{2}dt
and then we substitute in the dx = a dt + b dW term to get
dG = [(∂G/∂x)a + ∂G/∂t + (1/2)(∂^{2}G/∂x^{2})b^{2}]dt + (∂G/∂x)b dW.
And that ugly bit of mathematics is Itô's Lemma. It shows a general formula for how the first-order effect (the expected value) is affected by the second-order terms: not just the variance but also the second "derivative" of the function of the Wiener process.
To take the most ordinary instance, take G(x) = e^{x}, so the x are (log) price changes and G(x) gives the value of the stock. Then ∂G/∂x = e^{x} and ∂^{2}G/∂x^{2} = e^{x}; of course ∂G/∂t = 0. Substitute these into Itô's Lemma to find
dG = [a e^{x} + (1/2)b^{2}e^{x}]dt + b e^{x} dW.
Since this beast that we're calling G(x) = e^{x} is the stock price, S, we can rewrite the equation as
dS/S = [a + (1/2)b^{2}]dt + b dW,
which shows that the mean of the stock price return is determined in part by the variance.
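We can verify this weird result by Monte Carlo (a Python sketch; the drift a = 0.05, volatility b = 0.2, and horizon T = 1 are arbitrary, and S = e^x is the illustrative choice from above): the average growth rate of S comes out near a + b²/2, not a.

```python
import math
import random
from statistics import mean

a, b, T = 0.05, 0.2, 1.0
rng = random.Random(3)

def terminal_S(n_steps=50):
    """Simulate x over [0, T] with dx = a*dt + b*dW, x(0) = 0; return S = e^x."""
    dt = T / n_steps
    x = 0.0
    for _ in range(n_steps):
        x += a * dt + b * rng.gauss(0.0, dt ** 0.5)
    return math.exp(x)

samples = [terminal_S() for _ in range(20_000)]
growth = math.log(mean(samples)) / T  # continuously compounded mean growth of S
print(growth, a + b * b / 2)          # growth is near 0.07 = a + b^2/2, above a = 0.05
```

Averaging e^x over the random paths pushes the mean up: the convexity of the exponential turns the variance b² into extra drift, exactly as the lemma says.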
One of the other basic properties of Wiener processes is that they have a Markov property (or representation). This means that their history is irrelevant: the current value gives the best forecast. This coincides with our model of weak-form stock market efficiency, the theory that current prices embody all of the information contained in past prices. If there were useful information to be derived from the stock's history, then someone else would have likely already found it and used it.
Formally, z is a Wiener process if:
§ the change over a short interval of time Δt is Δz = ε√Δt, where ε ~ N(0,1); and
§ the values of Δz for any two non-overlapping intervals of time are independent.
So if we define N = T/Δt, then z_{T} = z_{0} + Σ_{i=1}^{N} ε_{i}√Δt.
A generalized Wiener process adds a drift rate and a variance rate, so now x is a generalized Wiener process, dependent on z, a standard Wiener process, if dx = a dt + b dz, where a and b are constants.
An Itô process is a further generalization, where now a and b are given functions of x and t, so dx = a(x,t) dt + b(x,t) dz; for short discrete changes we assume that a and b are constant over some range so that approximately Δx = a(x,t)Δt + b(x,t)ε√Δt, where you will note that we've inserted the identity that Δz = ε√Δt.
We often model a stock's percentage return as a generalized Wiener or Itô process, dS = µS dt + σS dz, or dS/S = µ dt + σ dz; in discrete time this is ΔS/S = µΔt + σε√Δt.
Itô's Lemma
If we have a function, G(x,t), where x is an Itô process, dx = a(x,t) dt + b(x,t) dz, then Itô's Lemma just tells us that finding dG is more complicated:
dG = [(∂G/∂x)a + ∂G/∂t + (1/2)(∂^{2}G/∂x^{2})b^{2}]dt + (∂G/∂x)b dz.
This requires a good bit of calculus to get your arms around. If you recall the math, good. If not, just remember the formula and keep this around so that after you take some more math classes you can come back to Mr. Itô and his Lemma.
Recall from calculus how to find a Taylor expansion, for some function G(x), where x is near x_{0} so we want to find ΔG = G(x) − G(x_{0}):
ΔG = G'(x_{0})Δx + G''(x_{0})Δx^{2}/2! + G'''(x_{0})Δx^{3}/3! + ...;
that G is differentiable means that the higher-order terms (Δx^{2} and up) are tiny so we have a good approximation.
Next, if G were a function of both x and y, G(x,y), then the total derivative is:
ΔG = (∂G/∂x)Δx + (∂G/∂y)Δy + (1/2)(∂^{2}G/∂x^{2})Δx^{2} + (1/2)(∂^{2}G/∂y^{2})Δy^{2} + (∂^{2}G/∂x∂y)ΔxΔy + ...
Normally, a function that is differentiable has the higher-order powers (Δx^{2} and Δy^{2} and ΔxΔy) again die off at a high rate, so we would be left with just the terms involving Δx and Δy.
Now if this function, G(x,t), has an x that is an Itô process, dx = a(x,t) dt + b(x,t) dz, or, if we drop some of the extra notation, dx = a dt + b dz; in discrete time this is Δx = aΔt + bε√Δt.
Itô's Lemma tells us that finding dG is more complicated, that
dG = [(∂G/∂x)a + ∂G/∂t + (1/2)(∂^{2}G/∂x^{2})b^{2}]dt + (∂G/∂x)b dz.
It might be easier to remember if we break it into the relevant parts: if we did not have a stochastic component, then if we differentiated G(x(t),t) we'd have dG = (∂G/∂x)dx + (∂G/∂t)dt; then Itô's Lemma just adds in a term, (1/2)(∂^{2}G/∂x^{2})b^{2}dt.
How can we show this?
Normally we could drop the second-order terms because we're letting the Δx and Δt terms get infinitesimally small, so that (Δx)^{2} and (Δt)^{2} get even tinier. But in this case Δx = aΔt + bε√Δt, so
(Δx)^{2} = a^{2}Δt^{2} + 2abεΔt^{3/2} + b^{2}ε^{2}Δt.
How is the last term different from the first two? It does not have Δt to any higher power (squared or to the 1.5 power). So it doesn't drop out; we have to incorporate it into the total derivative.
From here we just simplify. If ε ~ N(0,1) then its variance is one, so the expected value of ε^{2} is 1 and the last term becomes b^{2}Δt. Also we substitute in Δx = aΔt + bΔz, so
ΔG = (∂G/∂x)(aΔt + bΔz) + (∂G/∂t)Δt + (1/2)(∂^{2}G/∂x^{2})b^{2}Δt, or, rearranging,
ΔG = [(∂G/∂x)a + ∂G/∂t + (1/2)(∂^{2}G/∂x^{2})b^{2}]Δt + (∂G/∂x)bΔz.
That is the result that you were promised at the beginning of this section. Congratulations if you made it this far!
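As a final check, we can run the lemma on the standard textbook application (not worked above, but it follows the same steps; the parameters µ = 0.10, σ = 0.30, T = 1 are arbitrary): with dS/S = µ dt + σ dz and G(S) = ln S, we have ∂G/∂S = 1/S and ∂^{2}G/∂S^{2} = −1/S^{2}, so Itô's Lemma gives d ln S = (µ − σ²/2)dt + σ dz. Simulated values of ln S_T should therefore average (µ − σ²/2)T rather than µT, which this Python sketch confirms:

```python
import math
import random
from statistics import mean

mu, sigma, T = 0.10, 0.30, 1.0
rng = random.Random(11)

def terminal_log_price(n_steps=100, s0=1.0):
    """Euler simulation of dS = mu*S*dt + sigma*S*dz; returns ln(S_T)."""
    dt = T / n_steps
    s = s0
    for _ in range(n_steps):
        s += mu * s * dt + sigma * s * rng.gauss(0.0, dt ** 0.5)
    return math.log(s)

logs = [terminal_log_price() for _ in range(20_000)]
# Itô's Lemma: E[ln S_T] = (mu - sigma^2/2)*T, noticeably below mu*T = 0.10.
print(mean(logs), (mu - sigma * sigma / 2) * T)
```

The gap between the mean log price and µT is the −σ²/2 correction: the same variance term that has run through this whole section.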