Thank you so much for attending the compu-salon, episode 4. Here's a summary of our discussion.
The next salon will be this Friday, Feb 17, at 4pm. We'll talk about making beautiful and informative plots, with a special focus on the teachings of the statistical-graphics guru, Edward Tufte. As I mentioned last Friday, I would like all of you to bring (and send me in advance) one of your own plots that you're especially proud of, or that you feel needs work. We'll discuss what's great about them, and how they could be improved.
I'll send out a reminder on Friday morning. Until then!
Also at www.vallis.org/salon.
In our fourth meeting, we talked about:
The principles and (broadly) the mathematics of the Ising model and Markov Chain Monte Carlo (MCMC) sampling. We won't repeat them here, but among many other places you can look in MacKay's book, Sethna's book, and, for a historical perspective on why the Metropolis algorithm should really be the Rosenbluth algorithm, Gubernatis.
The history and principles of object-oriented programming (OOP). Wikipedia has a decent article. On Friday, we used our MCMC code to exemplify four widely recognized (if partly overlapping) principles:
We used encapsulation/abstraction when we defined a class
state to represent a single state of a simple Markov chain, which we built to produce samples distributed as exp(-x^4):
class state(object):
    def __init__(self,x):
        self.x = x

    def logp(self):
        return -0.594875 - self.x**4
Remember the all-important distinction between the class (which defines the behavior of the object) and its instances (of which there can be many, and which actually hold the data). The Python code above defines the constructor
__init__, which is called to create every instance, and which (here) stores the argument
x in the instance variable
self.x; it also defines the method
logp, which uses the value stored in the instance variable
self.x to compute the log-probability. Here self is a special variable that refers to the class instance; in Python,
self is always the first argument of all class methods. Thus, in the console:
>>> s1 = state(0)
>>> s1.x
0
>>> s1.logp()
-0.59487500000000004
>>> s2 = state(1)
>>> s2.x
1
>>> s2.logp()
-1.594875
Next, we again used encapsulation/abstraction when we defined a class
chain to represent a Markov chain. The job, as it were, of a chain is to hold the current state (in
self.current), to step to a new state using the Metropolis rule (with the
step() method), and to repeat the stepping
n times (with the run(n) method):
class chain(object):
    def __init__(self,init,propose):
        self.current = init
        self.propose = propose
        self.states = [init]

    def step(self):
        new = self.propose(self.current)
        logpn, logpc = new.logp(), self.current.logp()

        if (logpn > logpc) or (math.log(random.random()) < logpn - logpc):
            self.current = new

        self.states.append(self.current)

    def run(self,n):
        for i in xrange(n):
            self.step()
step() calls a function
self.propose that we must provide to the
chain constructor. In Python, functions are first-class objects, so they can be stored in a variable, to call later. In our salon, we came up with two different proposals for the stepper:
def randomwalk(s):
    dx = random.random() - 0.5
    return state(s.x + dx)

def curtwalk(s):
    return state(10*random.random() - 5)
We're then ready to use our Markov chain, and perform a first integration:
>>> c = chain(state(0),randomwalk)
>>> c.run(10000)
>>> N.mean([s.x**2 for s in c.states])
0.33987152744696092
Note how we used a list comprehension to extract the instance variable
x from every saved state. (We also used
numpy—by the way, did you start your session with the standard incantation "
import numpy as N, matplotlib.pyplot as P"?)
Next, we moved on to the Ising model, and created a class
ising to hold the state (this time an N × N matrix of 1s and -1s), and to compute the energy, magnetization, and of course the probability.
class ising(object):
    def __init__(self,init):
        self.x = init

    def energy(self):
        return -(N.sum(self.x[:-1,:] * self.x[1:,:]) +
                 N.sum(self.x[:,:-1] * self.x[:,1:]) +
                 N.sum(self.x[0,:] * self.x[-1,:]) +
                 N.sum(self.x[:,0] * self.x[:,-1]))

    def magnetization(self):
        return abs(N.sum(self.x))

    def logp(self,T):
        return -self.energy()/T
(For homework, make sure you understand why the matrix expression in
energy() implements the Ising energy for periodic boundary conditions.) Again we need a proposal for the Metropolis stepper. The simplest thing we can do is to flip a single spin at random:
def spinflip(s):
    newx = s.x.copy()
    i,j = random.randrange(0,len(newx)), random.randrange(0,len(newx))
    newx[i,j] *= -1
    return ising(newx)
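As a check on the homework claim about energy(), here is a quick sketch of my own (not salon code): it compares the sliced expression against a brute-force double loop in which every site interacts once with its right and bottom neighbors, wrapping around the borders.

```python
import numpy as N

def energy_sliced(x):
    # the salon expression: interior bonds via shifted slices, border bonds via wrap-around
    return -(N.sum(x[:-1,:] * x[1:,:]) + N.sum(x[:,:-1] * x[:,1:]) +
             N.sum(x[0,:] * x[-1,:]) + N.sum(x[:,0] * x[:,-1]))

def energy_loops(x):
    # brute force: each site pairs with its right and bottom neighbor (mod n),
    # so every periodic bond is counted exactly once
    n = len(x)
    e = 0
    for i in range(n):
        for j in range(n):
            e -= x[i,j] * x[(i+1) % n, j]
            e -= x[i,j] * x[i, (j+1) % n]
    return e

x = N.where(N.random.random((6,6)) < 0.5, 1, -1)   # a random 6x6 spin matrix
assert energy_sliced(x) == energy_loops(x)
```

The two agree on any configuration; on the all-ones matrix both give -2 n^2, since every one of the 2 n^2 bonds contributes -1.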
We're almost ready to go, but we still need to modify the class
chain slightly to take a temperature as a constructor argument, and to use it when calling
logp(). [We can retrofit our p ∝ exp(-x^4)
state to work with the modified
chain by taking a dummy argument
T (set to
None by default) to
logp().] OK, we can start an Ising chain with a matrix of ones (all spins up):
>>> c = chain(init=ising(N.ones((20,20))),propose=spinflip,T=2)
>>> c.run(100)
>>> c.states[-1].x   # see the final state
array([[ 1.,  1.,  1.,  1., -1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,
         1.,  1.,  1.,  1.,  1., -1.,  1.],
       [...and so on...]
In constructing the chain, we could have used positional arguments instead of keyword arguments as above, but the latter are nice for clarity.
Now, notice how the class
chain works just as well with
ising as it did with
state! Indeed, all that a generic state-like class needs to do in order to work with
chain is to define the method
logp(). The fact that
chain ends up calling different code for
logp() depending on the type of
self.current is an example of dynamic dispatch.
It is also an example of subtype polymorphism—that is, multiple state-like objects exist, which can define different methods
logp(). This, however, would be more evident in a language like C++, where we would have to explicitly define an abstract class (say,
abstractstate) with a virtual method
logp(), and have both state and
ising inherit from it. The C++
chain class would then hold a pointer to
abstractstate, and calling
logp() on it would select the correct implementation (say,
ising's), from the
abstractstate virtual method table. If you're not familiar with C++ and didn't follow any of that, don't worry, you don't need to. That's why you're learning Python.
Instead, Python uses a much less formal name-based polymorphism, where the interface of a class (i.e., the variables and methods that it makes available to the world) is defined implicitly by how the class can be used. This is also known as duck typing: if it walks, swims, and quacks like a duck, it's a duck. (Or a pa-pap, as my son would say.)
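To see duck typing at work, consider a hypothetical gaussian class (my illustration, not part of our salon code): it knows nothing about state or ising, and inherits from neither, but since it offers logp(), the Metropolis rule is perfectly happy with it.

```python
import math, random

class gaussian(object):
    """A hypothetical state-like class: a unit Gaussian in x."""
    def __init__(self,x):
        self.x = x

    def logp(self,T=None):
        return -0.5*self.x**2 - 0.5*math.log(2*math.pi)

# the acceptance test only ever calls logp(): no inheritance,
# no declared interface, no isinstance() checks needed
def metropolis_accept(new,old):
    logpn, logpc = new.logp(), old.logp()
    return (logpn > logpc) or (math.log(random.random()) < logpn - logpc)

print(metropolis_accept(gaussian(0.0),gaussian(2.0)))   # True: moving uphill is always accepted
```

Any object walking and quacking like a state can be stepped by chain; that is the whole contract.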
Since it's not practical to store the entire Ising configuration of all the states in a chain, we will again modify
chain slightly, to store only the energy and magnetization. There are many ways we could do this—to be least disruptive to our existing code, we'll give
ising a method
obs() that returns a tuple of energy and magnetization;
chain will then try to call
obs(), and default to saving the state (using a Python exception) if
obs() is not available. See the full code below for details. Thus:
>>> c = chain(init=ising(N.ones((20,20))),propose=spinflip,T=2)
>>> c.run(10000)
>>> P.plot(N.array(c.states)[:,0])   # plot the energy
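In isolation, the try-obs()-then-fall-back logic is Python's "easier to ask forgiveness than permission" idiom; here's a minimal sketch (the two toy classes are illustrations, not part of the salon code):

```python
def record(states,current):
    try:
        states.append(current.obs())    # save compact observables if the state offers obs()...
    except AttributeError:
        states.append(current)          # ...otherwise fall back to saving the whole state

class compact(object):
    def obs(self):
        return (-800.0,400.0)           # e.g., (energy, magnetization)

class plain(object):
    pass

states = []
record(states,compact())
record(states,plain())
print(states[0])    # (-800.0, 400.0) -- just the observables
print(states[1])    # the raw plain instance
```

Asking forgiveness (catching AttributeError) rather than permission (checking hasattr up front) keeps the common path fast and the code uncluttered.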
This Markov chain mixes (thermalizes) very slowly, because the spin-flip proposals are so gradual; indeed, it's almost unusable for simulations. We can do much better with the Wolff update scheme, which selects a probabilistic cluster of spins with the same orientation, and flips all the spins in the cluster at once. See Sethna (2006) for an explanation of this elegant scheme, and see the full code below for a Python implementation.
The Wolff update does not even require the Metropolis stepper, since the update itself satisfies the MCMC detailed-balance condition. Thus, we need to once again change the logic in
chain; we do so by using object-oriented inheritance to define a class
wolffchain that inherits all the
chain methods, but redefines step():
class wolffchain(chain):
    def step(self):
        self.current = wolff(self.current,self.T)
        self.states.append(self.current)
        # ...or the slightly more complicated code to save observables...
Let's give it a spin (we'll pass propose=None, since
wolffchain does not need it):
>>> c = wolffchain(init=ising(N.ones((20,20))),propose=None,T=2)
>>> c.run(10000)
>>> P.plot(N.array(c.states)[:,0])
This mixes much better, and indeed it can be used to plot, say, the expectation value of energy as a function of temperature. For convenience, we define
def runchain(T,n,size=20):
    c = wolffchain(ising(N.ones((size,size))),propose=None,T=T)
    c.run(n)
    return N.mean(N.array(c.states),axis=0) / size**2   # average E,m per spin
>>> Ts = N.linspace(1.5,3.5,30)
>>> Es = N.array([runchain(t,1000) for t in Ts])
>>> P.plot(Ts,Es)
Ta-da! We can see the beginnings of a phase transition, which is realized in the limit of lattice size going to infinity.
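For reference, in the infinite-lattice limit the transition happens at Onsager's exact critical temperature, which in our units (J = k_B = 1) falls right inside the temperature scan above; a quick check:

```python
import math

# Onsager's exact result for the 2D Ising critical temperature:
# Tc = 2 / log(1 + sqrt(2))
Tc = 2.0 / math.log(1.0 + math.sqrt(2.0))
print(Tc)   # ~2.269, inside our linspace(1.5,3.5,30) scan
```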
As usual, let me know if you have questions, or if I left some typos and misprogrammings here and there.
Curt asked, why OOP? Actually, what he asked was, if OOP is so good, why not use it all the time? Well, OOP is about organizing and reusing code (and ideas). If your application is one-shot, and simple enough that you can hold its logic in your brain without abstracting subconcepts into objects, then by all means write it as non-OOP procedural code. This threshold is somewhat language dependent: since OOP requires more boilerplate in C++ than in Python, and even more in Java, you may find that for the same problem you'd happily write a class implementation in Python, but not in other languages. In some languages OOP is just not supported (C, F77, Mathematica), or it's too, too painful (MATLAB).
I could not find a good specific reference for object-oriented programming in Python, but you could start from "Classes" in the official Python tutorial. The rest you'll pick up as you start writing code. Later in this salon we'll discuss some advanced language features (who knows, maybe even metaclass programming), which are explained very well in Pro Python.
import math, random
import numpy as N

# a simple MC state abstraction for a particular probability
class state(object):
    def __init__(self,x):
        self.x = x

    def logp(self,T=None):
        return -0.594875 - self.x**4

# a Markov chain with a Metropolis-rule stepper
class chain(object):
    def __init__(self,init,propose,T):
        self.current = init
        self.propose = propose
        self.T = T

        # we'll collect all states (or their observables) in a list
        try:
            self.states = [init.obs()]
        except AttributeError:
            self.states = [init]

    def step(self):
        new = self.propose(self.current)

        # we work with logarithms in case probabilities get very small
        logpn, logpc = new.logp(self.T), self.current.logp(self.T)

        # Metropolis rule: if p_new > p_old, always choose the new;
        # if p_new < p_old, choose the new with probability p_new/p_old;
        # otherwise let new = old
        if (logpn > logpc) or (math.log(random.random()) < logpn - logpc):
            self.current = new

        try:
            self.states.append(self.current.obs())
        except AttributeError:
            self.states.append(self.current)

    def run(self,n):
        for i in xrange(n):
            self.step()

# random-walk proposal
def randomwalk(s):
    dx = random.random() - 0.5    # pick a number in [-0.5,0.5]
    return state(s.x + dx)        # create a new state instance with the new x

# flat proposal
def curtwalk(s):
    return state(10*random.random() - 5)

# an MC state abstraction for the 2D Ising model
class ising(object):
    def __init__(self,init=None,size=20):
        self.x = init

    def energy(self):
        # note the slicing tricks to multiply to the left and bottom
        return -(N.sum(self.x[:-1,:] * self.x[1:,:]) +
                 N.sum(self.x[:,:-1] * self.x[:,1:]) +
                 # we also need wrap-around borders
                 N.sum(self.x[0,:] * self.x[-1,:]) +
                 N.sum(self.x[:,0] * self.x[:,-1]))

    def magnetization(self):
        return abs(N.sum(self.x))

    def logp(self,T):
        return -self.energy()/T

    def obs(self):
        return (self.energy(),self.magnetization())

# single-spin-flip proposal
def spinflip(s):
    newx = s.x.copy()   # copy the current matrix

    # select two indices at random, and flip the spin
    i,j = random.randrange(0,len(newx)), random.randrange(0,len(newx))
    newx[i,j] *= -1

    # call the ising constructor to create a new state from the matrix
    return ising(newx)

# return a list of the four nearest-neighbor positions to pos (as tuples)
# (needed by wolff)
def neighbors(pos,size):
    return [((pos[0] + D[0]) % size,(pos[1] + D[1]) % size)
            for D in ((1,0),(-1,0),(0,1),(0,-1))]

def wolff(current,T):
    newx = current.x.copy()
    size = len(newx)

    # flipping probability needed for detailed balance in the Wolff scheme
    p = 1.0 - math.exp(-2.0/T)

    # pick a random position to begin, note the spin
    start = random.randrange(0,size),random.randrange(0,size)
    startspin = newx[start]

    toflip = [start]                # start a FIFO queue of locations to flip
    while toflip:                   # while there are spins in our queue
        if newx[toflip[0]] == startspin:
            newx[toflip[0]] *= -1   # flip the first one if it's still the same color

            # add neighbors of the original color with probability p
            for pos in neighbors(toflip[0],size):
                if newx[pos] == startspin and random.random() < p:
                    toflip.append(pos)
        # else: the spin was flipped already

        del toflip[0]               # then remove it from the queue

    return ising(newx)

class wolffchain(chain):
    def step(self):
        self.current = wolff(self.current,self.T)

        try:
            self.states.append(self.current.obs())
        except AttributeError:
            self.states.append(self.current)

def runchain(T,n,size=20):
    c = wolffchain(ising(N.ones((size,size))),propose=None,T=T)

    # run n steps of the chain
    c.run(n)

    # return the average energy and magnetization per spin
    return N.mean(N.array(c.states),axis=0) / size**2