Caltech Astrostats: 2D posterior plot is easy ---CA3

Debugging is annoying and we frequently run into a lot of problems when plotting 2D posterior contour. Believe me, you are not alone. There is a simple code I used for CA3 problem4. It is not optimized but it shows some standard steps to do the 2D posterior plots which I find useful.

# step1: read your data and do some basic check (visualization is important)
# step2: prepare a 2D -parameter mesh grid, which can be done by meshgrid function in numpy
# step3: Given the 2D parameter grid, we can calculate the probability of each datum at the every grid coordinate, i.e. , a 2D probability for each data point. It is usually convenient to do it in log scale.
# step4: sum the log probability of all data points, then we have the joint 2D log probability for
the whole data set.
# step5: find the maximum joint log probability and subtract it. We do this to avoid numerical outflow problem since unnormalized joint probability is usually a very small value.
# step6: exp(log prob) and plot a contour

The example of CA3_problem4

from numpy import *

import math

import matplotlib

import matplotlib.pyplot as plt

import asciitable

import scipy

# read the data

data      = asciitable.read('fv05_vf05_data.txt')

detection = data['comp'] #detection or not 1/0

feh       = data['metallicity'] #metallicity

# check data

n         = detection.size    #the total number of data point

print feh.min(),feh.max()     # metallicity range in the data

#choose the proper range of alpha, f>0 --> alpha>0

alpha_grid= arange(0.01,0.1,0.001)

beta_grid = arange(.0,3.0,0.01)

# generate a 2D meshgrid for alpha and beta

(alpha,beta) = meshgrid(alpha_grid,beta_grid)

logpdf = zeros((beta_grid.size,alpha_grid.size))

# loop through each data point

for i in range(n):

    f = alpha*10**(beta*feh[i])

    f[(f >1)] = None 

    #set f to none if f>1, here we don't need to check f>0 

    # because we already set alpha>0

    # sum the log(prob) of the all data

    logpdf += log(detection[i]*f+(1-detection[i])*(1.-f))

# set the log(prob) at forbidden (alpha,beta) value to -999    

whereAreNaNs = isnan(logpdf)

logpdf[whereAreNaNs] = -999

# find the max logpdf

print logpdf.max()

# find the peak location

ind = where(logpdf == logpdf.max())

print 'the peak is at alpha=%.4f,beta=%.4f' %(alpha_grid[ind[1]],beta_grid[ind[0]])  

logpdf = logpdf-logpdf.max()

# pdf is the normal probability function

pdf = exp(logpdf)

# 2D plot

plt.figure()
CS = plt.contour(alpha,beta,pdf)

Caltech Astrostats

Wednesday, May 16, 2012

2D posterior plot is easy ---CA3_problem4

No comments:

Post a Comment