These RV measurements are in the same format as last week's HD102964 data set (t, v, sigma_v).

10365.979236 -5.4720 0.9872

10715.080012 -7.4086 1.0981

10786.759306 0.4987 1.2969

10806.886343 -5.0791 0.9564

11010.115556 3.5447 1.0068

11012.085440 4.8886 0.9584

11013.103947 3.1676 1.1267

11014.105914 4.3955 1.0783

11070.992361 1.8550 1.3133

11170.793657 4.4971 1.1046

11374.123206 0.0000 1.0404

11412.084051 4.0750 1.2597

11585.736539 16.8490 1.0630

11757.136620 -4.7007 1.0475

11883.801736 2.1463 1.3573

12098.097419 -10.4014 1.1261

12134.022928 -15.5328 1.2441

12162.926852 -5.7099 1.2710

12188.010926 -5.7224 1.1966

12218.947315 -12.4545 1.2057

12235.761655 -5.3930 1.2138

12489.047558 -12.2946 1.0018

12516.010463 -15.1926 1.2005

12535.910926 -2.3131 1.3004

12572.946597 -5.6120 1.3796

12600.927569 -8.5339 1.3608

12651.762963 -6.4725 1.0555

12833.093715 -0.4214 1.1765

12834.127720 5.3296 1.2034

12835.121528 0.2185 1.3992

12836.075405 0.9086 1.1630

12850.087384 -2.7960 1.0992

12851.131667 -1.2862 1.1301

12854.031620 0.5737 1.2861

12855.114792 -0.6867 1.3715

12856.061181 -1.0921 1.3094

12897.920671 0.0466 1.2069

12899.035324 -5.1169 1.2442

12924.926863 5.1777 1.2645

12926.058322 1.1283 1.2763

12987.739282 12.2442 1.3284

12988.761852 7.2284 1.2760

12989.781250 5.5562 1.3132

13014.741790 0.9943 0.8870

13015.745883 2.7821 0.7728

13016.749759 6.5555 0.7824

13017.765800 0.5844 0.7813

13072.716047 9.2352 1.0131

13180.129491 8.6986 1.1281

13181.124873 13.2785 1.3097

13182.120243 7.6246 1.1346

13190.126876 9.3208 0.6338

13191.126546 9.5378 0.7480

13196.114144 -2.8805 1.2701

13197.125305 -5.7718 0.8569

13207.117720 4.3347 1.3446

13208.123322 8.6826 1.1997

13238.075000 0.4488 1.1177

13239.137743 4.7305 1.0623

13239.983356 3.7248 1.0509

13241.043070 6.8523 0.7034

13301.800231 5.8442 1.1004

13302.870058 11.5434 1.0594

13303.894282 1.6920 0.8718

13338.766794 10.3218 0.9728

13339.911921 5.3337 1.0051

13368.839549 1.9201 0.9236

13369.760250 2.9115 0.6046

13397.727720 -0.5044 0.9282

13398.748762 7.9218 0.8294

13399.719965 7.9981 0.8511

13400.722338 4.0888 0.9303

13425.749826 1.5571 0.9956

13426.711343 1.1308 0.9198

13427.763414 1.7341 0.9986

13551.131655 0.1543 0.9205

13552.133299 -1.7902 0.9850

13696.900440 -5.7341 1.0210

13723.843958 -4.9887 1.0141

13746.743835 1.4755 0.5480

13747.803715 0.9159 0.9375

13748.728171 -2.2966 0.9150

13749.734456 3.1935 0.8821

13750.716424 1.9810 0.9588

13751.696863 2.5223 0.9091

13752.695995 2.8298 1.0102

13753.694321 2.7341 0.6803

13775.730961 -8.2307 1.0315

13776.722211 -3.9923 0.9291

13926.130456 -9.5344 0.5636

13927.109433 -11.1021 1.0593

13928.116435 -8.9531 0.9818

13933.092130 -5.0851 0.9839

13962.120150 -12.5511 0.9665

13981.981065 -8.1267 1.0270

13983.031655 -4.7627 1.1662

13984.031319 -13.4317 1.0740

13984.981215 -7.9433 0.9975

14023.985822 -8.4543 1.0863

14083.846400 -8.5211 1.1254

14084.801794 -8.2853 1.1667

14085.926840 -10.7708 1.1112

14129.728727 -9.8557 1.1174

14130.808565 -6.8924 1.0496

14131.735139 -9.5523 1.1019

14138.726123 -19.2455 1.0797

14307.135394 -19.3035 1.1076

14314.106944 -10.6002 1.1014

14337.109931 -18.1683 1.1961

14343.989815 -14.3236 1.1816

14396.843993 -8.1080 1.1140

14428.938345 -9.9274 0.6591

14639.129965 -10.0765 1.1747

14675.048241 -16.5316 1.1695

14689.013530 -9.5340 1.2468

14717.956979 -18.2615 1.3374

14725.896910 -6.4570 1.3795

14727.012153 -11.5006 1.2117

14727.955961 -2.4502 1.1250

14838.800382 -0.6436 0.9883

15015.121124 -3.7773 1.0546

15049.114495 -2.2694 1.2116

15133.998578 -7.5107 1.1313

15171.896575 -0.4995 1.2235

15189.809291 -3.7323 1.2005

15196.784565 -1.1088 1.0145

15231.702634 1.9133 1.0408

15255.709429 -0.0267 1.1699

15260.711648 10.2724 1.2097

15406.115969 -2.8208 1.0969

15407.102724 -1.6299 1.0710

15414.080465 -0.5420 1.1696

15426.085724 6.4654 1.0386

15435.115410 3.9615 1.1658

15439.122233 -3.5856 1.1219

15465.062030 6.6551 1.2756

15487.037152 0.0895 1.1080

## Monday, November 22, 2010

## Wednesday, November 17, 2010

## Monday, November 15, 2010

### AMOEBA implementation

Here's my code. IDL users, feel free to use this as-is. MATLAB users, feel free to translate it. The point of this assignment is to get the Bayesian periodogram working, not to fiddle with AMOEBA. However, it's a handy tool to have in your belt, so I encourage you to make sure you understand this example of its use.

;===================SINEFIT.PRO======================

function sineorbit, x, per, par

return, par[0] * sin(2*!pi/per * x + par[1]) + par[2]

end

function sinefunc, par

common sinestruct, struct

ymod = sineorbit(struct.x, struct.per, par)

;;; MaxL is just -chi^2 / 2 without the constant. Just use chi as

;;; statistic to be minimized

chi = total((struct.y-ymod)^2/(2.*struct.err)^2)

return, chi

end

function sinefit, x, y, err, period

common sinestruct, struct

struct = {x:x, y:y, err:err, per:period} ;;; create "control" structure

p0 = [(max(y)-min(y))/2., 0., median(y)] ;;; Initital guesses based on data

;;; The scale is the fractional moves in each parameter that AMOEBA will

;;; make while oozing around in maxL (chi^2) space.

scale = [p0[0]*0.05, !pi*0.1, p0[0]*0.05]

;;; Call amoeba and stop after 5% = 0.05 decrease in chi^2 is reached

pbest = amoeba(0.05, function_name='sinefunc', p0=p0, scale=scale)

;;; Calculate best-fitting orbit

ymod = sineorbit(x, period, pbest)

chi = total((struct.y-ymod)^2/(2.*struct.err)^2)

maxl = - total(0.5*alog(2.*!dpi*err^2)) - chi/2.

return, maxl

end

;===================SINEFIT.PRO======================

function sineorbit, x, per, par

return, par[0] * sin(2*!pi/per * x + par[1]) + par[2]

end

function sinefunc, par

common sinestruct, struct

ymod = sineorbit(struct.x, struct.per, par)

;;; MaxL is just -chi^2 / 2 without the constant. Just use chi as

;;; statistic to be minimized

chi = total((struct.y-ymod)^2/(2.*struct.err)^2)

return, chi

end

function sinefit, x, y, err, period

common sinestruct, struct

struct = {x:x, y:y, err:err, per:period} ;;; create "control" structure

p0 = [(max(y)-min(y))/2., 0., median(y)] ;;; Initital guesses based on data

;;; The scale is the fractional moves in each parameter that AMOEBA will

;;; make while oozing around in maxL (chi^2) space.

scale = [p0[0]*0.05, !pi*0.1, p0[0]*0.05]

;;; Call amoeba and stop after 5% = 0.05 decrease in chi^2 is reached

pbest = amoeba(0.05, function_name='sinefunc', p0=p0, scale=scale)

;;; Calculate best-fitting orbit

ymod = sineorbit(x, period, pbest)

chi = total((struct.y-ymod)^2/(2.*struct.err)^2)

maxl = - total(0.5*alog(2.*!dpi*err^2)) - chi/2.

return, maxl

end

### We still have class!

We still have 3 good weeks of class, so let's keep learning even while you work on your end-of-term projects. We'll have in-class activities, but optional homework (only do as much as you need to learn).

## Sunday, November 14, 2010

### Class Activity 8: HD102956 data set

This week we will examine the radial velocity time series of the 1.8 solar-mass star HD102956. The measurements were made using the HIgh-Resolution Echelle Spectrometer, which is mounted on the Keck 1 telescope. The first column is the Julian Day number minus 2.44e6. The second column is the relative velocity of the star in meters per second. The third column is the measurement uncertainty, which does not account for errors in the velocity related to the star moving around (velocity "jitter"). The class activity worksheet (in class Monday) will describe how to analyze these data.

14216.805150 21.5134 1.1715

14429.150243 15.7103 1.4097

14866.114214 47.6038 1.2837

14986.851118 -48.4759 1.2909

15014.771611 69.0507 1.0646

15016.870291 -38.8699 1.1213

15284.916557 -69.2668 1.3374

15285.992750 -4.1502 1.2532

15313.914745 53.7661 1.2567

15314.828012 0.0000 1.1595

15343.795192 -56.4066 1.1648

15344.885379 14.0928 1.3477

15350.785608 -30.2992 1.2856

15351.880460 44.4481 1.2105

15372.803796 41.1830 1.1439

15373.796131 -32.2805 1.2237

15374.752750 -88.3723 1.0914

15376.783324 -18.5026 1.1120

15377.812002 41.1884 1.1311

15378.748741 60.3889 1.1266

15379.786957 10.0510 1.1149

15380.804516 -60.2914 1.1533

15400.780895 -67.3449 1.3536

14216.805150 21.5134 1.1715

14429.150243 15.7103 1.4097

14866.114214 47.6038 1.2837

14986.851118 -48.4759 1.2909

15014.771611 69.0507 1.0646

15016.870291 -38.8699 1.1213

15284.916557 -69.2668 1.3374

15285.992750 -4.1502 1.2532

15313.914745 53.7661 1.2567

15314.828012 0.0000 1.1595

15343.795192 -56.4066 1.1648

15344.885379 14.0928 1.3477

15350.785608 -30.2992 1.2856

15351.880460 44.4481 1.2105

15372.803796 41.1830 1.1439

15373.796131 -32.2805 1.2237

15374.752750 -88.3723 1.0914

15376.783324 -18.5026 1.1120

15377.812002 41.1884 1.1311

15378.748741 60.3889 1.1266

15379.786957 10.0510 1.1149

15380.804516 -60.2914 1.1533

15400.780895 -67.3449 1.3536

## Thursday, November 11, 2010

### No Final Exam, but...

Clear your calendars for Wednesday, December 8 for the annual Ay117 Statistics in Astronomy Conference. This year the conference is in sunny Pasadena, CA, at the world-famous Cahill Center for Astronomy & Astrophysics.

Conference registration is free, yet mandatory for Ay117 students and TAs.

We will have seven 10-minute oral presentations (+5 minutes for Q&A), and a poster session. Sign up now for a poster or oral presentation slot (posters are graded much more harshly than oral presentations). Abstract submission open through November 17.

The session chair is Professor John Johnson who will give a talk entitled "Bayesian Statistics: There's no magic." Contact him via email for abstract submission (plain text only).

Conference registration is free, yet mandatory for Ay117 students and TAs.

We will have seven 10-minute oral presentations (+5 minutes for Q&A), and a poster session. Sign up now for a poster or oral presentation slot (posters are graded much more harshly than oral presentations). Abstract submission open through November 17.

The session chair is Professor John Johnson who will give a talk entitled "Bayesian Statistics: There's no magic." Contact him via email for abstract submission (plain text only).

### Class Activity 7 finally posted

Nothing has changed since it was assigned in class. Here's the official class activity worksheet.

http://www.astro.caltech.edu/~johnjohn/astrostats/homework/CA7.pdf

http://www.astro.caltech.edu/~johnjohn/astrostats/homework/CA7.pdf

### More reading materials on nested sampling

http://www.inference.phy.cam.ac.uk/bayesys/box/nested.pdf'

Also, check out David MacKay's website:

http://www.inference.phy.cam.ac.uk/bayesys/

Also, check out David MacKay's website:

http://www.inference.phy.cam.ac.uk/bayesys/

## Sunday, November 7, 2010

### Good, but not required, reading

Andrew Gelman's discussion of Harold Jeffreys' Theory of Probability:

http://www.stat.columbia.edu/~gelman/research/published/jeffreys.pdf

http://www.stat.columbia.edu/~gelman/research/published/jeffreys.pdf

## Friday, November 5, 2010

### On Jeffreys priors: scale vs. location

I have to admit that even after reviewing many of your write-ups, I remained a bit uncertain about when I should and should not use a Jeffreys prior for a parameter (i.e. p(x) ~ 1/x). It is clear to me that the determination of the parameters describing e.g. a normal distribution involves a scale parameter sigma and a location parameter mu, and the uninformed prior for the scale parameter is p(sigma) ~ 1/sigma, and for the location parameter it's a uniform prior p(mu) ~ 1.

But what happens when mu itself is parametrized, such as mu(x; A,B) = Ax + B? Is A a scale parameter while B is a location parameter? Before class Wednesday, I would have answered yes. After class, I was seriously doubting myself.

Unfortunately, after several hours of paging through books and searching the web I couldn't find a definitive answer to this question. The difficulty is that all of the examples of Jeffreys priors I found use a normal distribution as an example, or some other simple distribution (Poisson or binomial). But I could find precisely nothing about the similarly simple and far more ubiquitous problem of an uninformative prior on the slope of a linear fit! What's up with that? So I decided to do what I should have done at first: query my intuition rather than statistics papers.

What is it that we're trying to do with a Jeffreys prior? We are simply trying to properly encapsulate the notion that we have no prior knowledge of a parameter. Let's work with the example of a time series of flux values with normal errors, for which we suspect the flux level is a linear function of time:

F(t; A, B) = A + Bt.

Is a uniform prior appropriate for the slope B? A uniform prior would say that the slope is equally likely to lie between the interval 1 and 2 as it would between 1,000,000 and 1,000,001. Hmmm, this doesn't feel right. The former increase in the slope is a 100% change, while the latter is an increase of only 1 part in a million. By making the statement p(B) ~ const, we'd be saying that a slope between 1,000,001 and 1,000,003 is twice as likely as a slope between 1 and 2! If we were to make this statement, then we'd clearly have some pretty strong prior information. We'd be far from uninformed.

So in the case of a slope, I'd argue that we have a clear-cut reason to rely on a Jeffreys prior of the form p(B) ~ 1/B, which is the same as saying p(lnB) ~ const., or every log interval is equally likely. This feels much better.

Keep in mind that when we talk about prior information, we are talking about knowledge we had about the problem at hand before we were handed the data. For the case of a flux level that is linear in time, what do we know about the offset A? Well, we certainly do not know its numerical value without peeking at the data. But we do know that the shape of the function A + Bt is going to look the same for any choice of the offset, A. The offset can be 1 or 1000 and it won't change the fact that we'll end end up with B more flux in each unit time interval.

Thus, a uniform prior is appropriate for the offset parameter A: the offset interval 1 to 2 is just as likely as the interval 1,000,000 to 1,000,001. The offset is a location parameter, and location parameters call for a uniform prior.

Another way to think about all this is to consider two choices of units. If I change the y-axis from ergs/s/cm^2 to horsepower per acre, the offset just translates from one numerical value to another. However, for a fixed offset A, the units are crucial for a slope value B. If I tell you that the slope is B=1, it matters a lot whether we are talking about deaths per year or deaths per second! As a result, before we have the data sitting on our hard drive we certainly don't want to pick a prior for the slope (rate) that expresses a preference in one choice of units over another.

Using this reasoning p(A) ~ const. and p(B) ~ 1/B feel about right to me for the simple case of a linear fit with normally-distributed errors, or for any other model parametrization for that matter (see Gregory 2005 for the case of a Keplerian orbit, in which the Doppler amplitude K is given a Jeffreys' prior while the phase information is uniform). I'm not much interested in the square root of the determinant of the Fischer information. My intuition tells me that location parameters get uniform priors, and scale parameters (in which the units on the x or y axis matter) get p(B) ~ 1/B priors.

But don't take my word for it, query your own intuition.

But what happens when mu itself is parametrized, such as mu(x; A,B) = Ax + B? Is A a scale parameter while B is a location parameter? Before class Wednesday, I would have answered yes. After class, I was seriously doubting myself.

Unfortunately, after several hours of paging through books and searching the web I couldn't find a definitive answer to this question. The difficulty is that all of the examples of Jeffreys priors I found use a normal distribution as an example, or some other simple distribution (Poisson or binomial). But I could find precisely nothing about the similarly simple and far more ubiquitous problem of an uninformative prior on the slope of a linear fit! What's up with that? So I decided to do what I should have done at first: query my intuition rather than statistics papers.

What is it that we're trying to do with a Jeffreys prior? We are simply trying to properly encapsulate the notion that we have no prior knowledge of a parameter. Let's work with the example of a time series of flux values with normal errors, for which we suspect the flux level is a linear function of time:

F(t; A, B) = A + Bt.

Is a uniform prior appropriate for the slope B? A uniform prior would say that the slope is equally likely to lie between the interval 1 and 2 as it would between 1,000,000 and 1,000,001. Hmmm, this doesn't feel right. The former increase in the slope is a 100% change, while the latter is an increase of only 1 part in a million. By making the statement p(B) ~ const, we'd be saying that a slope between 1,000,001 and 1,000,003 is twice as likely as a slope between 1 and 2! If we were to make this statement, then we'd clearly have some pretty strong prior information. We'd be far from uninformed.

So in the case of a slope, I'd argue that we have a clear-cut reason to rely on a Jeffreys prior of the form p(B) ~ 1/B, which is the same as saying p(lnB) ~ const., or every log interval is equally likely. This feels much better.

Keep in mind that when we talk about prior information, we are talking about knowledge we had about the problem at hand before we were handed the data. For the case of a flux level that is linear in time, what do we know about the offset A? Well, we certainly do not know its numerical value without peeking at the data. But we do know that the shape of the function A + Bt is going to look the same for any choice of the offset, A. The offset can be 1 or 1000 and it won't change the fact that we'll end end up with B more flux in each unit time interval.

Thus, a uniform prior is appropriate for the offset parameter A: the offset interval 1 to 2 is just as likely as the interval 1,000,000 to 1,000,001. The offset is a location parameter, and location parameters call for a uniform prior.

Another way to think about all this is to consider two choices of units. If I change the y-axis from ergs/s/cm^2 to horsepower per acre, the offset just translates from one numerical value to another. However, for a fixed offset A, the units are crucial for a slope value B. If I tell you that the slope is B=1, it matters a lot whether we are talking about deaths per year or deaths per second! As a result, before we have the data sitting on our hard drive we certainly don't want to pick a prior for the slope (rate) that expresses a preference in one choice of units over another.

Using this reasoning p(A) ~ const. and p(B) ~ 1/B feel about right to me for the simple case of a linear fit with normally-distributed errors, or for any other model parametrization for that matter (see Gregory 2005 for the case of a Keplerian orbit, in which the Doppler amplitude K is given a Jeffreys' prior while the phase information is uniform). I'm not much interested in the square root of the determinant of the Fischer information. My intuition tells me that location parameters get uniform priors, and scale parameters (in which the units on the x or y axis matter) get p(B) ~ 1/B priors.

But don't take my word for it, query your own intuition.

## Thursday, November 4, 2010

## Wednesday, November 3, 2010

### Tuning the jump distribution

I mentioned in class that the optimal acceptance rate for MCMC jump proposals is 30-40% based on what others had recommended to me when teaching me the Way of the Markov Chain. It turns out, however, that for higher-order problems the optimal rate is 0.234 (Corolary 1.2 of Roberts, Gelman and Gilks 1997). If you want to know exactly why this is so, you should read the full paper. It's full of lemma-ny goodness! According to the abstract:

Supposedly ~50% is optimal for problems with 1 or 2 parameters (Section 4.2 of Gregory 2005). But if you have only 1 or 2 parameters, I don't know why you'd want to do MCMC, unless there is a nonanalytic likelihood to evaluate through a simulation or something. Gregory cites Roberts et al. above, but I didn't see anything about 50% for simpler problems. If you decide to read Roberts et al., let me know if I missed the note about 50%.

That said, 30-40% has treated me well over the years. Let's say you should aim for 25-30% acceptance of your jump proposals for the full line profile fit (Problem 3 of CA6).

The limiting diffusion approximation admits a straightforward efﬁ-No one breaks it down for the layperson like a statistician!

ciency maximization problem, and the resulting asymptotically optimal

policy is related to the asymptotic acceptance rate of proposed moves for

the algorithm. The asymptotically optimal acceptance rate is 0.234 under

quite general conditions.

Supposedly ~50% is optimal for problems with 1 or 2 parameters (Section 4.2 of Gregory 2005). But if you have only 1 or 2 parameters, I don't know why you'd want to do MCMC, unless there is a nonanalytic likelihood to evaluate through a simulation or something. Gregory cites Roberts et al. above, but I didn't see anything about 50% for simpler problems. If you decide to read Roberts et al., let me know if I missed the note about 50%.

That said, 30-40% has treated me well over the years. Let's say you should aim for 25-30% acceptance of your jump proposals for the full line profile fit (Problem 3 of CA6).

## Tuesday, November 2, 2010

### CA6: MCMC results for linear response function

Here are the marginalized posterior pdfs for the linear coefficients describing response function in linear.txt:

It took me about a 1.5 hours to write the code to first do a grid search (red line, peak set to unity), and then write the MCMC algorithm (black histogram). I imagine the Gaussian line profile fit due Monday should take you about 3 hours, most of it debugging. Such is research.

Some tips for writing MCMC code:

It took me about a 1.5 hours to write the code to first do a grid search (red line, peak set to unity), and then write the MCMC algorithm (black histogram). I imagine the Gaussian line profile fit due Monday should take you about 3 hours, most of it debugging. Such is research.

Some tips for writing MCMC code:

- For the linear fit, do a grid analysis first. This will help you get a sense for the probability space. If you have a handy amoeba or downhill simplex code, you can do a maximum likelihood search for the best-fitting line profile parameters for problem 3, and then use those pars as the initial guesses for your MCMC code.
- Work with logarithmic probabilities!
- Make sure each step of your code makes logical sense. Did log(L) increase when you moved closer to the best values?
- You will start each chain link from some set of parameters with some starting value of log(L). If you do not accept the step, make sure you go back to the starting values of the parameters and log(L). Otherwise you random-walk yourself into oblivion (this is what had me stuck for 30 minutes this morning).
- 10^7 chain links required only 2 minutes on a 2.9 GHz Intel CPU for the linear problem. Vectorize your code!
- In IDL, use a WHILE or REPEAT-UNTIL loop. FOR loop control variables can't go high enough for most MCMC applications. UPDATE: Prof. Fitzgerald noted that you can use Long variable types as loop control variables: FOR i = 0L, long(N-1) DO ...

## Monday, November 1, 2010

### Good MCMC reference

http://web.mit.edu/~wingated/www/introductions/mcmc-gibbs-intro.pdf

This is just for your future reference. This is a handy guide to Monte Carlo integration (useful for the next time you have to evaluate a heinous multidimensional integral), with a second half on Markov Chains and Gibbs sampling.

This is just for your future reference. This is a handy guide to Monte Carlo integration (useful for the next time you have to evaluate a heinous multidimensional integral), with a second half on Markov Chains and Gibbs sampling.

### Reading Assignment: Jeffreys' Prior

This week's reading assignment is to find a resource that helps you understand the concept of a Jeffreys' prior. Due Wednesday is a short write-up about this topic at whatever level and length will help you understand the concept in the future, in the same spirit as the first HW assignment on probability distributions.

One good resource is Chapter 5.1 of Sivia (5.1.2 specifically).

We'll continue working on Class Activity 6 in class Wednesday. The figures and captions will be due next Monday.

One good resource is Chapter 5.1 of Sivia (5.1.2 specifically).

We'll continue working on Class Activity 6 in class Wednesday. The figures and captions will be due next Monday.

## Thursday, October 28, 2010

### HW 5 Clarifications: What does the evidence ratio mean? (and revisions to the assignment)

Questions 4 and 5 on the HW are a bit confusing. Here's how to think of it. [Skip to the bottom if you're confused by the following ramblings about the role of Monte Carlo simulations in this kind of analysis.]

An evidence ratio (or "Bayes factor") has a specific meaning: that is, it is the odds ratio in favor of one model over another. If Ev1/Ev2 = 9, then your data give 9:1 odds that Model 1 is right; in other words, you have 90% confidence (9/(9+1)) in Model 1.

Why then, you might (rightly) ask, are we asking you do to Monte Carlo simulations? And what then is the point, you might (astutely) inquire, of Section 3.2 of our paper?

Well, to be honest, the more I think about it the more unnecessary I believe Section 3.2 is. Except for two important purposes: a sanity check and a convincing demonstration. Monte Carlo simulations allow you test the frequentist performance of your Bayesian statistic; this is what we do in Section 3.2, and what we are asking you to do in question 4. In Section 3.2 it was relatively straightforward to do Monte Carlo simulations to do this, because the two models I was comparing have no free parameters. And we find that the simulation-based confidence of 94% is (almost) exactly what you get by just calculating the confidence directly from the R value. HOWEVER, it is no longer clear to me how to do these simulations, or exactly what they mean, in cases where you have free parameters. What I did in this case (Section 4.2/4.3) was to simulate data with various values of the parameters and weight the resulting R distribution by the values of the parameter that the data favored (i.e. marginalized over the parameter). This ends up with a result that does in fact confirm the confidence that the Bayes factor suggests (R = 1.16 gives 94% confidence analytically), but this is a different procedure from what we have asked you to do in Question 4, where we tell you to simulate data just using the best fit parameters.

In addition to this, Question 5 is somewhat unclear.

John's out of town right now, but in my role as temporary deputy of this class, I ask you to combine the tasks that the current Questions 4 and 5 are asking you to do by answering the following two-part question:

**How likely are you to be able to confidently (>95%) detect the presence of two lighthouses, as a function of their separation? And how likely are you to confidently select a one-LH model over a two-LH model, if the data really do come from just one LH?**

This will still require you to do MC simulations, but this time not to try to confirm the frequentist performance of your statistic, but as a way to help you understand the limits and capabilities of your experiment.

**To turn in Monday:**

**Example 1-d and 2-d posterior plots for each data set (i.e. the one generated with 1 LH and the one generated w/ 2 LHs), with the true values of alpha marked. I would like the 2-d posteriors to have 1-, 2-, and 3-sigma contours, though if you don't get around to this yet, not a huge deal.****A plot of "probability of confidently detecting 2 LHs" vs. LH separation (with a caption explaining what the plot means that someone not taking this class could understand)****A histogram of R values resulting from a large number of data sets simulated using one LH, with a caption giving your interpretation of what this means.**

## Wednesday, October 20, 2010

### HW4: My Solutions

Hey class- Here are the results I get from the two exercises. My prior confidence in myself doing things like this correctly on the first try is generally something around 93%. So if you get something different, check yourself first, but if your result is persistently different from mine, let me know!

Question 1: What kind of coin?

Question 2: Fair flipper or cheater?

Remember to write captions (short but to the point) for each figure: be very clear in your language about what you're calculating! This is particularly important because most people think Bayesian methods are scary and magical; your job is to show that what you do is clear and logical!

### HW4, Question 2 clarification

Hey folks- I apologize for the incomplete and inexact wording of question two on the latest worksheet.

Here is the story. A Type I flipper always tells the truth. A Type II flipper, with some unknown probability

Here is the story. A Type I flipper always tells the truth. A Type II flipper, with some unknown probability

*f*(*f*less than 0.5), will decide before a flip that, no matter what you guess, he will make you wrong. That is, no matter what the coin actually is, if you guess 'heads' he says 'tails,' and vice versa.### Model Selection in the Literature

## Monday, October 18, 2010

### Homework 3: What is due Wednesday

For Wednesday, you can skip Problem 4. I think it is too time-intensive for the lesson I wanted you to learn. The punchline is shown in the figure below, which was generated by Jon Swift (of Swift & Beaumont 2009).

The point is that the least-squares approach underestimates the power-law index for all values of N. So the next time you see a power-law fit in the literature, be suspicious!

For Wednesday, do Problem 3. In addition to what is written, estimate the power-law index for the exoplanet masses using MLE, and compare your answer to the value you find from a least-squares fit as discussed in Problem 4. For the least-squares fit, you can ignore mass bins containing zero planets.

Turn in a figure showing the pdf for alpha, and a figure illustrating the best-fitting powerlaws from both methods compared to a histogram of the planet masses. For the two figures, write captions describing what is plotted.

ETC: less than 2 hours

Reading assignment due Wed is still Sivia Chapter 4.1

The point is that the least-squares approach underestimates the power-law index for all values of N. So the next time you see a power-law fit in the literature, be suspicious!

For Wednesday, do Problem 3. In addition to what is written, estimate the power-law index for the exoplanet masses using MLE, and compare your answer to the value you find from a least-squares fit as discussed in Problem 4. For the least-squares fit, you can ignore mass bins containing zero planets.

Turn in a figure showing the pdf for alpha, and a figure illustrating the best-fitting powerlaws from both methods compared to a histogram of the planet masses. For the two figures, write captions describing what is plotted.

ETC: less than 2 hours

Reading assignment due Wed is still Sivia Chapter 4.1

### Are your likelihoods slowing you down?

Sorry about the formatting here; looks like the Google docs embed thing isn't quite compatible with the blog formatting....Hope you can still read what you need here.

## Friday, October 15, 2010

### Programming Investment: Making Figures

If you continue on in astronomy, you'll find (if you haven't already) that most of the actual time you spend doing research is, in fact, programming. Not just programming to actually do whatever analysis you're doing, but often a comparable amount of time putting together nifty professional-looking, presentable figures to communicate your results.

With this in mind, you should be thinking about building an accessible toolkit of useful routines that perform tasks that you expect to repeat often. A first example of something like this would be the function we had you write a few weeks ago that calculates a 1-D confidence interval, given vectors of

*x*and*p(x)*. You should consider writing similar routines for common plotting tasks.For example, I've written a function called "plot_posterior" that takes as input

*x*and*p(x)*(or*L(x)*--not necessarily normalized), just like your confidence interval function, and makes a pretty plot, as follows:>>> x = arange(-2,6,0.001) #make x vector

>>> px = normal(x,2,1) #evaluate N(2,1) on x

>>> plot_posterior(x,px,name='x',labelpos=(0.65,0.8)) #make the plot

(Note the default thicker-than-normal axes and lines, which is the quickest way to make your plots look more presentable and should become a habit.)

The function also has other optional parameters, such as plotting different confidence intervals, marking the median value instead of the mode as the representative value, using a symmetric interval instead of the shortest interval, turn off shading, turn off labeling, change the number of decimal places to which the result and errors are quoted, etc. This way, whenever I want to make a nice plot of a posterior distribution with regions marked and labeled and all, I don't have to start from scratch.

I've also written a similar function ('plot_posterior2d') to summarize the results of a two-dimensional parameter estimation, as you are currently working on with the Fischer & Valenti data. This produces the following, which is a nice concise way to illustrate the results:

>>> plot_posterior2d(a,b,L,confs=[0.68,0.95,0.99],'a','b',decplaces1=3)

As you can see, this function uses the 1D version of 'plot_posterior' to make the marginalized pdfs (once called with the 'horizontal' option). (It also draws only selected confidence contours on the contour plot, which is something we have not yet asked you to do, but you should be thinking about.) And now, with a modest front-end time investment, I can instantly make a snappy publication-quality plot of any two-dimensional likelihood surface I wish, potentially saving lots of time down the road.

NOTE: I'm doing all my work here using python, which I heartily recommend to all of you to experiment with at some point, if solely because of this amazing collection of example plots that I visit anytime I ever want to make any kind of fancy-looking plot. The language is quite similar in many ways to both Matlab and IDL, and has the benefits that it's both much more portable and much more free than either. I've just picked it up in the last year or so, and I'd be happy to help guide anyone who's interested.

## Thursday, October 14, 2010

### How to present a result?

When using Bayesian inference for parameter estimation, one has several choices as to how to present your results. In question #4 on HW2, you are asked to calculate the posterior pdf for

*f*, the binary fraction of O stars, given 3 companion detections in 21 systems. If you were presenting this result in a paper, how would you characterize it? What is*the answer*?Well, in Bayesian parameter estimation,

*the answer*is the entire posterior pdf:But of course, you need to quote a result somewhere in the text, and perhaps in a table, in order that someone else can read your paper and quickly take your result and use it in their work. So what do you do? Do you quote the median with a symmetric 68% confidence interval?

This is the easiest one to calculate, but with an asymmetric pdf like this it doesn't actually look that great, in my opinion. So then do you quote the most likely value, with a different confidence interval? (I believe this one is the "shortest confidence interval.")

Maybe looks a bit nicer, but really, who's to judge? Anyway, the point is that exactly the same data and analysis gives slightly different "results," depending on what you choose to quote.

This is funny, because by summarizing "results" like this you're actually encouraging others to play the "assume a normal distribution with a particular sigma" game. And I fully admit to having done this myself: taking quoted results that I

*know*don't come from a normal posterior distribution and approximating them as normal in my analysis. What can I say, sometimes we're just lazy...who wants to track down a whole bunch of exact posterior distributions from a dozen different authors? But at least this way I can comfort myself by knowing that while I'm taking shortcuts, I*know*that I'm taking them....## Wednesday, October 13, 2010

### Homework 2: Example TeX file

\begin{document}

\title{Your Title}

\author{Joan T. Studint\altaffilmark{1}}

\email{studint@caltech.edu}

\altaffiltext{1}{Department of Astrophysics, California Institute of Technology, MC 249-17, Pasadena, CA 91125}

\begin{abstract}

Background and motivation. Methodology. Results. Conclusion.

\end{abstract}

\keywords{methods: statistical}

\section{Data}

Where did you get the data? Cite your sources. How were the data measured?

\section{Analysis}

Describe your analysis. Include equations

\section{Results}

What did you find? Show figures. How did it compare to previous studies?

\end{document}

\title{Your Title}

\author{Joan T. Studint\altaffilmark{1}}

\email{studint@caltech.edu}

\altaffiltext{1}{Department of Astrophysics, California Institute of Technology, MC 249-17, Pasadena, CA 91125}

\begin{abstract}

Background and motivation. Methodology. Results. Conclusion.

\end{abstract}

\keywords{methods: statistical}

\section{Data}

Where did you get the data? Cite your sources. How were the data measured?

\section{Analysis}

Describe your analysis. Include equations

\section{Results}

What did you find? Show figures. How did it compare to previous studies?

\end{document}

### Correction to HW 2 data set

There was an error in the cross-listed data set. The error has been fixed and checked against exoplanets.org.

The updated data set can be found here:

http://www.astro.caltech.edu/~johnjohn/astrostats/data/fv05_vf05_data.txt

The updated data set can be found here:

http://www.astro.caltech.edu/~johnjohn/astrostats/data/fv05_vf05_data.txt

### Correction to HW 2

Problem 9 refers to "HW3." This is a mistake. It is referring to the analysis in problem 8.

### Office Hours 8-10pm, Thursdays

We will be holding weekly "homework lab"-type office hours from 8pm to 10pm on Thursdays, in the classroom (Cahill 219). Please come by if you have questions or if you're stuck with anything on the assignments; either Professor Johnson or Tim (or both) will be there to help.

### Data for Class Activity 2

The cross-listed data can be found here:

http://www.astro.caltech.edu/~johnjohn/astrostats/data/fv05_vf05_data.txt

http://www.astro.caltech.edu/~johnjohn/astrostats/data/fv05_vf05_data.txt

## Tuesday, October 12, 2010

### Class Activity 2.1: Time per problem

General comment on CA2:

1) Problems 9 and 10 (the writeup) will be due Monday. For problem 9, compare the marginalized pdfs of alpha and beta for the case with and without consideration of sigma_[Fe/H]. When you get here Wednesday or thereafter, do not do the integrals analytically! Do your integration numerically.

2) The primary goals of this assignment are

3) Time spent debugging should decline monotonically throughout the semester. If you get discouraged after spending an hour chasing down a MATLAB or LaTeX error, take heart in that this is one fewer bug to chase down later as you pursue your research.

4) Ask for help, especially if you exceed any specific ETC below. Talk in person and via email to your classmates, Tim, or myself.

Comments and ETCs for homework problems:

Several students have requested an estimate for the amount of time one should spend on each problem. For Wednesday, problems 1-7 are "due," which means that you should be prepared to work on problems 8-10 in class with your partner. It looked like most teams finished through 4 in class, so here are my estimated times to completion (ETC) and notes for problems 5-7:

CA5) You may ignore my specific suggestion of assigning 60% completeness to half of the stars. The key to this problem is to investigate how detection sensitivity (completeness) factors into your estimate of f. You may assign Si to the stars in your sample as you see fit. Query your intuition beforehand; how do you expect S to affect your estimate of f? Then do the test.

This should take no longer than it took you to complete problem 4: ETC 20 minutes

CA6) I will send out the cross-listed data set thanks to Mike B. and Matt.

You should still look up the FV05 paper on NASA ADS, and read the sections that describe their data and analysis so you have the necessary background for this problem (the intro would also be useful for context). ETC 30-40 minutes for reading the relevant sections

CA7) ETC 10 minutes of thought after examining Fig 5 in FV05.

Total ETC <= 1.2 hours

1) Problems 9 and 10 (the writeup) will be due Monday. For problem 9, compare the marginalized pdfs of alpha and beta for the case with and without consideration of sigma_[Fe/H]. When you get here Wednesday or thereafter, do not do the integrals analytically! Do your integration numerically.

2) The primary goals of this assignment are

- to understand how to calculate the occurrence rate as a function of object properties (planet occurrence as a function of stellar metallicity).
- present your results clearly and professionally
- gain science writing experience

3) Time spent debugging should decline monotonically throughout the semester. If you get discouraged after spending an hour chasing down a MATLAB or LaTeX error, take heart in that this is one fewer bug to chase down later as you pursue your research.

4) Ask for help, especially if you exceed any specific ETC below. Talk in person and via email to your classmates, Tim, or myself.

Comments and ETCs for homework problems:

Several students have requested an estimate for the amount of time one should spend on each problem. For Wednesday, problems 1-7 are "due," which means that you should be prepared to work on problems 8-10 in class with your partner. It looked like most teams finished through 4 in class, so here are my estimated times to completion (ETC) and notes for problems 5-7:

CA5) You may ignore my specific suggestion of assigning 60% completeness to half of the stars. The key to this problem is to investigate how detection sensitivity (completeness) factors into your estimate of f. You may assign Si to the stars in your sample as you see fit. Query your intuition beforehand; how do you expect S to affect your estimate of f? Then do the test.

This should take no longer than it took you to complete problem 4: ETC 20 minutes

CA6) I will send out the cross-listed data set thanks to Mike B. and Matt.

You should still look up the FV05 paper on NASA ADS, and read the sections that describe their data and analysis so you have the necessary background for this problem (the intro would also be useful for context). ETC 30-40 minutes for reading the relevant sections

CA7) ETC 10 minutes of thought after examining Fig 5 in FV05.

Total ETC <= 1.2 hours

Subscribe to:
Posts (Atom)