These RV measurements are in the same format as last week's HD102964 data set (t, v, sigma_v).

10365.979236 -5.4720 0.9872

10715.080012 -7.4086 1.0981

10786.759306 0.4987 1.2969

10806.886343 -5.0791 0.9564

11010.115556 3.5447 1.0068

11012.085440 4.8886 0.9584

11013.103947 3.1676 1.1267

11014.105914 4.3955 1.0783

11070.992361 1.8550 1.3133

11170.793657 4.4971 1.1046

11374.123206 0.0000 1.0404

11412.084051 4.0750 1.2597

11585.736539 16.8490 1.0630

11757.136620 -4.7007 1.0475

11883.801736 2.1463 1.3573

12098.097419 -10.4014 1.1261

12134.022928 -15.5328 1.2441

12162.926852 -5.7099 1.2710

12188.010926 -5.7224 1.1966

12218.947315 -12.4545 1.2057

12235.761655 -5.3930 1.2138

12489.047558 -12.2946 1.0018

12516.010463 -15.1926 1.2005

12535.910926 -2.3131 1.3004

12572.946597 -5.6120 1.3796

12600.927569 -8.5339 1.3608

12651.762963 -6.4725 1.0555

12833.093715 -0.4214 1.1765

12834.127720 5.3296 1.2034

12835.121528 0.2185 1.3992

12836.075405 0.9086 1.1630

12850.087384 -2.7960 1.0992

12851.131667 -1.2862 1.1301

12854.031620 0.5737 1.2861

12855.114792 -0.6867 1.3715

12856.061181 -1.0921 1.3094

12897.920671 0.0466 1.2069

12899.035324 -5.1169 1.2442

12924.926863 5.1777 1.2645

12926.058322 1.1283 1.2763

12987.739282 12.2442 1.3284

12988.761852 7.2284 1.2760

12989.781250 5.5562 1.3132

13014.741790 0.9943 0.8870

13015.745883 2.7821 0.7728

13016.749759 6.5555 0.7824

13017.765800 0.5844 0.7813

13072.716047 9.2352 1.0131

13180.129491 8.6986 1.1281

13181.124873 13.2785 1.3097

13182.120243 7.6246 1.1346

13190.126876 9.3208 0.6338

13191.126546 9.5378 0.7480

13196.114144 -2.8805 1.2701

13197.125305 -5.7718 0.8569

13207.117720 4.3347 1.3446

13208.123322 8.6826 1.1997

13238.075000 0.4488 1.1177

13239.137743 4.7305 1.0623

13239.983356 3.7248 1.0509

13241.043070 6.8523 0.7034

13301.800231 5.8442 1.1004

13302.870058 11.5434 1.0594

13303.894282 1.6920 0.8718

13338.766794 10.3218 0.9728

13339.911921 5.3337 1.0051

13368.839549 1.9201 0.9236

13369.760250 2.9115 0.6046

13397.727720 -0.5044 0.9282

13398.748762 7.9218 0.8294

13399.719965 7.9981 0.8511

13400.722338 4.0888 0.9303

13425.749826 1.5571 0.9956

13426.711343 1.1308 0.9198

13427.763414 1.7341 0.9986

13551.131655 0.1543 0.9205

13552.133299 -1.7902 0.9850

13696.900440 -5.7341 1.0210

13723.843958 -4.9887 1.0141

13746.743835 1.4755 0.5480

13747.803715 0.9159 0.9375

13748.728171 -2.2966 0.9150

13749.734456 3.1935 0.8821

13750.716424 1.9810 0.9588

13751.696863 2.5223 0.9091

13752.695995 2.8298 1.0102

13753.694321 2.7341 0.6803

13775.730961 -8.2307 1.0315

13776.722211 -3.9923 0.9291

13926.130456 -9.5344 0.5636

13927.109433 -11.1021 1.0593

13928.116435 -8.9531 0.9818

13933.092130 -5.0851 0.9839

13962.120150 -12.5511 0.9665

13981.981065 -8.1267 1.0270

13983.031655 -4.7627 1.1662

13984.031319 -13.4317 1.0740

13984.981215 -7.9433 0.9975

14023.985822 -8.4543 1.0863

14083.846400 -8.5211 1.1254

14084.801794 -8.2853 1.1667

14085.926840 -10.7708 1.1112

14129.728727 -9.8557 1.1174

14130.808565 -6.8924 1.0496

14131.735139 -9.5523 1.1019

14138.726123 -19.2455 1.0797

14307.135394 -19.3035 1.1076

14314.106944 -10.6002 1.1014

14337.109931 -18.1683 1.1961

14343.989815 -14.3236 1.1816

14396.843993 -8.1080 1.1140

14428.938345 -9.9274 0.6591

14639.129965 -10.0765 1.1747

14675.048241 -16.5316 1.1695

14689.013530 -9.5340 1.2468

14717.956979 -18.2615 1.3374

14725.896910 -6.4570 1.3795

14727.012153 -11.5006 1.2117

14727.955961 -2.4502 1.1250

14838.800382 -0.6436 0.9883

15015.121124 -3.7773 1.0546

15049.114495 -2.2694 1.2116

15133.998578 -7.5107 1.1313

15171.896575 -0.4995 1.2235

15189.809291 -3.7323 1.2005

15196.784565 -1.1088 1.0145

15231.702634 1.9133 1.0408

15255.709429 -0.0267 1.1699

15260.711648 10.2724 1.2097

15406.115969 -2.8208 1.0969

15407.102724 -1.6299 1.0710

15414.080465 -0.5420 1.1696

15426.085724 6.4654 1.0386

15435.115410 3.9615 1.1658

15439.122233 -3.5856 1.1219

15465.062030 6.6551 1.2756

15487.037152 0.0895 1.1080

## Monday, November 22, 2010

## Wednesday, November 17, 2010

## Monday, November 15, 2010

### AMOEBA implementation

Here's my code. IDL users, feel free to use this as-is. MATLAB users, feel free to translate it. The point of this assignment is to get the Bayesian periodogram working, not to fiddle with AMOEBA. However, it's a handy tool to have in your belt, so I encourage you to make sure you understand this example of its use.

;===================SINEFIT.PRO======================

function sineorbit, x, per, par

return, par[0] * sin(2*!pi/per * x + par[1]) + par[2]

end

function sinefunc, par

common sinestruct, struct

ymod = sineorbit(struct.x, struct.per, par)

;;; MaxL is just -chi^2 / 2 without the constant. Just use chi as

;;; statistic to be minimized

chi = total((struct.y-ymod)^2/(2.*struct.err)^2)

return, chi

end

function sinefit, x, y, err, period

common sinestruct, struct

struct = {x:x, y:y, err:err, per:period} ;;; create "control" structure

p0 = [(max(y)-min(y))/2., 0., median(y)] ;;; Initital guesses based on data

;;; The scale is the fractional moves in each parameter that AMOEBA will

;;; make while oozing around in maxL (chi^2) space.

scale = [p0[0]*0.05, !pi*0.1, p0[0]*0.05]

;;; Call amoeba and stop after 5% = 0.05 decrease in chi^2 is reached

pbest = amoeba(0.05, function_name='sinefunc', p0=p0, scale=scale)

;;; Calculate best-fitting orbit

ymod = sineorbit(x, period, pbest)

chi = total((struct.y-ymod)^2/(2.*struct.err)^2)

maxl = - total(0.5*alog(2.*!dpi*err^2)) - chi/2.

return, maxl

end

;===================SINEFIT.PRO======================

function sineorbit, x, per, par

return, par[0] * sin(2*!pi/per * x + par[1]) + par[2]

end

function sinefunc, par

common sinestruct, struct

ymod = sineorbit(struct.x, struct.per, par)

;;; MaxL is just -chi^2 / 2 without the constant. Just use chi as

;;; statistic to be minimized

chi = total((struct.y-ymod)^2/(2.*struct.err)^2)

return, chi

end

function sinefit, x, y, err, period

common sinestruct, struct

struct = {x:x, y:y, err:err, per:period} ;;; create "control" structure

p0 = [(max(y)-min(y))/2., 0., median(y)] ;;; Initital guesses based on data

;;; The scale is the fractional moves in each parameter that AMOEBA will

;;; make while oozing around in maxL (chi^2) space.

scale = [p0[0]*0.05, !pi*0.1, p0[0]*0.05]

;;; Call amoeba and stop after 5% = 0.05 decrease in chi^2 is reached

pbest = amoeba(0.05, function_name='sinefunc', p0=p0, scale=scale)

;;; Calculate best-fitting orbit

ymod = sineorbit(x, period, pbest)

chi = total((struct.y-ymod)^2/(2.*struct.err)^2)

maxl = - total(0.5*alog(2.*!dpi*err^2)) - chi/2.

return, maxl

end

### We still have class!

We still have 3 good weeks of class, so let's keep learning even while you work on your end-of-term projects. We'll have in-class activities, but optional homework (only do as much as you need to learn).

## Sunday, November 14, 2010

### Class Activity 8: HD102956 data set

This week we will examine the radial velocity time series of the 1.8 solar-mass star HD102956. The measurements were made using the HIgh-Resolution Echelle Spectrometer, which is mounted on the Keck 1 telescope. The first column is the Julian Day number minus 2.44e6. The second column is the relative velocity of the star in meters per second. The third column is the measurement uncertainty, which does not account for errors in the velocity related to the star moving around (velocity "jitter"). The class activity worksheet (in class Monday) will describe how to analyze these data.

14216.805150 21.5134 1.1715

14429.150243 15.7103 1.4097

14866.114214 47.6038 1.2837

14986.851118 -48.4759 1.2909

15014.771611 69.0507 1.0646

15016.870291 -38.8699 1.1213

15284.916557 -69.2668 1.3374

15285.992750 -4.1502 1.2532

15313.914745 53.7661 1.2567

15314.828012 0.0000 1.1595

15343.795192 -56.4066 1.1648

15344.885379 14.0928 1.3477

15350.785608 -30.2992 1.2856

15351.880460 44.4481 1.2105

15372.803796 41.1830 1.1439

15373.796131 -32.2805 1.2237

15374.752750 -88.3723 1.0914

15376.783324 -18.5026 1.1120

15377.812002 41.1884 1.1311

15378.748741 60.3889 1.1266

15379.786957 10.0510 1.1149

15380.804516 -60.2914 1.1533

15400.780895 -67.3449 1.3536

14216.805150 21.5134 1.1715

14429.150243 15.7103 1.4097

14866.114214 47.6038 1.2837

14986.851118 -48.4759 1.2909

15014.771611 69.0507 1.0646

15016.870291 -38.8699 1.1213

15284.916557 -69.2668 1.3374

15285.992750 -4.1502 1.2532

15313.914745 53.7661 1.2567

15314.828012 0.0000 1.1595

15343.795192 -56.4066 1.1648

15344.885379 14.0928 1.3477

15350.785608 -30.2992 1.2856

15351.880460 44.4481 1.2105

15372.803796 41.1830 1.1439

15373.796131 -32.2805 1.2237

15374.752750 -88.3723 1.0914

15376.783324 -18.5026 1.1120

15377.812002 41.1884 1.1311

15378.748741 60.3889 1.1266

15379.786957 10.0510 1.1149

15380.804516 -60.2914 1.1533

15400.780895 -67.3449 1.3536

## Thursday, November 11, 2010

### No Final Exam, but...

Clear your calendars for Wednesday, December 8 for the annual Ay117 Statistics in Astronomy Conference. This year the conference is in sunny Pasadena, CA, at the world-famous Cahill Center for Astronomy & Astrophysics.

Conference registration is free, yet mandatory for Ay117 students and TAs.

We will have seven 10-minute oral presentations (+5 minutes for Q&A), and a poster session. Sign up now for a poster or oral presentation slot (posters are graded much more harshly than oral presentations). Abstract submission open through November 17.

The session chair is Professor John Johnson who will give a talk entitled "Bayesian Statistics: There's no magic." Contact him via email for abstract submission (plain text only).

Conference registration is free, yet mandatory for Ay117 students and TAs.

We will have seven 10-minute oral presentations (+5 minutes for Q&A), and a poster session. Sign up now for a poster or oral presentation slot (posters are graded much more harshly than oral presentations). Abstract submission open through November 17.

The session chair is Professor John Johnson who will give a talk entitled "Bayesian Statistics: There's no magic." Contact him via email for abstract submission (plain text only).

### Class Activity 7 finally posted

Nothing has changed since it was assigned in class. Here's the official class activity worksheet.

http://www.astro.caltech.edu/~johnjohn/astrostats/homework/CA7.pdf

http://www.astro.caltech.edu/~johnjohn/astrostats/homework/CA7.pdf

### More reading materials on nested sampling

http://www.inference.phy.cam.ac.uk/bayesys/box/nested.pdf'

Also, check out David MacKay's website:

http://www.inference.phy.cam.ac.uk/bayesys/

Also, check out David MacKay's website:

http://www.inference.phy.cam.ac.uk/bayesys/

## Sunday, November 7, 2010

### Good, but not required, reading

Andrew Gelman's discussion of Harold Jeffreys' Theory of Probability:

http://www.stat.columbia.edu/~gelman/research/published/jeffreys.pdf

http://www.stat.columbia.edu/~gelman/research/published/jeffreys.pdf

## Friday, November 5, 2010

### On Jeffreys priors: scale vs. location

I have to admit that even after reviewing many of your write-ups, I remained a bit uncertain about when I should and should not use a Jeffreys prior for a parameter (i.e. p(x) ~ 1/x). It is clear to me that the determination of the parameters describing e.g. a normal distribution involves a scale parameter sigma and a location parameter mu, and the uninformed prior for the scale parameter is p(sigma) ~ 1/sigma, and for the location parameter it's a uniform prior p(mu) ~ 1.

But what happens when mu itself is parametrized, such as mu(x; A,B) = Ax + B? Is A a scale parameter while B is a location parameter? Before class Wednesday, I would have answered yes. After class, I was seriously doubting myself.

Unfortunately, after several hours of paging through books and searching the web I couldn't find a definitive answer to this question. The difficulty is that all of the examples of Jeffreys priors I found use a normal distribution as an example, or some other simple distribution (Poisson or binomial). But I could find precisely nothing about the similarly simple and far more ubiquitous problem of an uninformative prior on the slope of a linear fit! What's up with that? So I decided to do what I should have done at first: query my intuition rather than statistics papers.

What is it that we're trying to do with a Jeffreys prior? We are simply trying to properly encapsulate the notion that we have no prior knowledge of a parameter. Let's work with the example of a time series of flux values with normal errors, for which we suspect the flux level is a linear function of time:

F(t; A, B) = A + Bt.

Is a uniform prior appropriate for the slope B? A uniform prior would say that the slope is equally likely to lie between the interval 1 and 2 as it would between 1,000,000 and 1,000,001. Hmmm, this doesn't feel right. The former increase in the slope is a 100% change, while the latter is an increase of only 1 part in a million. By making the statement p(B) ~ const, we'd be saying that a slope between 1,000,001 and 1,000,003 is twice as likely as a slope between 1 and 2! If we were to make this statement, then we'd clearly have some pretty strong prior information. We'd be far from uninformed.

So in the case of a slope, I'd argue that we have a clear-cut reason to rely on a Jeffreys prior of the form p(B) ~ 1/B, which is the same as saying p(lnB) ~ const., or every log interval is equally likely. This feels much better.

Keep in mind that when we talk about prior information, we are talking about knowledge we had about the problem at hand before we were handed the data. For the case of a flux level that is linear in time, what do we know about the offset A? Well, we certainly do not know its numerical value without peeking at the data. But we do know that the shape of the function A + Bt is going to look the same for any choice of the offset, A. The offset can be 1 or 1000 and it won't change the fact that we'll end end up with B more flux in each unit time interval.

Thus, a uniform prior is appropriate for the offset parameter A: the offset interval 1 to 2 is just as likely as the interval 1,000,000 to 1,000,001. The offset is a location parameter, and location parameters call for a uniform prior.

Another way to think about all this is to consider two choices of units. If I change the y-axis from ergs/s/cm^2 to horsepower per acre, the offset just translates from one numerical value to another. However, for a fixed offset A, the units are crucial for a slope value B. If I tell you that the slope is B=1, it matters a lot whether we are talking about deaths per year or deaths per second! As a result, before we have the data sitting on our hard drive we certainly don't want to pick a prior for the slope (rate) that expresses a preference in one choice of units over another.

Using this reasoning p(A) ~ const. and p(B) ~ 1/B feel about right to me for the simple case of a linear fit with normally-distributed errors, or for any other model parametrization for that matter (see Gregory 2005 for the case of a Keplerian orbit, in which the Doppler amplitude K is given a Jeffreys' prior while the phase information is uniform). I'm not much interested in the square root of the determinant of the Fischer information. My intuition tells me that location parameters get uniform priors, and scale parameters (in which the units on the x or y axis matter) get p(B) ~ 1/B priors.

But don't take my word for it, query your own intuition.

But what happens when mu itself is parametrized, such as mu(x; A,B) = Ax + B? Is A a scale parameter while B is a location parameter? Before class Wednesday, I would have answered yes. After class, I was seriously doubting myself.

Unfortunately, after several hours of paging through books and searching the web I couldn't find a definitive answer to this question. The difficulty is that all of the examples of Jeffreys priors I found use a normal distribution as an example, or some other simple distribution (Poisson or binomial). But I could find precisely nothing about the similarly simple and far more ubiquitous problem of an uninformative prior on the slope of a linear fit! What's up with that? So I decided to do what I should have done at first: query my intuition rather than statistics papers.

What is it that we're trying to do with a Jeffreys prior? We are simply trying to properly encapsulate the notion that we have no prior knowledge of a parameter. Let's work with the example of a time series of flux values with normal errors, for which we suspect the flux level is a linear function of time:

F(t; A, B) = A + Bt.

Is a uniform prior appropriate for the slope B? A uniform prior would say that the slope is equally likely to lie between the interval 1 and 2 as it would between 1,000,000 and 1,000,001. Hmmm, this doesn't feel right. The former increase in the slope is a 100% change, while the latter is an increase of only 1 part in a million. By making the statement p(B) ~ const, we'd be saying that a slope between 1,000,001 and 1,000,003 is twice as likely as a slope between 1 and 2! If we were to make this statement, then we'd clearly have some pretty strong prior information. We'd be far from uninformed.

So in the case of a slope, I'd argue that we have a clear-cut reason to rely on a Jeffreys prior of the form p(B) ~ 1/B, which is the same as saying p(lnB) ~ const., or every log interval is equally likely. This feels much better.

Keep in mind that when we talk about prior information, we are talking about knowledge we had about the problem at hand before we were handed the data. For the case of a flux level that is linear in time, what do we know about the offset A? Well, we certainly do not know its numerical value without peeking at the data. But we do know that the shape of the function A + Bt is going to look the same for any choice of the offset, A. The offset can be 1 or 1000 and it won't change the fact that we'll end end up with B more flux in each unit time interval.

Thus, a uniform prior is appropriate for the offset parameter A: the offset interval 1 to 2 is just as likely as the interval 1,000,000 to 1,000,001. The offset is a location parameter, and location parameters call for a uniform prior.

Another way to think about all this is to consider two choices of units. If I change the y-axis from ergs/s/cm^2 to horsepower per acre, the offset just translates from one numerical value to another. However, for a fixed offset A, the units are crucial for a slope value B. If I tell you that the slope is B=1, it matters a lot whether we are talking about deaths per year or deaths per second! As a result, before we have the data sitting on our hard drive we certainly don't want to pick a prior for the slope (rate) that expresses a preference in one choice of units over another.

Using this reasoning p(A) ~ const. and p(B) ~ 1/B feel about right to me for the simple case of a linear fit with normally-distributed errors, or for any other model parametrization for that matter (see Gregory 2005 for the case of a Keplerian orbit, in which the Doppler amplitude K is given a Jeffreys' prior while the phase information is uniform). I'm not much interested in the square root of the determinant of the Fischer information. My intuition tells me that location parameters get uniform priors, and scale parameters (in which the units on the x or y axis matter) get p(B) ~ 1/B priors.

But don't take my word for it, query your own intuition.

## Thursday, November 4, 2010

## Wednesday, November 3, 2010

### Tuning the jump distribution

I mentioned in class that the optimal acceptance rate for MCMC jump proposals is 30-40% based on what others had recommended to me when teaching me the Way of the Markov Chain. It turns out, however, that for higher-order problems the optimal rate is 0.234 (Corolary 1.2 of Roberts, Gelman and Gilks 1997). If you want to know exactly why this is so, you should read the full paper. It's full of lemma-ny goodness! According to the abstract:

Supposedly ~50% is optimal for problems with 1 or 2 parameters (Section 4.2 of Gregory 2005). But if you have only 1 or 2 parameters, I don't know why you'd want to do MCMC, unless there is a nonanalytic likelihood to evaluate through a simulation or something. Gregory cites Roberts et al. above, but I didn't see anything about 50% for simpler problems. If you decide to read Roberts et al., let me know if I missed the note about 50%.

That said, 30-40% has treated me well over the years. Let's say you should aim for 25-30% acceptance of your jump proposals for the full line profile fit (Problem 3 of CA6).

The limiting diffusion approximation admits a straightforward efﬁ-No one breaks it down for the layperson like a statistician!

ciency maximization problem, and the resulting asymptotically optimal

policy is related to the asymptotic acceptance rate of proposed moves for

the algorithm. The asymptotically optimal acceptance rate is 0.234 under

quite general conditions.

Supposedly ~50% is optimal for problems with 1 or 2 parameters (Section 4.2 of Gregory 2005). But if you have only 1 or 2 parameters, I don't know why you'd want to do MCMC, unless there is a nonanalytic likelihood to evaluate through a simulation or something. Gregory cites Roberts et al. above, but I didn't see anything about 50% for simpler problems. If you decide to read Roberts et al., let me know if I missed the note about 50%.

That said, 30-40% has treated me well over the years. Let's say you should aim for 25-30% acceptance of your jump proposals for the full line profile fit (Problem 3 of CA6).

## Tuesday, November 2, 2010

### CA6: MCMC results for linear response function

Here are the marginalized posterior pdfs for the linear coefficients describing response function in linear.txt:

It took me about a 1.5 hours to write the code to first do a grid search (red line, peak set to unity), and then write the MCMC algorithm (black histogram). I imagine the Gaussian line profile fit due Monday should take you about 3 hours, most of it debugging. Such is research.

Some tips for writing MCMC code:

It took me about a 1.5 hours to write the code to first do a grid search (red line, peak set to unity), and then write the MCMC algorithm (black histogram). I imagine the Gaussian line profile fit due Monday should take you about 3 hours, most of it debugging. Such is research.

Some tips for writing MCMC code:

- For the linear fit, do a grid analysis first. This will help you get a sense for the probability space. If you have a handy amoeba or downhill simplex code, you can do a maximum likelihood search for the best-fitting line profile parameters for problem 3, and then use those pars as the initial guesses for your MCMC code.
- Work with logarithmic probabilities!
- Make sure each step of your code makes logical sense. Did log(L) increase when you moved closer to the best values?
- You will start each chain link from some set of parameters with some starting value of log(L). If you do not accept the step, make sure you go back to the starting values of the parameters and log(L). Otherwise you random-walk yourself into oblivion (this is what had me stuck for 30 minutes this morning).
- 10^7 chain links required only 2 minutes on a 2.9 GHz Intel CPU for the linear problem. Vectorize your code!
- In IDL, use a WHILE or REPEAT-UNTIL loop. FOR loop control variables can't go high enough for most MCMC applications. UPDATE: Prof. Fitzgerald noted that you can use Long variable types as loop control variables: FOR i = 0L, long(N-1) DO ...

## Monday, November 1, 2010

### Good MCMC reference

http://web.mit.edu/~wingated/www/introductions/mcmc-gibbs-intro.pdf

This is just for your future reference. This is a handy guide to Monte Carlo integration (useful for the next time you have to evaluate a heinous multidimensional integral), with a second half on Markov Chains and Gibbs sampling.

This is just for your future reference. This is a handy guide to Monte Carlo integration (useful for the next time you have to evaluate a heinous multidimensional integral), with a second half on Markov Chains and Gibbs sampling.

### Reading Assignment: Jeffreys' Prior

This week's reading assignment is to find a resource that helps you understand the concept of a Jeffreys' prior. Due Wednesday is a short write-up about this topic at whatever level and length will help you understand the concept in the future, in the same spirit as the first HW assignment on probability distributions.

One good resource is Chapter 5.1 of Sivia (5.1.2 specifically).

We'll continue working on Class Activity 6 in class Wednesday. The figures and captions will be due next Monday.

One good resource is Chapter 5.1 of Sivia (5.1.2 specifically).

We'll continue working on Class Activity 6 in class Wednesday. The figures and captions will be due next Monday.

Subscribe to:
Posts (Atom)