These RV measurements are in the same format as last week's HD102956 data set (t, v, sigma_v).
10365.979236 -5.4720 0.9872
10715.080012 -7.4086 1.0981
10786.759306 0.4987 1.2969
10806.886343 -5.0791 0.9564
11010.115556 3.5447 1.0068
11012.085440 4.8886 0.9584
11013.103947 3.1676 1.1267
11014.105914 4.3955 1.0783
11070.992361 1.8550 1.3133
11170.793657 4.4971 1.1046
11374.123206 0.0000 1.0404
11412.084051 4.0750 1.2597
11585.736539 16.8490 1.0630
11757.136620 -4.7007 1.0475
11883.801736 2.1463 1.3573
12098.097419 -10.4014 1.1261
12134.022928 -15.5328 1.2441
12162.926852 -5.7099 1.2710
12188.010926 -5.7224 1.1966
12218.947315 -12.4545 1.2057
12235.761655 -5.3930 1.2138
12489.047558 -12.2946 1.0018
12516.010463 -15.1926 1.2005
12535.910926 -2.3131 1.3004
12572.946597 -5.6120 1.3796
12600.927569 -8.5339 1.3608
12651.762963 -6.4725 1.0555
12833.093715 -0.4214 1.1765
12834.127720 5.3296 1.2034
12835.121528 0.2185 1.3992
12836.075405 0.9086 1.1630
12850.087384 -2.7960 1.0992
12851.131667 -1.2862 1.1301
12854.031620 0.5737 1.2861
12855.114792 -0.6867 1.3715
12856.061181 -1.0921 1.3094
12897.920671 0.0466 1.2069
12899.035324 -5.1169 1.2442
12924.926863 5.1777 1.2645
12926.058322 1.1283 1.2763
12987.739282 12.2442 1.3284
12988.761852 7.2284 1.2760
12989.781250 5.5562 1.3132
13014.741790 0.9943 0.8870
13015.745883 2.7821 0.7728
13016.749759 6.5555 0.7824
13017.765800 0.5844 0.7813
13072.716047 9.2352 1.0131
13180.129491 8.6986 1.1281
13181.124873 13.2785 1.3097
13182.120243 7.6246 1.1346
13190.126876 9.3208 0.6338
13191.126546 9.5378 0.7480
13196.114144 -2.8805 1.2701
13197.125305 -5.7718 0.8569
13207.117720 4.3347 1.3446
13208.123322 8.6826 1.1997
13238.075000 0.4488 1.1177
13239.137743 4.7305 1.0623
13239.983356 3.7248 1.0509
13241.043070 6.8523 0.7034
13301.800231 5.8442 1.1004
13302.870058 11.5434 1.0594
13303.894282 1.6920 0.8718
13338.766794 10.3218 0.9728
13339.911921 5.3337 1.0051
13368.839549 1.9201 0.9236
13369.760250 2.9115 0.6046
13397.727720 -0.5044 0.9282
13398.748762 7.9218 0.8294
13399.719965 7.9981 0.8511
13400.722338 4.0888 0.9303
13425.749826 1.5571 0.9956
13426.711343 1.1308 0.9198
13427.763414 1.7341 0.9986
13551.131655 0.1543 0.9205
13552.133299 -1.7902 0.9850
13696.900440 -5.7341 1.0210
13723.843958 -4.9887 1.0141
13746.743835 1.4755 0.5480
13747.803715 0.9159 0.9375
13748.728171 -2.2966 0.9150
13749.734456 3.1935 0.8821
13750.716424 1.9810 0.9588
13751.696863 2.5223 0.9091
13752.695995 2.8298 1.0102
13753.694321 2.7341 0.6803
13775.730961 -8.2307 1.0315
13776.722211 -3.9923 0.9291
13926.130456 -9.5344 0.5636
13927.109433 -11.1021 1.0593
13928.116435 -8.9531 0.9818
13933.092130 -5.0851 0.9839
13962.120150 -12.5511 0.9665
13981.981065 -8.1267 1.0270
13983.031655 -4.7627 1.1662
13984.031319 -13.4317 1.0740
13984.981215 -7.9433 0.9975
14023.985822 -8.4543 1.0863
14083.846400 -8.5211 1.1254
14084.801794 -8.2853 1.1667
14085.926840 -10.7708 1.1112
14129.728727 -9.8557 1.1174
14130.808565 -6.8924 1.0496
14131.735139 -9.5523 1.1019
14138.726123 -19.2455 1.0797
14307.135394 -19.3035 1.1076
14314.106944 -10.6002 1.1014
14337.109931 -18.1683 1.1961
14343.989815 -14.3236 1.1816
14396.843993 -8.1080 1.1140
14428.938345 -9.9274 0.6591
14639.129965 -10.0765 1.1747
14675.048241 -16.5316 1.1695
14689.013530 -9.5340 1.2468
14717.956979 -18.2615 1.3374
14725.896910 -6.4570 1.3795
14727.012153 -11.5006 1.2117
14727.955961 -2.4502 1.1250
14838.800382 -0.6436 0.9883
15015.121124 -3.7773 1.0546
15049.114495 -2.2694 1.2116
15133.998578 -7.5107 1.1313
15171.896575 -0.4995 1.2235
15189.809291 -3.7323 1.2005
15196.784565 -1.1088 1.0145
15231.702634 1.9133 1.0408
15255.709429 -0.0267 1.1699
15260.711648 10.2724 1.2097
15406.115969 -2.8208 1.0969
15407.102724 -1.6299 1.0710
15414.080465 -0.5420 1.1696
15426.085724 6.4654 1.0386
15435.115410 3.9615 1.1658
15439.122233 -3.5856 1.1219
15465.062030 6.6551 1.2756
15487.037152 0.0895 1.1080
Monday, November 22, 2010
Wednesday, November 17, 2010
Monday, November 15, 2010
AMOEBA implementation
Here's my code. IDL users, feel free to use this as-is. MATLAB users, feel free to translate it. The point of this assignment is to get the Bayesian periodogram working, not to fiddle with AMOEBA. However, it's a handy tool to have in your belt, so I encourage you to make sure you understand this example of its use.
;===================SINEFIT.PRO======================
function sineorbit, x, per, par
return, par[0] * sin(2*!pi/per * x + par[1]) + par[2]
end
function sinefunc, par
common sinestruct, struct
ymod = sineorbit(struct.x, struct.per, par)
;;; ln(MaxL) is -chi^2/2 plus a constant, so minimizing chi^2 maximizes
;;; the likelihood; return chi^2 as the statistic for AMOEBA to minimize
chi = total((struct.y - ymod)^2/struct.err^2)
return, chi
end
function sinefit, x, y, err, period
common sinestruct, struct
struct = {x:x, y:y, err:err, per:period} ;;; create "control" structure
p0 = [(max(y)-min(y))/2., 0., median(y)] ;;; Initial guesses based on data
;;; SCALE sets the characteristic step size in each parameter that AMOEBA
;;; takes while oozing around in maxL (chi^2) space (here roughly 5% of the
;;; initial amplitude, and about 5% of a cycle in phase).
scale = [p0[0]*0.05, !pi*0.1, p0[0]*0.05]
;;; Call AMOEBA; the first argument is FTOL, the fractional tolerance in
;;; chi^2 (0.05 = 5% here) at which the simplex is considered converged
pbest = amoeba(0.05, function_name='sinefunc', p0=p0, scale=scale)
;;; Calculate best-fitting orbit
ymod = sineorbit(x, period, pbest)
chi = total((y - ymod)^2/err^2)
maxl = - total(0.5*alog(2.*!dpi*err^2)) - chi/2.
return, maxl
end
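To show how SINEFIT gets called (this is just a quick sketch of my own, not the Bayesian periodogram the assignment asks for): loop over a grid of trial periods and record the maximum likelihood at each one. Here t, v, and verr are placeholder names for the three data columns.
;================EXAMPLE USE OF SINEFIT (sketch)================
;;; Evaluate the best-fit sine likelihood on a log-spaced grid of trial
;;; periods; the peak hints at the orbital period.
npers = 300
periods = 10.^(findgen(npers)/(npers-1.)*3.)   ;;; 1 to 1000 days
maxl = dblarr(npers)
for i = 0, npers-1 do maxl[i] = sinefit(t, v, verr, periods[i])
plot, periods, maxl, /xlog, xtitle='Trial Period [days]', ytitle='max ln L'
;===============================================================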
We still have class!
We still have 3 good weeks of class, so let's keep learning even while you work on your end-of-term projects. We'll have in-class activities, but optional homework (only do as much as you need to learn).
Sunday, November 14, 2010
Class Activity 8: HD102956 data set
This week we will examine the radial velocity time series of the 1.8 solar-mass star HD102956. The measurements were made using the High Resolution Echelle Spectrometer (HIRES) mounted on the Keck I telescope. The first column is the Julian Day number minus 2.44e6. The second column is the relative velocity of the star in meters per second. The third column is the measurement uncertainty, which does not account for velocity noise intrinsic to the star itself (velocity "jitter"). The class activity worksheet (in class Monday) will describe how to analyze these data.
14216.805150 21.5134 1.1715
14429.150243 15.7103 1.4097
14866.114214 47.6038 1.2837
14986.851118 -48.4759 1.2909
15014.771611 69.0507 1.0646
15016.870291 -38.8699 1.1213
15284.916557 -69.2668 1.3374
15285.992750 -4.1502 1.2532
15313.914745 53.7661 1.2567
15314.828012 0.0000 1.1595
15343.795192 -56.4066 1.1648
15344.885379 14.0928 1.3477
15350.785608 -30.2992 1.2856
15351.880460 44.4481 1.2105
15372.803796 41.1830 1.1439
15373.796131 -32.2805 1.2237
15374.752750 -88.3723 1.0914
15376.783324 -18.5026 1.1120
15377.812002 41.1884 1.1311
15378.748741 60.3889 1.1266
15379.786957 10.0510 1.1149
15380.804516 -60.2914 1.1533
15400.780895 -67.3449 1.3536
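In case it helps you get started, here's one way to read the three columns into IDL (a sketch only; READCOL and OPLOTERROR come from the IDL Astronomy User's Library, and the filename below is just a placeholder for wherever you saved the table):
;;; Read the (t, v, sigma_v) columns, then plot the time series with error bars
readcol, 'hd102956.txt', t, v, verr, format='D,D,D'
plot, t, v, psym=4, xtitle='JD - 2440000', ytitle='Radial Velocity [m/s]'
oploterror, t, v, verr, psym=4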
Thursday, November 11, 2010
No Final Exam, but...
Clear your calendars for Wednesday, December 8 for the annual Ay117 Statistics in Astronomy Conference. This year the conference is in sunny Pasadena, CA, at the world-famous Cahill Center for Astronomy & Astrophysics.
Conference registration is free, yet mandatory for Ay117 students and TAs.
We will have seven 10-minute oral presentations (+5 minutes for Q&A), and a poster session. Sign up now for a poster or oral presentation slot (posters are graded much more harshly than oral presentations). Abstract submission open through November 17.
The session chair is Professor John Johnson who will give a talk entitled "Bayesian Statistics: There's no magic." Contact him via email for abstract submission (plain text only).
Class Activity 7 finally posted
Nothing has changed since it was assigned in class. Here's the official class activity worksheet.
http://www.astro.caltech.edu/~johnjohn/astrostats/homework/CA7.pdf
More reading materials on nested sampling
http://www.inference.phy.cam.ac.uk/bayesys/box/nested.pdf
Also, check out David MacKay's website:
http://www.inference.phy.cam.ac.uk/bayesys/
Sunday, November 7, 2010
Good, but not required, reading
Andrew Gelman's discussion of Harold Jeffreys' Theory of Probability:
http://www.stat.columbia.edu/~gelman/research/published/jeffreys.pdf
Friday, November 5, 2010
On Jeffreys priors: scale vs. location
I have to admit that even after reviewing many of your write-ups, I remained a bit uncertain about when I should and should not use a Jeffreys prior for a parameter (i.e. p(x) ~ 1/x). It is clear to me that determining the parameters of, e.g., a normal distribution involves a scale parameter sigma and a location parameter mu; the uninformative prior for the scale parameter is p(sigma) ~ 1/sigma, while the location parameter gets a uniform prior, p(mu) ~ const.
But what happens when mu itself is parametrized, such as mu(x; A,B) = Ax + B? Is A a scale parameter while B is a location parameter? Before class Wednesday, I would have answered yes. After class, I was seriously doubting myself.
Unfortunately, after several hours of paging through books and searching the web I couldn't find a definitive answer to this question. The difficulty is that all of the examples of Jeffreys priors I found use a normal distribution as an example, or some other simple distribution (Poisson or binomial). But I could find precisely nothing about the similarly simple and far more ubiquitous problem of an uninformative prior on the slope of a linear fit! What's up with that? So I decided to do what I should have done at first: query my intuition rather than statistics papers.
What is it that we're trying to do with a Jeffreys prior? We are simply trying to properly encapsulate the notion that we have no prior knowledge of a parameter. Let's work with the example of a time series of flux values with normal errors, for which we suspect the flux level is a linear function of time:
F(t; A, B) = A + Bt.
Is a uniform prior appropriate for the slope B? A uniform prior would say that the slope is equally likely to lie in the interval 1 to 2 as in the interval 1,000,000 to 1,000,001. Hmmm, this doesn't feel right. The former increase in the slope is a 100% change, while the latter is an increase of only 1 part in a million. By making the statement p(B) ~ const, we'd also be saying that a slope between 1,000,001 and 1,000,003 is twice as likely as a slope between 1 and 2! If we were to make this statement, then we'd clearly have some pretty strong prior information. We'd be far from uninformed.
So in the case of a slope, I'd argue that we have a clear-cut reason to rely on a Jeffreys prior of the form p(B) ~ 1/B, which is the same as saying p(lnB) ~ const., or every log interval is equally likely. This feels much better.
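In symbols, this is just a change of variables (restating the equivalence above):

p(\ln B) = p(B)\left|\frac{dB}{d\ln B}\right| = \frac{1}{B}\cdot B = \mathrm{const.}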
Keep in mind that when we talk about prior information, we are talking about knowledge we had about the problem at hand before we were handed the data. For the case of a flux level that is linear in time, what do we know about the offset A? Well, we certainly do not know its numerical value without peeking at the data. But we do know that the shape of the function A + Bt is going to look the same for any choice of the offset, A. The offset can be 1 or 1000 and it won't change the fact that we'll end up with B more flux in each unit time interval.
Thus, a uniform prior is appropriate for the offset parameter A: the offset interval 1 to 2 is just as likely as the interval 1,000,000 to 1,000,001. The offset is a location parameter, and location parameters call for a uniform prior.
Another way to think about all this is to consider two choices of units. If I change the y-axis from ergs/s/cm^2 to horsepower per acre, the offset just translates from one numerical value to another. However, for a fixed offset A, the units are crucial for a slope value B. If I tell you that the slope is B=1, it matters a lot whether we are talking about deaths per year or deaths per second! As a result, before we have the data sitting on our hard drive we certainly don't want to pick a prior for the slope (rate) that expresses a preference in one choice of units over another.
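To restate the units argument quantitatively: with the Jeffreys prior, the probability assigned to an interval of slopes depends only on the ratio of its endpoints,

P(a < B < b) \;\propto\; \int_a^b \frac{dB}{B} = \ln\frac{b}{a},

so converting units (B -> cB, which maps the interval [a, b] to [ca, cb]) leaves ln(b/a) unchanged: the prior assigns equal probability to each decade of slope regardless of the units.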
Using this reasoning p(A) ~ const. and p(B) ~ 1/B feel about right to me for the simple case of a linear fit with normally-distributed errors, or for any other model parametrization for that matter (see Gregory 2005 for the case of a Keplerian orbit, in which the Doppler amplitude K is given a Jeffreys prior while the phase gets a uniform prior). I'm not much interested in the square root of the determinant of the Fisher information. My intuition tells me that location parameters get uniform priors, and scale parameters (in which the units on the x or y axis matter) get p(B) ~ 1/B priors.
But don't take my word for it, query your own intuition.
Thursday, November 4, 2010
Wednesday, November 3, 2010
Tuning the jump distribution
I mentioned in class that the optimal acceptance rate for MCMC jump proposals is 30-40%, based on what others recommended when they taught me the Way of the Markov Chain. It turns out, however, that for higher-dimensional problems the optimal rate is 0.234 (Corollary 1.2 of Roberts, Gelman and Gilks 1997). If you want to know exactly why this is so, you should read the full paper. It's full of lemma-ny goodness! According to the abstract:
"The limiting diffusion approximation admits a straightforward efficiency maximization problem, and the resulting asymptotically optimal policy is related to the asymptotic acceptance rate of proposed moves for the algorithm. The asymptotically optimal acceptance rate is 0.234 under quite general conditions."
No one breaks it down for the layperson like a statistician!
Supposedly ~50% is optimal for problems with 1 or 2 parameters (Section 4.2 of Gregory 2005). But if you have only 1 or 2 parameters, I don't know why you'd want to do MCMC, unless there is a nonanalytic likelihood to evaluate through a simulation or something. Gregory cites Roberts et al. above, but I didn't see anything about 50% for simpler problems. If you decide to read Roberts et al., let me know if I missed the note about 50%.
That said, 30-40% has treated me well over the years. Let's say you should aim for 25-30% acceptance of your jump proposals for the full line profile fit (Problem 3 of CA6).
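For what it's worth, one common burn-in heuristic (my own sketch, not from Roberts et al. or the class notes; step and nacc are hypothetical variables from your own chain loop) is to rescale the jump sizes every so often until the acceptance fraction lands near the target:
;;; Every 1000 trial steps during burn-in, nudge the jump sizes toward
;;; the target acceptance fraction, then reset the counter.
target = 0.25
frac = float(nacc)/1000.
step = step*((frac/target) > 0.5 < 2.0)   ;;; clip the correction to [0.5, 2]
nacc = 0L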
Tuesday, November 2, 2010
CA6: MCMC results for linear response function
Here are the marginalized posterior pdfs for the linear coefficients describing the response function in linear.txt:
It took me about 1.5 hours to write the code to first do a grid search (red line, peak set to unity) and then write the MCMC algorithm (black histogram). I imagine the Gaussian line profile fit due Monday should take you about 3 hours, most of it debugging. Such is research.
Some tips for writing MCMC code (a minimal example sketch follows the list):
- For the linear fit, do a grid analysis first. This will help you get a sense for the probability space. If you have a handy amoeba or downhill simplex code, you can do a maximum likelihood search for the best-fitting line profile parameters for problem 3, and then use those pars as the initial guesses for your MCMC code.
- Work with logarithmic probabilities!
- Make sure each step of your code makes logical sense. Did log(L) increase when you moved closer to the best values?
- You will start each chain link from some set of parameters with some starting value of log(L). If you do not accept the step, make sure you go back to the starting values of the parameters and log(L). Otherwise you random-walk yourself into oblivion (this is what had me stuck for 30 minutes this morning).
- 10^7 chain links required only 2 minutes on a 2.9 GHz Intel CPU for the linear problem. Vectorize your code!
- In IDL, use a WHILE or REPEAT-UNTIL loop. FOR loop control variables can't go high enough for most MCMC applications. UPDATE: Prof. Fitzgerald noted that you can use Long variable types as loop control variables: FOR i = 0L, long(N-1) DO ...
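To make the tips concrete, here is a bare-bones Metropolis sketch for the linear problem (my own illustration, not the official CA6 solution; x, y, err are placeholders for the data in linear.txt, and the step sizes are arbitrary starting guesses):
;================MCMC_LINEAR (sketch)================
function linloglike, par, x, y, err
  ;;; Log-likelihood for the model y = a + b*x with normal errors
  ymod = par[0] + par[1]*x
  return, -total(0.5*alog(2.*!dpi*err^2)) - total((y - ymod)^2/(2.*err^2))
end
pro mcmc_linear, x, y, err, chain
  nlinks = 100000L
  chain = dblarr(2, nlinks)
  par = [median(y), 0d]              ;;; starting guess for [a, b]
  step = [0.1d, 0.01d]               ;;; jump sizes: tune for ~25-40% acceptance
  logl = linloglike(par, x, y, err)
  nacc = 0L
  for i = 0L, nlinks-1 do begin
     trial = par + step*randomn(seed, 2)
     logltrial = linloglike(trial, x, y, err)
     ;;; Accept with probability min(1, exp(logltrial - logl))
     if alog(randomu(seed)) lt (logltrial - logl) then begin
        par = trial
        logl = logltrial
        nacc = nacc + 1L
     endif                           ;;; otherwise par and logl stay put
     chain[*, i] = par
  endfor
  print, 'Acceptance fraction: ', float(nacc)/nlinks
end
;=====================================================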
Monday, November 1, 2010
Good MCMC reference
http://web.mit.edu/~wingated/www/introductions/mcmc-gibbs-intro.pdf
This is just for your future reference: a handy guide to Monte Carlo integration (useful for the next time you have to evaluate a heinous multidimensional integral), with a second half on Markov chains and Gibbs sampling.
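As a one-line flavor of the first half (a toy example of my own, not from the guide): a Monte Carlo estimate of a one-dimensional integral is just the average of the integrand over uniform draws.
;;; Estimate int_0^1 exp(-x^2) dx; the exact value is about 0.7468
n = 100000L
x = randomu(seed, n)
print, mean(exp(-x^2))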
Reading Assignment: Jeffreys' Prior
This week's reading assignment is to find a resource that helps you understand the concept of a Jeffreys' prior. Due Wednesday is a short write-up about this topic at whatever level and length will help you understand the concept in the future, in the same spirit as the first HW assignment on probability distributions.
One good resource is Section 5.1 of Sivia (Section 5.1.2 specifically).
We'll continue working on Class Activity 6 in class Wednesday. The figures and captions will be due next Monday.