These RV measurements are in the same format as last week's HD102956 data set (t, v, sigma_v).
10365.979236 -5.4720 0.9872
10715.080012 -7.4086 1.0981
10786.759306 0.4987 1.2969
10806.886343 -5.0791 0.9564
11010.115556 3.5447 1.0068
11012.085440 4.8886 0.9584
11013.103947 3.1676 1.1267
11014.105914 4.3955 1.0783
11070.992361 1.8550 1.3133
11170.793657 4.4971 1.1046
11374.123206 0.0000 1.0404
11412.084051 4.0750 1.2597
11585.736539 16.8490 1.0630
11757.136620 -4.7007 1.0475
11883.801736 2.1463 1.3573
12098.097419 -10.4014 1.1261
12134.022928 -15.5328 1.2441
12162.926852 -5.7099 1.2710
12188.010926 -5.7224 1.1966
12218.947315 -12.4545 1.2057
12235.761655 -5.3930 1.2138
12489.047558 -12.2946 1.0018
12516.010463 -15.1926 1.2005
12535.910926 -2.3131 1.3004
12572.946597 -5.6120 1.3796
12600.927569 -8.5339 1.3608
12651.762963 -6.4725 1.0555
12833.093715 -0.4214 1.1765
12834.127720 5.3296 1.2034
12835.121528 0.2185 1.3992
12836.075405 0.9086 1.1630
12850.087384 -2.7960 1.0992
12851.131667 -1.2862 1.1301
12854.031620 0.5737 1.2861
12855.114792 -0.6867 1.3715
12856.061181 -1.0921 1.3094
12897.920671 0.0466 1.2069
12899.035324 -5.1169 1.2442
12924.926863 5.1777 1.2645
12926.058322 1.1283 1.2763
12987.739282 12.2442 1.3284
12988.761852 7.2284 1.2760
12989.781250 5.5562 1.3132
13014.741790 0.9943 0.8870
13015.745883 2.7821 0.7728
13016.749759 6.5555 0.7824
13017.765800 0.5844 0.7813
13072.716047 9.2352 1.0131
13180.129491 8.6986 1.1281
13181.124873 13.2785 1.3097
13182.120243 7.6246 1.1346
13190.126876 9.3208 0.6338
13191.126546 9.5378 0.7480
13196.114144 -2.8805 1.2701
13197.125305 -5.7718 0.8569
13207.117720 4.3347 1.3446
13208.123322 8.6826 1.1997
13238.075000 0.4488 1.1177
13239.137743 4.7305 1.0623
13239.983356 3.7248 1.0509
13241.043070 6.8523 0.7034
13301.800231 5.8442 1.1004
13302.870058 11.5434 1.0594
13303.894282 1.6920 0.8718
13338.766794 10.3218 0.9728
13339.911921 5.3337 1.0051
13368.839549 1.9201 0.9236
13369.760250 2.9115 0.6046
13397.727720 -0.5044 0.9282
13398.748762 7.9218 0.8294
13399.719965 7.9981 0.8511
13400.722338 4.0888 0.9303
13425.749826 1.5571 0.9956
13426.711343 1.1308 0.9198
13427.763414 1.7341 0.9986
13551.131655 0.1543 0.9205
13552.133299 -1.7902 0.9850
13696.900440 -5.7341 1.0210
13723.843958 -4.9887 1.0141
13746.743835 1.4755 0.5480
13747.803715 0.9159 0.9375
13748.728171 -2.2966 0.9150
13749.734456 3.1935 0.8821
13750.716424 1.9810 0.9588
13751.696863 2.5223 0.9091
13752.695995 2.8298 1.0102
13753.694321 2.7341 0.6803
13775.730961 -8.2307 1.0315
13776.722211 -3.9923 0.9291
13926.130456 -9.5344 0.5636
13927.109433 -11.1021 1.0593
13928.116435 -8.9531 0.9818
13933.092130 -5.0851 0.9839
13962.120150 -12.5511 0.9665
13981.981065 -8.1267 1.0270
13983.031655 -4.7627 1.1662
13984.031319 -13.4317 1.0740
13984.981215 -7.9433 0.9975
14023.985822 -8.4543 1.0863
14083.846400 -8.5211 1.1254
14084.801794 -8.2853 1.1667
14085.926840 -10.7708 1.1112
14129.728727 -9.8557 1.1174
14130.808565 -6.8924 1.0496
14131.735139 -9.5523 1.1019
14138.726123 -19.2455 1.0797
14307.135394 -19.3035 1.1076
14314.106944 -10.6002 1.1014
14337.109931 -18.1683 1.1961
14343.989815 -14.3236 1.1816
14396.843993 -8.1080 1.1140
14428.938345 -9.9274 0.6591
14639.129965 -10.0765 1.1747
14675.048241 -16.5316 1.1695
14689.013530 -9.5340 1.2468
14717.956979 -18.2615 1.3374
14725.896910 -6.4570 1.3795
14727.012153 -11.5006 1.2117
14727.955961 -2.4502 1.1250
14838.800382 -0.6436 0.9883
15015.121124 -3.7773 1.0546
15049.114495 -2.2694 1.2116
15133.998578 -7.5107 1.1313
15171.896575 -0.4995 1.2235
15189.809291 -3.7323 1.2005
15196.784565 -1.1088 1.0145
15231.702634 1.9133 1.0408
15255.709429 -0.0267 1.1699
15260.711648 10.2724 1.2097
15406.115969 -2.8208 1.0969
15407.102724 -1.6299 1.0710
15414.080465 -0.5420 1.1696
15426.085724 6.4654 1.0386
15435.115410 3.9615 1.1658
15439.122233 -3.5856 1.1219
15465.062030 6.6551 1.2756
15487.037152 0.0895 1.1080
Monday, November 22, 2010
Wednesday, November 17, 2010
Monday, November 15, 2010
AMOEBA implementation
Here's my code. IDL users, feel free to use this as-is. MATLAB users, feel free to translate it. The point of this assignment is to get the Bayesian periodogram working, not to fiddle with AMOEBA. However, it's a handy tool to have in your belt, so I encourage you to make sure you understand this example of its use.
;===================SINEFIT.PRO======================
function sineorbit, x, per, par
return, par[0] * sin(2*!pi/per * x + par[1]) + par[2]
end
function sinefunc, par
common sinestruct, struct
ymod = sineorbit(struct.x, struct.per, par)
;;; The log-likelihood is just -chi^2/2 plus a constant, so minimizing
;;; chi^2 maximizes the likelihood
chi = total((struct.y-ymod)^2/struct.err^2)
return, chi
end
function sinefit, x, y, err, period
common sinestruct, struct
struct = {x:x, y:y, err:err, per:period} ;;; create "control" structure
p0 = [(max(y)-min(y))/2., 0., median(y)] ;;; Initial guesses based on data
;;; The scale is the fractional moves in each parameter that AMOEBA will
;;; make while oozing around in maxL (chi^2) space.
scale = [p0[0]*0.05, !pi*0.1, p0[0]*0.05]
;;; Call amoeba; it stops when the fractional decrease in chi^2 per step falls below 5% (ftol = 0.05)
pbest = amoeba(0.05, function_name='sinefunc', p0=p0, scale=scale)
;;; Calculate best-fitting orbit
ymod = sineorbit(x, period, pbest)
chi = total((struct.y-ymod)^2/struct.err^2)
maxl = - total(0.5*alog(2.*!dpi*err^2)) - chi/2.
return, maxl
end
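For the MATLAB and Python folks, here is a rough translation of the same idea into Python/SciPy. Treat it as a sketch rather than a drop-in replacement: I have not tested it against the IDL version, and SciPy's Nelder-Mead option to scipy.optimize.minimize simply stands in for AMOEBA. The function and variable names just mirror the IDL above.
#=================== sinefit.py (sketch) ======================
import numpy as np
from scipy.optimize import minimize

def sineorbit(x, per, par):
    # sinusoid with amplitude par[0], phase par[1], velocity offset par[2]
    return par[0] * np.sin(2 * np.pi / per * x + par[1]) + par[2]

def sinefit(x, y, err, period):
    # return the maximum log-likelihood for a sinusoid of fixed period
    def chisq(par):
        ymod = sineorbit(x, period, par)
        return np.sum((y - ymod)**2 / err**2)

    p0 = [(y.max() - y.min()) / 2.0, 0.0, np.median(y)]  # initial guesses from the data
    best = minimize(chisq, p0, method='Nelder-Mead')      # downhill simplex, standing in for AMOEBA
    chi = chisq(best.x)
    return -np.sum(0.5 * np.log(2 * np.pi * err**2)) - chi / 2.0
#=================== sinefit.py (sketch) ======================
Looping this over a grid of trial periods gives you the maxL(P) curve you need for the periodogram.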
We still have class!
We still have 3 good weeks of class, so let's keep learning even while you work on your end-of-term projects. We'll have in-class activities, but optional homework (only do as much as you need to learn).
Sunday, November 14, 2010
Class Activity 8: HD102956 data set
This week we will examine the radial velocity time series of the 1.8 solar-mass star HD102956. The measurements were made with the High Resolution Echelle Spectrometer (HIRES) mounted on the Keck I telescope. The first column is the Julian Day number minus 2,440,000. The second column is the relative velocity of the star in meters per second. The third column is the measurement uncertainty, which does not account for velocity noise intrinsic to the star (velocity "jitter"). The class activity worksheet (in class Monday) will describe how to analyze these data.
14216.805150 21.5134 1.1715
14429.150243 15.7103 1.4097
14866.114214 47.6038 1.2837
14986.851118 -48.4759 1.2909
15014.771611 69.0507 1.0646
15016.870291 -38.8699 1.1213
15284.916557 -69.2668 1.3374
15285.992750 -4.1502 1.2532
15313.914745 53.7661 1.2567
15314.828012 0.0000 1.1595
15343.795192 -56.4066 1.1648
15344.885379 14.0928 1.3477
15350.785608 -30.2992 1.2856
15351.880460 44.4481 1.2105
15372.803796 41.1830 1.1439
15373.796131 -32.2805 1.2237
15374.752750 -88.3723 1.0914
15376.783324 -18.5026 1.1120
15377.812002 41.1884 1.1311
15378.748741 60.3889 1.1266
15379.786957 10.0510 1.1149
15380.804516 -60.2914 1.1533
15400.780895 -67.3449 1.3536
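If you want a quick look at the data before Monday, here is a minimal Python sketch for reading and plotting the three columns above (the file name hd102956.txt is just a placeholder for wherever you save them):
import numpy as np
import matplotlib.pyplot as plt

t, v, sigma_v = np.loadtxt('hd102956.txt', unpack=True)  # JD - 2,440,000, RV [m/s], uncertainty [m/s]

plt.errorbar(t, v, yerr=sigma_v, fmt='o')
plt.xlabel('JD - 2,440,000')
plt.ylabel('Radial velocity (m/s)')
plt.title('HD 102956 (Keck/HIRES)')
plt.show()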
Thursday, November 11, 2010
No Final Exam, but...
Clear your calendars for Wednesday, December 8 for the annual Ay117 Statistics in Astronomy Conference. This year the conference is in sunny Pasadena, CA, at the world-famous Cahill Center for Astronomy & Astrophysics.
Conference registration is free, yet mandatory for Ay117 students and TAs.
We will have seven 10-minute oral presentations (+5 minutes for Q&A), and a poster session. Sign up now for a poster or oral presentation slot (posters are graded much more harshly than oral presentations). Abstract submission is open through November 17.
The session chair is Professor John Johnson, who will give a talk entitled "Bayesian Statistics: There's no magic." Contact him via email for abstract submission (plain text only).
Class Activity 7 finally posted
Nothing has changed since it was assigned in class. Here's the official class activity worksheet.
http://www.astro.caltech.edu/~johnjohn/astrostats/homework/CA7.pdf
More reading materials on nested sampling
http://www.inference.phy.cam.ac.uk/bayesys/box/nested.pdf
Also, check out David MacKay's website:
http://www.inference.phy.cam.ac.uk/bayesys/
Sunday, November 7, 2010
Good, but not required, reading
Andrew Gelman's discussion of Harold Jeffreys' Theory of Probability:
http://www.stat.columbia.edu/~gelman/research/published/jeffreys.pdf
Friday, November 5, 2010
On Jeffreys priors: scale vs. location
I have to admit that even after reviewing many of your write-ups, I remained a bit uncertain about when I should and should not use a Jeffreys prior for a parameter (i.e. p(x) ~ 1/x). It is clear to me that the determination of the parameters describing e.g. a normal distribution involves a scale parameter sigma and a location parameter mu, and the uninformed prior for the scale parameter is p(sigma) ~ 1/sigma, and for the location parameter it's a uniform prior p(mu) ~ 1.
But what happens when mu itself is parametrized, such as mu(x; A,B) = Ax + B? Is A a scale parameter while B is a location parameter? Before class Wednesday, I would have answered yes. After class, I was seriously doubting myself.
Unfortunately, after several hours of paging through books and searching the web I couldn't find a definitive answer to this question. The difficulty is that all of the examples of Jeffreys priors I found use a normal distribution as an example, or some other simple distribution (Poisson or binomial). But I could find precisely nothing about the similarly simple and far more ubiquitous problem of an uninformative prior on the slope of a linear fit! What's up with that? So I decided to do what I should have done at first: query my intuition rather than statistics papers.
What is it that we're trying to do with a Jeffreys prior? We are simply trying to properly encapsulate the notion that we have no prior knowledge of a parameter. Let's work with the example of a time series of flux values with normal errors, for which we suspect the flux level is a linear function of time:
F(t; A, B) = A + Bt.
Is a uniform prior appropriate for the slope B? A uniform prior would say that the slope is as likely to lie in the interval 1 to 2 as in the interval 1,000,000 to 1,000,001. Hmmm, this doesn't feel right. The former increase in the slope is a 100% change, while the latter is an increase of only 1 part in a million. By making the statement p(B) ~ const, we'd be saying that a slope between 1,000,001 and 1,000,003 is twice as likely as a slope between 1 and 2! If we were to make this statement, then we'd clearly have some pretty strong prior information. We'd be far from uninformed.
So in the case of a slope, I'd argue that we have a clear-cut reason to rely on a Jeffreys prior of the form p(B) ~ 1/B, which is the same as saying p(lnB) ~ const., or every log interval is equally likely. This feels much better.
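(A quick change-of-variables check of that equivalence: p(lnB) = p(B) |dB/dlnB| = (1/B) * B = const.)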
Keep in mind that when we talk about prior information, we are talking about knowledge we had about the problem at hand before we were handed the data. For the case of a flux level that is linear in time, what do we know about the offset A? Well, we certainly do not know its numerical value without peeking at the data. But we do know that the shape of the function A + Bt is going to look the same for any choice of the offset, A. The offset can be 1 or 1000 and it won't change the fact that we'll end up with B more flux in each unit time interval.
Thus, a uniform prior is appropriate for the offset parameter A: the offset interval 1 to 2 is just as likely as the interval 1,000,000 to 1,000,001. The offset is a location parameter, and location parameters call for a uniform prior.
Another way to think about all this is to consider two choices of units. If I change the y-axis from ergs/s/cm^2 to horsepower per acre, the offset just translates from one numerical value to another. However, for a fixed offset A, the units are crucial for a slope value B. If I tell you that the slope is B=1, it matters a lot whether we are talking about deaths per year or deaths per second! As a result, before we have the data sitting on our hard drive we certainly don't want to pick a prior for the slope (rate) that expresses a preference in one choice of units over another.
Using this reasoning, p(A) ~ const. and p(B) ~ 1/B feel about right to me for the simple case of a linear fit with normally-distributed errors, or for any other model parametrization for that matter (see Gregory 2005 for the case of a Keplerian orbit, in which the Doppler amplitude K is given a Jeffreys prior while the phase is given a uniform prior). I'm not much interested in the square root of the determinant of the Fisher information. My intuition tells me that location parameters get uniform priors, and scale parameters (in which the units on the x or y axis matter) get p(B) ~ 1/B priors.
But don't take my word for it, query your own intuition.
Thursday, November 4, 2010
Wednesday, November 3, 2010
Tuning the jump distribution
I mentioned in class that the optimal acceptance rate for MCMC jump proposals is 30-40%, based on what others recommended to me when teaching me the Way of the Markov Chain. It turns out, however, that for higher-dimensional problems the optimal rate is 0.234 (Corollary 1.2 of Roberts, Gelman and Gilks 1997). If you want to know exactly why this is so, you should read the full paper. It's full of lemma-ny goodness! According to the abstract:
"The limiting diffusion approximation admits a straightforward efficiency maximization problem, and the resulting asymptotically optimal policy is related to the asymptotic acceptance rate of proposed moves for the algorithm. The asymptotically optimal acceptance rate is 0.234 under quite general conditions."
No one breaks it down for the layperson like a statistician!
Supposedly ~50% is optimal for problems with 1 or 2 parameters (Section 4.2 of Gregory 2005). But if you have only 1 or 2 parameters, I don't know why you'd want to do MCMC, unless there is a nonanalytic likelihood to evaluate through a simulation or something. Gregory cites Roberts et al. above, but I didn't see anything about 50% for simpler problems. If you decide to read Roberts et al., let me know if I missed the note about 50%.
That said, 30-40% has treated me well over the years. Let's say you should aim for 25-30% acceptance of your jump proposals for the full line profile fit (Problem 3 of CA6).
Tuesday, November 2, 2010
CA6: MCMC results for linear response function
Here are the marginalized posterior pdfs for the linear coefficients describing the response function in linear.txt:
It took me about 1.5 hours to write the code to first do a grid search (red line, peak set to unity), and then write the MCMC algorithm (black histogram). I imagine the Gaussian line profile fit due Monday should take you about 3 hours, most of it debugging. Such is research.
Some tips for writing MCMC code (a minimal sketch in Python follows the list):
- For the linear fit, do a grid analysis first. This will help you get a sense for the probability space. If you have a handy amoeba or downhill simplex code, you can do a maximum likelihood search for the best-fitting line profile parameters for problem 3, and then use those pars as the initial guesses for your MCMC code.
- Work with logarithmic probabilities!
- Make sure each step of your code makes logical sense. Did log(L) increase when you moved closer to the best values?
- You will start each chain link from some set of parameters with some starting value of log(L). If you do not accept the step, make sure you go back to the starting values of the parameters and log(L). Otherwise you random-walk yourself into oblivion (this is what had me stuck for 30 minutes this morning).
- 10^7 chain links required only 2 minutes on a 2.9 GHz Intel CPU for the linear problem. Vectorize your code!
- In IDL, use a WHILE or REPEAT-UNTIL loop. FOR loop control variables can't go high enough for most MCMC applications. UPDATE: Prof. Fitzgerald noted that you can use Long variable types as loop control variables: FOR i = 0L, long(N-1) DO ...
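To make those tips concrete, here is a minimal Metropolis sketch in Python for a straight-line fit with flat priors. The variable names and the toy data are mine, not part of the assignment; the point is the bookkeeping: carry the current log(L), propose a jump, accept or revert, and keep an eye on the acceptance rate.
import numpy as np

rng = np.random.default_rng(0)

def log_like(par, x, y, err):
    # Gaussian log-likelihood for the model y = a + b*x (constant dropped)
    a, b = par
    return -0.5 * np.sum((y - (a + b * x))**2 / err**2)

def mcmc(x, y, err, p0, scale, nsteps=100000):
    par = np.array(p0, dtype=float)
    logl = log_like(par, x, y, err)        # always carry the current log(L)
    chain = np.empty((nsteps, len(par)))
    naccept = 0
    for i in range(nsteps):
        trial = par + scale * rng.standard_normal(len(par))   # propose a jump
        logl_trial = log_like(trial, x, y, err)
        if np.log(rng.random()) < logl_trial - logl:          # Metropolis rule, in log space
            par, logl = trial, logl_trial
            naccept += 1
        chain[i] = par                     # a rejected step repeats the old parameter values
    print("acceptance rate: %.2f (aim for roughly 0.25-0.4)" % (naccept / float(nsteps)))
    return chain

# toy data to exercise the sampler
x = np.linspace(0, 10, 50)
y = 1.0 + 2.0 * x + rng.normal(0, 1.0, x.size)
chain = mcmc(x, y, np.ones_like(x), p0=[0.0, 1.0], scale=np.array([0.3, 0.05]))
Tune the scale until the printed acceptance rate lands in the range discussed in the acceptance-rate post above, then histogram the (burn-in-trimmed) chain columns to get the marginalized pdfs.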
Monday, November 1, 2010
Good MCMC reference
http://web.mit.edu/~wingated/www/introductions/mcmc-gibbs-intro.pdf
This is just for your future reference. This is a handy guide to Monte Carlo integration (useful for the next time you have to evaluate a heinous multidimensional integral), with a second half on Markov Chains and Gibbs sampling.
Reading Assignment: Jeffreys' Prior
This week's reading assignment is to find a resource that helps you understand the concept of a Jeffreys' prior. Due Wednesday is a short write-up about this topic at whatever level and length will help you understand the concept in the future, in the same spirit as the first HW assignment on probability distributions.
One good resource is Chapter 5.1 of Sivia (5.1.2 specifically).
We'll continue working on Class Activity 6 in class Wednesday. The figures and captions will be due next Monday.
Thursday, October 28, 2010
HW 5 Clarifications: What does the evidence ratio mean? (and revisions to the assignment)
Questions 4 and 5 on the HW are a bit confusing. Here's how to think about them. [Skip to the bottom if you're confused by the following ramblings about the role of Monte Carlo simulations in this kind of analysis.]
An evidence ratio (or "Bayes factor") has a specific meaning: that is, it is the odds ratio in favor of one model over another. If Ev1/Ev2 = 9, then your data give 9:1 odds that Model 1 is right; in other words, you have 90% confidence (9/(9+1)) in Model 1.
Why then, you might (rightly) ask, are we asking you to do Monte Carlo simulations? And what then is the point, you might (astutely) inquire, of Section 3.2 of our paper?
Well, to be honest, the more I think about it the more unnecessary I believe Section 3.2 is, except for two important purposes: a sanity check and a convincing demonstration. Monte Carlo simulations allow you to test the frequentist performance of your Bayesian statistic; this is what we do in Section 3.2, and what we are asking you to do in Question 4.
In Section 3.2 it was relatively straightforward to do these Monte Carlo simulations, because the two models I was comparing have no free parameters. And we find that the simulation-based confidence of 94% is (almost) exactly what you get by calculating the confidence directly from the R value. HOWEVER, it is no longer clear to me how to do these simulations, or exactly what they mean, in cases where you have free parameters.
What I did in this case (Sections 4.2/4.3) was to simulate data with various values of the parameters and weight the resulting R distribution by the values of the parameter that the data favored (i.e., marginalize over the parameter). This ends up confirming the confidence that the Bayes factor suggests (R = 1.16 gives 94% confidence analytically), but it is a different procedure from what we have asked you to do in Question 4, where we tell you to simulate data using only the best-fit parameters.
In addition to this, Question 5 is somewhat unclear.
John's out of town right now, but in my role as temporary deputy of this class, I ask you to combine the tasks that the current Questions 4 and 5 are asking you to do by answering the following two-part question:
How likely are you to be able to confidently (>95%) detect the presence of two lighthouses, as a function of their separation? And how likely are you to confidently select a one-LH model over a two-LH model, if the data really do come from just one LH?
This will still require you to do MC simulations, but this time the goal is not to confirm the frequentist performance of your statistic; rather, the simulations are a way to help you understand the limits and capabilities of your experiment.
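If it helps to see the machinery, below is a rough, self-contained Python sketch of this kind of Monte Carlo loop. It assumes a Sivia-style lighthouse setup, which may differ in detail from your worksheet: known distance beta = 1 offshore, flash positions along the shore Cauchy-distributed about each lighthouse position, uniform position priors on [-L, L], and each flash equally likely to come from either lighthouse in the two-LH model. The grid sizes, number of flashes, and number of simulations are arbitrary choices to keep the runtime short.
import numpy as np

rng = np.random.default_rng(3)

beta, L = 1.0, 10.0                      # assumed: known offshore distance, prior half-width
grid = np.linspace(-L, L, 201)           # grid of possible lighthouse positions

def cauchy(x, alpha):
    # p(x | alpha): rows = flashes, columns = trial lighthouse positions
    return beta / np.pi / (beta**2 + (x[:, None] - alpha[None, :])**2)

def log_evidence_1(x):
    # integrate the 1-LH likelihood over the uniform prior on alpha
    logl = np.sum(np.log(cauchy(x, grid)), axis=0)
    m = logl.max()
    return m + np.log(np.trapz(np.exp(logl - m), grid)) - np.log(2 * L)

def log_evidence_2(x):
    # 2-LH likelihood: each flash is a 50/50 mixture of the two Cauchy profiles
    c = cauchy(x, grid)
    logl = np.sum(np.log(0.5 * (c[:, :, None] + c[:, None, :])), axis=0)
    m = logl.max()
    inner = np.trapz(np.exp(logl - m), grid, axis=1)
    return m + np.log(np.trapz(inner, grid)) - 2 * np.log(2 * L)

def simulate(separation, nflash=30):
    # flashes from lighthouses at +/- separation/2 (separation = 0 means one LH)
    alpha = rng.choice([-separation / 2.0, separation / 2.0], size=nflash)
    return alpha + beta * np.tan(np.pi * (rng.random(nflash) - 0.5))

nsim = 25
for sep in [0.0, 0.5, 1.0, 2.0, 4.0]:
    R = []
    for _ in range(nsim):
        x = simulate(sep)
        R.append(np.exp(log_evidence_2(x) - log_evidence_1(x)))
    R = np.array(R)
    print("sep = %.1f:  P(confident 2-LH) = %.2f,  P(confident 1-LH) = %.2f"
          % (sep, np.mean(R > 19.0), np.mean(R < 1.0 / 19.0)))
Swapping in your own evidence integrals and data simulator is the real exercise; the loop structure is the point here (R > 19 corresponds to 95% confidence in the two-lighthouse model).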
To turn in Monday:
- Example 1-d and 2-d posterior plots for each data set (i.e. the one generated with 1 LH and the one generated w/ 2 LHs), with the true values of alpha marked. I would like the 2-d posteriors to have 1-, 2-, and 3-sigma contours, though if you don't get around to this, it's not a huge deal.
- A plot of "probability of confidently detecting 2 LHs" vs. LH separation (with a caption explaining what the plot means that someone not taking this class could understand)
- A histogram of R values resulting from a large number of data sets simulated using one LH, with a caption giving your interpretation of what this means.
Wednesday, October 20, 2010
HW4: My Solutions
Hey class- Here are the results I get from the two exercises. My prior confidence in myself doing things like this correctly on the first try is generally something around 93%. So if you get something different, check yourself first, but if your result is persistently different from mine, let me know!
Question 1: What kind of coin?
Question 2: Fair flipper or cheater?
Remember to write captions (short but to the point) for each figure: be very clear in your language about what you're calculating! This is particularly important because most people think Bayesian methods are scary and magical; your job is to show that what you do is clear and logical!
HW4, Question 2 clarification
Hey folks- I apologize for the incomplete and inexact wording of question two on the latest worksheet.
Here is the story. A Type I flipper always tells the truth. A Type II flipper, with some unknown probability f (f less than 0.5), will decide before a flip that, no matter what you guess, he will make you wrong. That is, no matter what the coin actually is, if you guess 'heads' he says 'tails,' and vice versa.
Model Selection in the Literature
Monday, October 18, 2010
Homework 3: What is due Wednesday
For Wednesday, you can skip Problem 4. I think it is too time-intensive for the lesson I wanted you to learn. The punchline is shown in the figure below, which was generated by Jon Swift (of Swift & Beaumont 2009).
The point is that the least-squares approach underestimates the power-law index for all values of N. So the next time you see a power-law fit in the literature, be suspicious!
For Wednesday, do Problem 3. In addition to what is written, estimate the power-law index for the exoplanet masses using MLE, and compare your answer to the value you find from a least-squares fit as discussed in Problem 4. For the least-squares fit, you can ignore mass bins containing zero planets.
Turn in a figure showing the pdf for alpha, and a figure illustrating the best-fitting power laws from both methods compared to a histogram of the planet masses. For the two figures, write captions describing what is plotted.
ETC: less than 2 hours
Reading assignment due Wed is still Sivia Chapter 4.1
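If you want to sanity-check your two estimators before touching the real masses, here is a rough Python sketch on synthetic power-law data. The index, cutoff, sample size, and binning below are made-up choices, and the convention p(m) ~ m^-alpha may differ in sign from the worksheet's definition.
import numpy as np

rng = np.random.default_rng(2)

alpha_true, m_min, N = 2.0, 1.0, 100
m = m_min * (1 - rng.random(N))**(-1.0 / (alpha_true - 1.0))   # inverse-transform power-law draws

# MLE for a pure power-law index (closed form)
alpha_mle = 1.0 + N / np.sum(np.log(m / m_min))

# Least-squares: straight-line fit to the binned density in log-log space,
# ignoring empty bins (as in the problem)
edges = np.logspace(np.log10(m_min), np.log10(m.max()), 15)
counts, _ = np.histogram(m, bins=edges)
centers = np.sqrt(edges[:-1] * edges[1:])
density = counts / np.diff(edges)
good = counts > 0
slope, intercept = np.polyfit(np.log10(centers[good]), np.log10(density[good]), 1)
alpha_lsq = -slope

print("true alpha = %.2f, MLE = %.2f, least-squares = %.2f" % (alpha_true, alpha_mle, alpha_lsq))
Repeat this over many realizations and different N to see for yourself how the two estimators behave.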
Are your likelihoods slowing you down?
Sorry about the formatting here; it looks like the Google Docs embed isn't quite compatible with the blog formatting... Hope you can still read what you need here.
Friday, October 15, 2010
Programming Investment: Making Figures
If you continue on in astronomy, you'll find (if you haven't already) that most of the actual time you spend doing research is, in fact, programming. Not just programming to do whatever analysis you're doing, but often a comparable amount of time spent putting together nifty, professional-looking figures to communicate your results.
With this in mind, you should be thinking about building an accessible toolkit of useful routines that perform tasks that you expect to repeat often. A first example of something like this would be the function we had you write a few weeks ago that calculates a 1-D confidence interval, given vectors of x and p(x). You should consider writing similar routines for common plotting tasks.
For example, I've written a function called "plot_posterior" that takes as input x and p(x) (or L(x)--not necessarily normalized), just like your confidence interval function, and makes a pretty plot, as follows:
>>> x = arange(-2,6,0.001) #make x vector
>>> px = normal(x,2,1) #evaluate N(2,1) on x
>>> plot_posterior(x,px,name='x',labelpos=(0.65,0.8)) #make the plot
(Note the default thicker-than-normal axes and lines, which is the quickest way to make your plots look more presentable and should become a habit.)
The function also has other optional parameters, such as plotting different confidence intervals, marking the median value instead of the mode as the representative value, using a symmetric interval instead of the shortest interval, turning off shading, turning off labeling, changing the number of decimal places to which the result and errors are quoted, etc. This way, whenever I want to make a nice plot of a posterior distribution with regions marked and labeled and all, I don't have to start from scratch.
I've also written a similar function ('plot_posterior2d') to summarize the results of a two-dimensional parameter estimation, as you are currently working on with the Fischer & Valenti data. This produces the following, which is a nice concise way to illustrate the results:
>>> plot_posterior2d(a, b, L, 'a', 'b', confs=[0.68,0.95,0.99], decplaces1=3)
As you can see, this function uses the 1D version of 'plot_posterior' to make the marginalized pdfs (once called with the 'horizontal' option). (It also draws only selected confidence contours on the contour plot, which is something we have not yet asked you to do, but you should be thinking about.) And now, with a modest front-end time investment, I can instantly make a snappy publication-quality plot of any two-dimensional likelihood surface I wish, potentially saving lots of time down the road.
NOTE: I'm doing all my work here using python, which I heartily recommend to all of you to experiment with at some point, if solely because of this amazing collection of example plots that I visit anytime I ever want to make any kind of fancy-looking plot. The language is quite similar in many ways to both Matlab and IDL, and has the benefits that it's both much more portable and much more free than either. I've just picked it up in the last year or so, and I'd be happy to help guide anyone who's interested.
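For anyone who wants a starting point, here is a minimal Python/matplotlib sketch of the 1-D idea. It is not Tim's plot_posterior (the function name, options, and interval logic here are my own guesses at a bare-bones version), but it normalizes p(x), finds the mode and the shortest 68% interval, and shades and labels it.
import numpy as np
import matplotlib.pyplot as plt

def plot_posterior(x, px, name='x', conf=0.68):
    px = px / np.trapz(px, x)                     # normalize the (possibly unnormalized) input
    dx = np.gradient(x)
    order = np.argsort(px)[::-1]                  # grid points, highest probability first
    csum = np.cumsum((px * dx)[order])
    sel = order[:np.searchsorted(csum, conf) + 1] # members of the shortest (HPD) interval
    lo, hi = x[sel].min(), x[sel].max()
    mode = x[np.argmax(px)]

    plt.plot(x, px, lw=2)
    plt.fill_between(x, px, where=(x >= lo) & (x <= hi), alpha=0.3)
    plt.axvline(mode, ls='--')
    plt.xlabel(name)
    plt.ylabel('p(%s)' % name)
    plt.title('%s = %.2f (+%.2f / -%.2f)' % (name, mode, hi - mode, mode - lo))

x = np.arange(-2, 6, 0.001)                       # example: a N(2,1) posterior on a grid
px = np.exp(-0.5 * (x - 2.0)**2)
plot_posterior(x, px)
plt.show()
Thicker axes, label placement, a median option, and the 2-D wrapper are the kind of polish the post describes layering on top.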
Thursday, October 14, 2010
How to present a result?
When using Bayesian inference for parameter estimation, you have several choices as to how to present your results. In question #4 on HW2, you are asked to calculate the posterior pdf for f, the binary fraction of O stars, given 3 companion detections in 21 systems. If you were presenting this result in a paper, how would you characterize it? What is the answer?
Well, in Bayesian parameter estimation, the answer is the entire posterior pdf:
But of course, you need to quote a result somewhere in the text, and perhaps in a table, in order that someone else can read your paper and quickly take your result and use it in their work. So what do you do? Do you quote the median with a symmetric 68% confidence interval?
This is the easiest one to calculate, but with an asymmetric pdf like this it doesn't actually look that great, in my opinion. So then do you quote the most likely value, with a different confidence interval? (I believe this one is the "shortest confidence interval.")
Maybe it looks a bit nicer, but really, who's to judge? Anyway, the point is that exactly the same data and analysis give slightly different "results," depending on what you choose to quote.
This is funny, because by summarizing "results" like this you're actually encouraging others to play the "assume a normal distribution with a particular sigma" game. And I fully admit to having done this myself: taking quoted results that I know don't come from a normal posterior distribution and approximating them as normal in my analysis. What can I say, sometimes we're just lazy...who wants to track down a whole bunch of exact posterior distributions from a dozen different authors? But at least this way I can comfort myself by knowing that while I'm taking shortcuts, I know that I'm taking them....
Wednesday, October 13, 2010
Homework 2: Example TeX file
\documentclass{emulateapj} % or another AAS class; needed for \altaffilmark, \keywords, etc.
\begin{document}
\title{Your Title}
\author{Joan T. Studint\altaffilmark{1}}
\email{studint@caltech.edu}
\altaffiltext{1}{Department of Astrophysics, California Institute of Technology, MC 249-17, Pasadena, CA 91125}
\begin{abstract}
Background and motivation. Methodology. Results. Conclusion.
\end{abstract}
\keywords{methods: statistical}
\section{Data}
Where did you get the data? Cite your sources. How were the data measured?
\section{Analysis}
Describe your analysis. Include equations.
\section{Results}
What did you find? Show figures. How did it compare to previous studies?
\end{document}
Correction to HW 2 data set
There was an error in the cross-listed data set. The error has been fixed and checked against exoplanets.org.
The updated data set can be found here:
http://www.astro.caltech.edu/~johnjohn/astrostats/data/fv05_vf05_data.txt
Correction to HW 2
Problem 9 refers to "HW3." This is a mistake; it should refer to the analysis in Problem 8.
Office Hours 8-10pm, Thursdays
We will be holding weekly "homework lab"-type office hours from 8pm to 10pm on Thursdays, in the classroom (Cahill 219). Please come by if you have questions or if you're stuck with anything on the assignments; either Professor Johnson or Tim (or both) will be there to help.
Data for Class Activity 2
The cross-listed data can be found here:
http://www.astro.caltech.edu/~johnjohn/astrostats/data/fv05_vf05_data.txt
Tuesday, October 12, 2010
Class Activity 2.1: Time per problem
General comment on CA2:
1) Problems 9 and 10 (the writeup) will be due Monday. For problem 9, compare the marginalized pdfs of alpha and beta for the case with and without consideration of sigma_[Fe/H]. When you get here Wednesday or thereafter, do not do the integrals analytically! Do your integration numerically.
2) The primary goals of this assignment are
- to understand how to calculate the occurrence rate as a function of object properties (planet occurrence as a function of stellar metallicity).
- to present your results clearly and professionally
- to gain science writing experience
3) Time spent debugging should decline monotonically throughout the semester. If you get discouraged after spending an hour chasing down a MATLAB or LaTeX error, take heart in that this is one fewer bug to chase down later as you pursue your research.
4) Ask for help, especially if you exceed any specific ETC below. Talk in person and via email to your classmates, Tim, or myself.
Comments and ETCs for homework problems:
Several students have requested an estimate for the amount of time one should spend on each problem. For Wednesday, problems 1-7 are "due," which means that you should be prepared to work on problems 8-10 in class with your partner. It looked like most teams finished through 4 in class, so here are my estimated times to completion (ETC) and notes for problems 5-7:
CA5) You may ignore my specific suggestion of assigning 60% completeness to half of the stars. The key to this problem is to investigate how detection sensitivity (completeness) factors into your estimate of f. You may assign S_i to the stars in your sample as you see fit. Query your intuition beforehand; how do you expect S to affect your estimate of f? Then do the test (a rough sketch of one way to fold S into the likelihood follows at the end of this post).
This should take no longer than it took you to complete problem 4: ETC 20 minutes
CA6) I will send out the cross-listed data set thanks to Mike B. and Matt.
You should still look up the FV05 paper on NASA ADS, and read the sections that describe their data and analysis so you have the necessary background for this problem (the intro would also be useful for context). ETC 30-40 minutes for reading the relevant sections
CA7) ETC 10 minutes of thought after examining Fig 5 in FV05.
Total ETC <= 1.2 hours
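As promised above, here is a rough Python sketch of one common way to fold per-star completeness into the estimate of f: treat the probability of a detection around star i as f*S_i. This is an assumption about how S enters, not necessarily the exact formulation the worksheet has in mind, and the completeness values and sample size below are made up.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
nstars = 200
S = rng.uniform(0.4, 1.0, nstars)               # made-up per-star completeness values
f_true = 0.15
det = rng.random(nstars) < f_true * S           # simulated detections (True = planet detected)

f = np.linspace(1e-3, 1.0, 500)
# log-likelihood: detection probability for star i is f*S_i, non-detection is 1 - f*S_i
logL = np.sum(np.where(det[:, None],
                       np.log(f[None, :] * S[:, None]),
                       np.log(1.0 - f[None, :] * S[:, None])), axis=0)
post = np.exp(logL - logL.max())                # unnormalized posterior, flat prior on f
post /= np.trapz(post, f)

plt.plot(f, post)
plt.axvline(f_true, ls='--')
plt.xlabel('f')
plt.ylabel('p(f | data)')
plt.show()
Try lowering S for a chunk of the sample and watch how the posterior responds; that is the intuition the problem is after.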