An evidence ratio (or "Bayes factor") has a specific meaning: it is the odds ratio in favor of one model over the other. If Ev1/Ev2 = 9, then your data give 9:1 odds that Model 1 is right; in other words, assuming you had no prior preference between the two models, you have 90% confidence (9/(9+1)) in Model 1.
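As a minimal sketch of that arithmetic (note that treating the paper's R as log10 of the evidence ratio is my assumption here, suggested by R = 1.16 corresponding to ~94%):

```python
def confidence_from_evidence_ratio(ratio):
    # Probability of Model 1, assuming equal prior odds on the two models.
    return ratio / (ratio + 1.0)

print(confidence_from_evidence_ratio(9.0))        # 0.90 -> 90% confidence, 9:1 odds
# If R is quoted as log10 of the evidence ratio (an assumption, but it makes
# R = 1.16 correspond to ~94% confidence), exponentiate first:
print(confidence_from_evidence_ratio(10 ** 1.16)) # ~0.935
```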
Why then, you might (rightly) ask, are we asking you to do Monte Carlo simulations? And what then, you might (astutely) inquire, is the point of Section 3.2 of our paper?
Well, to be honest, the more I think about it the more unnecessary I believe Section 3.2 is, except for two important purposes: a sanity check and a convincing demonstration. Monte Carlo simulations allow you to test the frequentist performance of your Bayesian statistic; this is what we do in Section 3.2, and what we are asking you to do in Question 4. In Section 3.2 it was relatively straightforward to run these Monte Carlo simulations, because the two models I was comparing have no free parameters. And we find that the simulation-based confidence of 94% is (almost) exactly what you get by calculating the confidence directly from the R value.

HOWEVER, it is no longer clear to me how to do these simulations, or exactly what they mean, in cases where you have free parameters. What I did in that case (Sections 4.2/4.3) was to simulate data with various values of the parameters and weight the resulting R distribution by how much the data favored each parameter value (i.e., marginalizing over the parameter). This ends up with a result that does in fact confirm the confidence that the Bayes factor suggests (R = 1.16 gives 94% confidence analytically), but it is a different procedure from what we have asked you to do in Question 4, where we tell you to simulate data using only the best-fit parameters.
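To make the Question-4-style procedure concrete, here is a rough sketch of the calibration loop: simulate many data sets from the one-lighthouse model at its best-fit parameters, compute the evidence ratio for each, and look at the resulting R distribution. The Cauchy lighthouse likelihood, the known offshore distance beta, the uniform prior range on alpha, the grid integration of the evidences, and the convention that R is log10 of the two-LH-over-one-LH evidence ratio are all my assumptions here, not necessarily the exact setup in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

BETA = 1.0                             # assumed known offshore distance
ALPHA_GRID = np.linspace(-5, 5, 201)   # uniform prior on alpha over [-5, 5] (assumed)
N_FLASHES = 50                         # flashes per simulated data set (assumed)

def cauchy_like(x, alpha, beta=BETA):
    """Lighthouse (Cauchy) likelihood of shore positions x for a lighthouse at alpha."""
    return (beta / np.pi) / ((x - alpha) ** 2 + beta ** 2)

def log_evidence_1lh(x):
    """1-LH evidence: likelihood marginalized over the uniform prior on alpha."""
    logL = np.array([np.sum(np.log(cauchy_like(x, a))) for a in ALPHA_GRID])
    dalpha = ALPHA_GRID[1] - ALPHA_GRID[0]
    prior = 1.0 / (ALPHA_GRID[-1] - ALPHA_GRID[0])
    m = logL.max()
    return m + np.log(np.sum(np.exp(logL - m)) * dalpha * prior)

def log_evidence_2lh(x):
    """2-LH evidence (equal-brightness mixture), on a coarser 2-d alpha grid."""
    grid = ALPHA_GRID[::4]
    dalpha = grid[1] - grid[0]
    prior = (1.0 / (grid[-1] - grid[0])) ** 2
    logL = np.array([np.sum(np.log(0.5 * cauchy_like(x, a1) + 0.5 * cauchy_like(x, a2)))
                     for a1 in grid for a2 in grid])
    m = logL.max()
    return m + np.log(np.sum(np.exp(logL - m)) * dalpha ** 2 * prior)

def simulate_1lh(alpha, n=N_FLASHES, beta=BETA):
    """Shore positions from one lighthouse at alpha: Cauchy(alpha, beta) draws."""
    return alpha + beta * np.tan(np.pi * (rng.random(n) - 0.5))

# Calibration: the distribution of R when the data really come from one lighthouse.
alpha_best = 1.0                       # stand-in for your best-fit alpha
R_sim = []
for _ in range(200):
    x = simulate_1lh(alpha_best)
    R_sim.append((log_evidence_2lh(x) - log_evidence_1lh(x)) / np.log(10))  # log10 ratio
R_sim = np.array(R_sim)
print("fraction of 1-LH simulations with R favoring two lighthouses:", np.mean(R_sim > 0))
# A histogram of R_sim is essentially the last plot asked for at the end of this note.
```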
In addition to this, Question 5 is somewhat unclear.
John's out of town right now, so in my role as temporary deputy of this class, I'm asking you to combine the tasks in the current Questions 4 and 5 by answering the following two-part question:
How likely are you to be able to confidently (>95%) detect the presence of two lighthouses, as a function of their separation? And how likely are you to confidently select a one-LH model over a two-LH model, if the data really do come from just one LH?
This will still require you to do MC simulations, but this time the point is not to confirm the frequentist performance of your statistic; it is to help you understand the limits and capabilities of your experiment.
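As a rough sketch of the first part, here is the shape of the separation sweep I have in mind. The helper names (simulate_2lh and the log-evidence functions) are placeholders for whatever you have written, e.g. along the lines of the sketch above, and taking "confident detection" to mean an evidence ratio above 0.95/0.05 = 19 (i.e. >95% posterior probability for two lighthouses under equal prior odds) is my assumption.

```python
import numpy as np

def detection_fraction(separation, n_sims=200, threshold=19.0):
    """Fraction of simulated 2-LH data sets whose evidence ratio clears >95% confidence."""
    hits = 0
    for _ in range(n_sims):
        # simulate_2lh and the log-evidence functions are placeholders for your own code
        x = simulate_2lh(alpha1=-separation / 2, alpha2=+separation / 2)
        log10_R = (log_evidence_2lh(x) - log_evidence_1lh(x)) / np.log(10)
        hits += log10_R > np.log10(threshold)   # 19:1 odds <-> 95% confidence
    return hits / n_sims

separations = np.linspace(0.1, 3.0, 10)
p_detect = [detection_fraction(s) for s in separations]
# p_detect vs. separations is exactly the curve asked for in the plot below.
```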
To turn in Monday:
- Example 1-d and 2-d posterior plots for each data set (i.e. the one generated with 1 LH and the one generated with 2 LHs), with the true values of alpha marked. I would like the 2-d posteriors to have 1-, 2-, and 3-sigma contours (see the sketch after this list), though if you don't get around to that yet, it's not a huge deal.
- A plot of "probability of confidently detecting 2 LHs" vs. LH separation, with a caption explaining what the plot means in terms that someone not taking this class could understand.
- A histogram of R values resulting from a large number of data sets simulated using one LH, with a caption giving your interpretation of what this means.
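For the 1-, 2-, and 3-sigma contours on the 2-d posterior, here is one minimal way to pick the contour levels, assuming you have the posterior evaluated on a grid. The convention (levels enclosing 68.3%, 95.4%, and 99.7% of the posterior mass) and the alpha_1/alpha_2 grid below are my assumptions, and the Gaussian "posterior" is just a stand-in for your real one.

```python
import numpy as np
import matplotlib.pyplot as plt

def contour_levels(post, fractions=(0.683, 0.954, 0.997)):
    """Posterior-density levels whose enclosed regions contain the given mass fractions."""
    p = np.sort(post.ravel())[::-1]          # grid densities, highest first
    cum = np.cumsum(p) / p.sum()             # mass enclosed as the level is lowered
    return [p[np.searchsorted(cum, f)] for f in fractions]

# Fake Gaussian posterior on a grid, standing in for your real 2-d posterior:
a1, a2 = np.meshgrid(np.linspace(-3, 3, 200), np.linspace(-3, 3, 200))
post = np.exp(-0.5 * (a1 ** 2 + a2 ** 2))

levels = sorted(contour_levels(post))        # matplotlib wants increasing levels
plt.contour(a1, a2, post, levels=levels)
plt.xlabel("alpha_1")
plt.ylabel("alpha_2")
plt.show()
```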