Caltech Astrostats: April 2012

Monday, April 30, 2012

Class Activity 3

http://www.astro.caltech.edu/~johnjohn//astrostats/CA3.pdf

Friday, April 27, 2012

Dispositive Nulls and Detection Limits

My good friend Prof. Jason Wright has been tackling the notion of a "dispositive null" over on his professional blog. Here's his first entry, and his followup. This is good stuff and a natural extension of the notion of assigning confidence to our belief in a hypothesis.

avoid using loops!

One of the important things to know about Python is that it is an "interpreted" language, not a compiled language. Interpreted languages (like Python, IDL, and MATLAB) are designed to be "higher-level" languages, and thus do not have to be compiled in the same way as C or C++, for example. This makes some things simpler, but there are some costs.

One of the costs is that loops take a long time to execute. See the following example:
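A minimal sketch of such a comparison, assuming numpy is installed and Python 3 is in use. The function names and the array size are made up for illustration, and the timings will differ from machine to machine:

```python
import time

import numpy as np


def square_with_loop(values):
    """Square each element one at a time inside a Python loop."""
    result = np.empty_like(values)
    for i in range(len(values)):
        result[i] = values[i] ** 2
    return result


def square_whole_array(values):
    """Square the whole array with a single numpy operation."""
    return values ** 2


x = np.arange(200_000, dtype=float)

start = time.perf_counter()
looped = square_with_loop(x)
loop_seconds = time.perf_counter() - start

start = time.perf_counter()
vectorized = square_whole_array(x)
array_seconds = time.perf_counter() - start

# Both approaches give the same numbers; only the run time differs.
print("loop:  %.4f s" % loop_seconds)
print("array: %.4f s" % array_seconds)
```

On a typical machine the loop version should come out much slower, but the exact ratio is not something to rely on.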

As you see, doing operations element-by-element on an array is MUCH slower than working with whole arrays at a time. In fact, behind the scenes, numpy is using all the power of compiled code to make these array operations lightning-fast. But you don't need to know how this works; all you need to do is get comfortable working with whole arrays, whenever possible. A good example of this in action is the difference between the following two ways to create arrays of random numbers:
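One way the two approaches might look, as a sketch assuming numpy and Python 3. The array length `n` and the seed are arbitrary choices for illustration:

```python
import random

import numpy as np

n = 100_000

# Way 1: fill the array one element at a time in a Python loop.
looped = np.zeros(n)
for i in range(n):
    looped[i] = random.random()

# Way 2: ask numpy for the whole array of uniform draws in a single call.
rng = np.random.default_rng(seed=42)
whole_array = rng.random(n)

# Both are arrays of n uniform random numbers in [0, 1).
print(looped.shape, whole_array.shape)
```

The second way is shorter to write and keeps the loop inside numpy's compiled code.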

Monday, April 23, 2012

CA1 Solution Sets

Everyone did a nice job on the Class Activities this week. Thank you for all your hard effort and careful work. Coco has placed marked-up PDF files in all of your Dropbox folders. Here are three writeups that I feel constitute the "solution set."

Solution 1 (Peter)
Solution 2 (Melodie)
Solution 3 (Adam)

In the future, please strive to have your write-up be clear enough to serve as the solution set. Science is all about communication, so the clarity of your presentation is a big part of your assessment in this class.

Friday, April 20, 2012

arbitrary precision in Python

Some of you mentioned rounding errors in Python. There is a Python package, mpmath, that provides arbitrary-precision arithmetic. Take a look if you are interested.

http://code.google.com/p/mpmath/
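To see the kind of rounding error in question, here is a small sketch. Note that it does not use mpmath: so that it runs with nothing but the standard library, it swaps in Python's built-in `decimal` module, which also lets you choose the working precision. The choice of 50 digits is arbitrary.

```python
from decimal import Decimal, getcontext

# Ordinary floats are stored in binary, so 0.1 + 0.2 is not exactly 0.3.
float_sum = 0.1 + 0.2
print(float_sum)          # 0.30000000000000004
print(float_sum == 0.3)   # False

# Work with 50 significant decimal digits instead.
getcontext().prec = 50
decimal_sum = Decimal("0.1") + Decimal("0.2")
print(decimal_sum == Decimal("0.3"))   # True

# Square root of 2 to 50 significant digits.
root_two = Decimal(2).sqrt()
print(root_two)
```

mpmath goes further than this (special functions, complex numbers, and so on), so treat the above as a taste of the idea rather than a replacement.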

On Python Functionality

"Any self-respecting programming language has a histogram function!" -Jennifer

"Oh! Here it is!" -Jessica

Thursday, April 19, 2012

Yikes!

IDL users: Be careful of the indexing of rows and columns in Python. From Jon Swift:

i.e. why has no one ever told me?
IDL:
------
IDL> a = [[1,1,1],[2,2,2],[3,3,3],[4,4,4]]
IDL> help,a
A               INT       = Array[3, 4]

python:
----------
In [2]: a = [[1,1,1],[2,2,2],[3,3,3],[4,4,4]]
In [3]: print np.shape(a)
(4, 3)
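A short sketch of what this means in practice, assuming numpy and written for Python 3 (so `print` is a function here). The array is the same one as in the sessions above:

```python
import numpy as np

a = np.array([[1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 4]])

# numpy reports (rows, columns); IDL reports [columns, rows].
print(a.shape)     # (4, 3)

# Indexing follows the same order: row first, then column.
print(a[1, 0])     # 2  (second row, first column)

# Transposing gives the shape that IDL reports for the same literal.
print(a.T.shape)   # (3, 4)
```

So an IDL expression like `a[0, 1]` corresponds to `a[1, 0]` in numpy.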

Monday, April 16, 2012

Take the survey

Up there, at the top of the page. Click an answer. Feel good that your response will matter.

IDL to Python translation guide

https://www.cfa.harvard.edu/~jbattat/computer/python/science/idl-numpy.html

Office Hours w/ the Prof

In addition to the Wed night help sessions with Coco, I will also hold office hours from 10am-11:30am Wednesday mornings.

Friday, April 13, 2012

This!

Can any of you do this for Bayesian statistics, e.g. for line fitting? If you can and do, you get an A+, automatically.

DeTeXify

http://detexify.kirelabs.org/classify.html

hat tip: Iryna

Thursday, April 12, 2012

CA1 modification

Based on feedback from Wednesday night's help session, it looks like I underestimated the time necessary to complete the class activity (in true professorial fashion). I forgot that problems 6 and 7 required derivations and that many of you have yet to learn LaTeX, and that problem 8 is nontrivial.

Here's the new plan:

Turn in problems 1-5 at the beginning of class Friday. We'll then spend Friday's class talking about LaTeX and working on fitting lines.

Wednesday, April 11, 2012

Handy LaTeX References

Writing in LaTeX is a valuable skill to pick up early in your scientific career. Don't be that guy/gal who writes their papers in Word. Ugh!

LaTeX Math Symbols:

http://web.ift.uib.no/Teori/KURS/WRK/TeX/symALL.html

Handy online LaTeX editor (click buttons for examples of code):

http://www.codecogs.com/latex/eqneditor.php

(Also, check the URLs of the equations in the previous post to see the LaTeX used to create them.)

How to pronounce and write "LaTeX":

http://en.wikipedia.org/wiki/LaTeX#Pronouncing_and_writing_.22LaTeX.22

I prefer "lay-tech" but I won't make fun of you if you say "lah-tech." I will make fun of you if you say "lay-teks" :)

Tuesday, April 10, 2012

Integrating exponentials

Often in physics, and sometimes in life, you come across the need to integrate an exponential of the form

$\large A = \int_{-\infty}^{\infty}e^{-a x^2} dx$

Allow me to show you how to handle this using simple dimensional analysis rather than calculus and memorization. Dimensional analysis can get you out of a bind when working on a plane (sans wireless), in an oral exam or even during Q&A after your colloquium!

First, note that the units of A must be the same as the units of x since exponentials are dimensionless and dx has units of x. Further, examination of the quantity in the exponent reveals that a must have units of 1/x^2, since the argument of an exponential must be dimensionless, too. Thus, the integral must have units of x and involve a, like so:

$\large A = \int_{-\infty}^{\infty}e^{-a x^2} dx \propto \frac{1}{\sqrt{a}}$

This is most of the way there. It turns out that there's a missing factor of the square-root of pi:

$\large A = \int_{-\infty}^{\infty}e^{-a x^2} dx = \sqrt{\frac{\pi}{a}}$

But I think it's pretty cool that you can get to within a factor of root-pi (1.77) without any calculus! I can pretty easily remember the pi part after I get the dimensions correct. Even if I forget, being within a factor of two is good enough for astronomy in most applications.
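If you want to convince yourself numerically, here is a minimal sketch assuming numpy. The function name, the grid size, and the cutoff at 10/sqrt(a) are my own choices for illustration; the cutoff works because the integrand is utterly negligible beyond it.

```python
import numpy as np


def gaussian_integral(a, n_points=200_001):
    """Estimate the integral of exp(-a x^2) over the real line on a fine grid."""
    # Truncate the infinite range where exp(-a x^2) has dropped to exp(-100).
    half_width = 10.0 / np.sqrt(a)
    x = np.linspace(-half_width, half_width, n_points)
    dx = x[1] - x[0]
    return np.sum(np.exp(-a * x**2)) * dx


for a in (0.5, 1.0, 4.0):
    numeric = gaussian_integral(a)
    dimensional_guess = 1.0 / np.sqrt(a)
    exact = np.sqrt(np.pi / a)
    # The ratio numeric / dimensional_guess should be sqrt(pi) for every a.
    print(a, numeric, exact, numeric / dimensional_guess)
```

The last column comes out the same for every value of a, which is exactly the claim: dimensional analysis fixes the dependence on a, and only the constant is left to remember.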

You might notice that this is the form of the Gaussian function, centered on x=0 with

$\large a = \frac{1}{2\sigma^2}$

Once normalized, the Gaussian function becomes the normal distribution so frequently used in data analysis (and CA1). Note the distinction between a Gaussian function and a normal distribution. The difference is important, but frequently ignored in the scientific literature. For example, a Gaussian has three free parameters (amplitude, center, and width). A normal distribution has only two (center and width), because its amplitude is fixed by the requirement that it integrate to one. And only one of these is a proper probability density function (pdf).
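The distinction can be sketched in a few lines, assuming numpy. The function names and the parameter values (amplitude 3, center 1, width 2) are made up for illustration:

```python
import numpy as np


def gaussian(x, amplitude, mu, sigma):
    """Gaussian function: three free parameters."""
    return amplitude * np.exp(-(x - mu) ** 2 / (2.0 * sigma**2))


def normal_pdf(x, mu, sigma):
    """Normal distribution: the amplitude is fixed by normalization."""
    return gaussian(x, 1.0 / (sigma * np.sqrt(2.0 * np.pi)), mu, sigma)


# A grid wide enough that both curves are negligible at the edges.
x = np.linspace(-50.0, 50.0, 200_001)
dx = x[1] - x[0]

area_gaussian = np.sum(gaussian(x, 3.0, 1.0, 2.0)) * dx
area_normal = np.sum(normal_pdf(x, 1.0, 2.0)) * dx

# The Gaussian's area is amplitude * sigma * sqrt(2 pi); the normal's is 1.
print(area_gaussian, 3.0 * 2.0 * np.sqrt(2.0 * np.pi))
print(area_normal)
```

Only `normal_pdf` integrates to one regardless of its parameters, which is what makes it usable as a pdf.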

For more "Street Fighting Mathematics" like this, check out this book.

Monday, April 9, 2012

Normalization

I'm sorry I wasn't able to be there in class today. However, Coco reports that you all made good progress working on Class Activity 1. She also reports that many of you struggled with the normalization part of Problem (4). Assuming this is where the problem was, allow me to help everyone along.

Problem (4) states:

Without properly normalizing things, you will end up with a proportionality of the form:

$\large p(\mu\,|\,\{A\}) \propto p(\{A\}\,|\,\mu)\ p(\mu)$

In order for this to be an equation, you have to normalize it. You might want to convince yourself that the quantity on the right hand side is not normalized by integrating over all values of \mu. In fact, dividing by this integrated quantity provides the normalization constant:

$\large p(\mu\,|\,\{A\}) =\frac{p(\{A\}\,|\,\mu)\ p(\mu)}{\int_{-\infty}^{\infty} p(\{A\}\,|\,\mu)\ p(\mu) \rm{d}\mu}$

The denominator is also known as the "evidence":

$\large p(\{A\}) =\int_{-\infty}^{\infty} p(\{A\}\,|\,\mu)\ p(\mu) \rm{d}\mu$

or the probability of the data given the model, with \mu marginalized out.

The actual value of the integral can be expressed analytically, or you could just do it numerically. Or you can use WolframAlpha. But whatever you do, don't get too hung up on this! :)
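For the numerical route, here is a minimal sketch assuming numpy. Everything specific in it is an assumption for illustration rather than the setup of Problem (4): the measurements are invented, the noise level `sigma` is taken as known, the likelihood is Gaussian, and the prior on mu is flat over the grid.

```python
import numpy as np

# Made-up measurements {A} and an assumed known noise level.
data = np.array([4.8, 5.1, 5.3, 4.9, 5.4])
sigma = 0.5

# Grid of candidate values for mu.
mu = np.linspace(0.0, 10.0, 10_001)
dmu = mu[1] - mu[0]

# Log-likelihood of all the data for every mu at once (no loops needed).
residuals = data[:, np.newaxis] - mu[np.newaxis, :]
log_likelihood = -0.5 * np.sum((residuals / sigma) ** 2, axis=0)

# Flat prior; its overall constant cancels in the normalization.
prior = np.ones_like(mu)

# Right-hand side of the proportionality, rescaled to avoid underflow.
unnormalized = np.exp(log_likelihood - log_likelihood.max()) * prior

# Integrate over mu to get the normalization constant, then divide by it.
# (This is the evidence up to the constant factors dropped above.)
normalization = np.sum(unnormalized) * dmu
posterior = unnormalized / normalization

print(np.sum(posterior) * dmu)       # integrates to 1
print(mu[np.argmax(posterior)])      # peak near the sample mean
```

With a flat prior and Gaussian noise the posterior peaks at the sample mean of the data, which makes a handy sanity check on the grid calculation.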

Wednesday, April 4, 2012

Ay117: Starting anew in 2012

Welcome to the new class of AstroStats students!

The (rough) course syllabus can be found here:

http://www.astro.caltech.edu/~johnjohn/astrostats/

I'll update this syllabus soon, but all of the key points that I covered in the first class today are there.

The first Class Activity is also available. I will post the activities on the right hand side of this blog throughout the term. We'll work on this activity starting Friday morning, and we'll continue through next week. Whatever we don't finish by the end of class Monday will be homework due Friday, unless otherwise specified.

The reading assignment is Chapters 1 and 2 of Sivia (link to Google Books). Once you finish Chapter 2, keep going. The standing reading assignment is all of Part I, to be completed before week 3 of class. Once you read Part I, read it again.