GLM I: Binomial Regression

The overlooked potential of Generalized Linear Models in astronomy I: Binomial regression

Distribution of molecular fraction and metallicity for haloes, colour-coded by whether or not they host star-formation (SF) activity. Red marks haloes with SF and blue marks haloes without SF. The bottom and top of each box show the first and third quartiles of the data, and the band inside the box shows the median.

Revealing hidden patterns in astronomical data is often the path to fundamental scientific breakthroughs; meanwhile, the complexity of scientific inquiry increases as more subtle relationships are sought. Contemporary data analysis problems often elude the capabilities of classical statistical techniques, suggesting the use of cutting-edge statistical methods.

In this light, astronomers have overlooked a whole family of statistical techniques for exploratory data analysis and robust regression, the so-called Generalized Linear Models (GLMs). In this paper, the first in a series aimed at illustrating the power of these methods in astronomical applications, we elucidate the potential of a particular class of GLMs for handling binary/binomial data, the so-called logit and probit regression techniques, from both a maximum likelihood and a Bayesian perspective. As a case in point, we present the use of these GLMs to explore the conditions of star formation activity and metal enrichment in primordial minihaloes from cosmological hydro-simulations including detailed chemistry, gas physics, and stellar feedback.
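To make the idea of logit regression concrete, the following is a minimal sketch, not the code used in the paper. It fits the model log(p / (1 - p)) = b0 + b1 * x by maximum likelihood with Newton-Raphson iterations, where p is the probability of a binary outcome (for instance, a halo hosting star formation). The data are synthetic, and the names `fit_logit`, `b0_hat`, `b1_hat`, and the "true" coefficients are assumptions made purely for illustration.

```python
import math
import random


def logistic(z):
    """Inverse of the logit link: maps a linear predictor to a probability."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logit(x, y, n_iter=25):
    """Maximum-likelihood fit of P(y=1 | x) = logistic(b0 + b1*x).

    Uses Newton-Raphson on the Bernoulli log-likelihood with one predictor.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        g0 = g1 = 0.0            # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0    # entries of the 2x2 Fisher information
        for xi, yi in zip(x, y):
            p = logistic(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + H^{-1} g, with H inverted by hand
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Synthetic data (an assumption for illustration, not the simulation output):
# x stands in for a standardised halo property, y for "hosts SF" (1) or not (0).
random.seed(0)
true_b0, true_b1 = -0.5, 1.5
x = [random.uniform(-3.0, 3.0) for _ in range(2000)]
y = [1 if random.random() < logistic(true_b0 + true_b1 * xi) else 0 for xi in x]

b0_hat, b1_hat = fit_logit(x, y)
print("estimated intercept and slope:", b0_hat, b1_hat)
```

A probit fit would follow the same pattern with the logistic function replaced by the standard normal cumulative distribution function; in practice a statistics package would normally be used instead of a hand-written solver.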

Finally, we highlight the use of receiver operating characteristic curves as a diagnostic for binary classifiers, and ultimately we use these to demonstrate the competitive predictive performance of GLMs against the popular technique of artificial neural networks.
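As an illustration of the receiver operating characteristic (ROC) diagnostic, here is a minimal sketch under simple assumptions. It sweeps a decision threshold over predicted probabilities, records the true-positive and false-positive rates, and integrates the curve with the trapezoidal rule to obtain the area under the curve (AUC). The scores and labels are made-up toy values, and the function names `roc_curve` and `auc` are hypothetical helpers rather than anything taken from the paper.

```python
def roc_curve(scores, labels):
    """Return (fpr, tpr) lists obtained by lowering the threshold over the scores.

    labels must be 0 or 1; higher scores should indicate the positive class.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    for rank, i in enumerate(order):
        if labels[i] == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last item in a group of tied scores.
        is_last = rank + 1 == len(order)
        if is_last or scores[order[rank + 1]] != scores[i]:
            fpr.append(fp / n_neg)
            tpr.append(tp / n_pos)
    return fpr, tpr


def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (fpr[k] - fpr[k - 1]) * (tpr[k] + tpr[k - 1]) / 2.0
        for k in range(1, len(fpr))
    )


# Toy predicted probabilities and true classes (illustrative values only).
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
labels = [1, 1, 0, 1, 0, 0]

fpr, tpr = roc_curve(scores, labels)
area = auc(fpr, tpr)
print("AUC:", area)
```

An AUC of 0.5 corresponds to random guessing and 1.0 to a perfect classifier, which is what makes the quantity convenient for comparing two classifiers, such as a GLM and a neural network, on the same data.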

Dawn of Stars

Dawn of Stars tells the story of how stars are formed. Most stars are born in groups which are truly