GLM III: Negative Binomial Regression

The overlooked potential of Generalized Linear Models in astronomy III: Bayesian negative binomial regression and globular cluster populations

Globular cluster population, NGC, plotted against visual absolute magnitude, MV. The dashed line represents the expected value of NGC for each value of MV, while the shaded areas depict the 50%, 95%, and 99% prediction intervals. Galaxy types are coded by shape and colour as follows: ellipticals (E; blue solid circles), spirals (S; red open triangles), lenticulars (S0; orange asterisks), and irregulars (Irr; green open circles). An arcsinh transformation is applied to the y-axis for better visualization of the whole range of NGC values, including the null ones.
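As a brief illustration of why an arcsinh axis suits counts that include zeros, the sketch below compares arcsinh with the natural logarithm. The sample counts are hypothetical and are not taken from the catalogue behind the figure.

```python
import math

def asinh_scale(n_gc):
    """Map a count onto the arcsinh scale; defined at zero, unlike log."""
    return math.asinh(n_gc)

# Hypothetical counts spanning the range a GC catalogue might cover.
sample_counts = [0, 1, 10, 1000, 30000]

for n in sample_counts:
    scaled = asinh_scale(n)
    # For large n, asinh(n) approaches log(2 * n), so the axis looks logarithmic.
    reference = math.log(2 * n) if n > 0 else float("nan")
    print(f"N_GC = {n:>6d}  asinh = {scaled:7.3f}  log(2n) = {reference:7.3f}")
```

Near zero the transformation is approximately linear, so galaxies with no globular clusters stay on the plot instead of being dropped as they would be on a logarithmic axis.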


In this project, the third in a series illustrating the power of generalized linear models (GLMs) for the astronomical community, we demonstrate the potential of the class of GLMs that handles count data.

The size of a galaxy’s globular cluster (GC) population (NGC) is a long-standing puzzle in the astronomical literature. It is a count data problem, yet it is usually modelled as if it were a continuous response variable. We have developed a Bayesian negative binomial regression model to study the connection between NGC and the following galaxy properties: central black hole mass, dynamical bulge mass, bulge velocity dispersion and absolute visual magnitude.
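To make the idea of a negative binomial regression concrete, here is a minimal sketch assuming a log link and one common parameterization in which the variance is mu + mu^2 / k. The coefficient values BETA0, BETA1 and the dispersion K are hypothetical placeholders, not fitted values from this project, and the function names are introduced only for illustration.

```python
import math

def expected_ngc(mv, beta0, beta1):
    """Log link: the expected count is exp(beta0 + beta1 * M_V)."""
    return math.exp(beta0 + beta1 * mv)

def nb_variance(mu, k):
    """Negative binomial variance under the assumed parameterization."""
    return mu + mu ** 2 / k

def nb_log_pmf(y, mu, k):
    """Log-probability of observing count y given mean mu and dispersion k."""
    return (math.lgamma(y + k) - math.lgamma(k) - math.lgamma(y + 1)
            + k * math.log(k / (k + mu))
            + y * math.log(mu / (k + mu)))

# Hypothetical parameter values chosen only to give plausible-looking counts.
BETA0, BETA1, K = -11.0, -0.8, 1.5

for mv in (-18.0, -20.0, -22.0):
    mu = expected_ngc(mv, BETA0, BETA1)
    print(f"M_V = {mv:6.1f}  E[N_GC] = {mu:8.1f}  Var = {nb_variance(mu, K):12.1f}")
```

Because the response is modelled as a count, the likelihood assigns probability only to non-negative integers, including zero. In a Bayesian fit, priors would be placed on the coefficients and on the dispersion, and the log-probability above would be summed over galaxies to form the log-likelihood.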

The methodology introduced herein naturally accounts for heteroscedasticity, intrinsic scatter and measurement errors in both axes (either discrete or continuous), and allows the population of GCs to be modelled on its natural scale as a non-negative integer variable. Prediction intervals of 99 per cent around the trend for expected NGC comfortably envelop the data, notably including the Milky Way, which has hitherto been considered a problematic outlier. Finally, we demonstrate how random intercept models can incorporate information on each galaxy morphological type.
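The prediction intervals mentioned above can be illustrated with a simplified sketch that computes central intervals from the negative binomial distribution at fixed parameter values. This is an assumption made for brevity: a full Bayesian prediction interval would also average over the posterior uncertainty in the parameters. The mean and dispersion used here are hypothetical.

```python
import math

def nb_pmf(y, mu, k):
    """Negative binomial probability of count y (variance = mu + mu^2 / k)."""
    log_p = (math.lgamma(y + k) - math.lgamma(k) - math.lgamma(y + 1)
             + k * math.log(k / (k + mu))
             + y * math.log(mu / (k + mu)))
    return math.exp(log_p)

def nb_quantile(q, mu, k):
    """Smallest count whose cumulative probability reaches q (0 < q < 1)."""
    cumulative, y = 0.0, 0
    while True:
        cumulative += nb_pmf(y, mu, k)
        if cumulative >= q:
            return y
        y += 1

def central_interval(level, mu, k):
    """Equal-tailed interval containing at least `level` of the probability."""
    tail = (1.0 - level) / 2.0
    return nb_quantile(tail, mu, k), nb_quantile(1.0 - tail, mu, k)

# Hypothetical expected count and dispersion for a single galaxy.
MU, K = 200.0, 1.5

for level in (0.50, 0.95, 0.99):
    low, high = central_interval(level, MU, K)
    print(f"{int(level * 100)}% interval: [{low}, {high}]")
```

The interval limits are integers and never fall below zero, which is what modelling the counts on their natural scale provides. The intervals also widen as the expected count grows, which is how the model accommodates heteroscedasticity.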

Bayesian variable selection automatically identifies galaxy types with different GC production, suggesting that on average S0 galaxies have a GC population 35 per cent smaller than other types of similar brightness.
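Assuming a log link, a type-specific intercept offset acts multiplicatively on the expected count, so a percentage difference such as the one quoted above corresponds to a fixed shift on the log scale. The sketch below shows only this arithmetic; the function names are illustrative and the offset is derived from the quoted 35 per cent, not taken from fitted output.

```python
import math

def multiplicative_effect(offset):
    """Factor by which an intercept offset scales the expected count."""
    return math.exp(offset)

def offset_for_fractional_change(change):
    """Log-scale offset matching a fractional change, e.g. -0.35 for 35% fewer."""
    return math.log(1.0 + change)

s0_offset = offset_for_fractional_change(-0.35)
print(round(s0_offset, 3))                          # -0.431
print(round(multiplicative_effect(s0_offset), 2))   # 0.65
```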

Dawn of Stars

Dawn of Stars tells the story of how stars are formed. Most stars are born in groups which are truly