Gaussian Mixture Models

A probabilistic approach to emission-line galaxy classification

The Gaussian components projected onto the BPT diagram. For each component, the thick lines mark the 68% and 95% confidence levels.
BPT diagram with galaxy points from the SDSS and SEAGal datasets. The curves mark the division between the SF and AGN classes (dotted: Kewley et al. 2001; solid: Stasinska et al. 2006; dashed: Kauffmann et al. 2003), and the dot-dashed line shows the division between AGN and LINERs suggested by Schawinski et al. (2007).

Classification of objects has long been recognized as a major driver in the natural sciences, from the taxonomic classification of species and the anthropological variation of cultures (e.g. Stocking 1968) to the vast diversity of galaxy shapes (De Vaucouleurs 1959). Empirical classifications are powerful triggers for novel theories, an archetypal example being the Linnaean classification of organisms (Linnaeus 1758), which subsequently inspired Darwin’s renowned theory of common descent (Darwin 1859).

In the context of extragalactic astrophysics, various classification schemes have been proposed to help ascertain the main drivers regulating galaxy evolution; this task becomes imperative in the face of the deluge of information gathered by current (e.g. Sloan Digital Sky Survey, York et al. 2000; Zhang & Zhao 2015) and upcoming (e.g. Large Synoptic Survey Telescope, Ivezic et al. 2009) large-scale sky surveys.

Notably, emission lines are powerful diagnostics for differentiating galaxies according to their ionization power source, and the most widely used emission-line diagnostic is the Baldwin-Phillips-Terlevich (BPT) diagram. A common characteristic of most such diagrams, and of the majority of standard classification systems in astronomy, is the sharp division between classes: boundaries are more often than not defined by eye or fitted without accounting for a smooth transition between objects.
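To make the "sharp division" concrete, the sketch below applies two of the demarcation curves shown on the BPT diagram, using the commonly quoted forms of the Kewley et al. (2001) and Kauffmann et al. (2003) lines. The function names and the "composite" label for the region between the two curves are assumptions made for illustration; they are not taken from this page.

```python
def kewley01(x):
    """Kewley et al. (2001) curve: log([OIII]/Hb) at x = log([NII]/Ha), valid for x < 0.47."""
    return 0.61 / (x - 0.47) + 1.19


def kauffmann03(x):
    """Kauffmann et al. (2003) curve: log([OIII]/Hb) at x = log([NII]/Ha), valid for x < 0.05."""
    return 0.61 / (x - 0.05) + 1.30


def hard_class(log_nii_ha, log_oiii_hb):
    """Return a single hard label for a point on the BPT diagram.

    The labels (especially "composite") are illustrative assumptions.
    """
    x, y = log_nii_ha, log_oiii_hb
    # Below the Kauffmann curve: star-forming
    if x < 0.05 and y < kauffmann03(x):
        return "SF"
    # Between the Kauffmann and Kewley curves
    if x < 0.47 and y < kewley01(x):
        return "composite"
    # Above the Kewley curve, or beyond its asymptote
    return "AGN"


# Two nearly identical galaxies straddling the Kauffmann curve
# receive different labels, with no measure of how uncertain that is.
label_a = hard_class(-0.5, 0.190)
label_b = hard_class(-0.5, 0.192)
```

Here `label_a` and `label_b` differ even though the two points are separated by only 0.002 dex, which is the behaviour a probabilistic description is meant to soften.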

We invoke a Gaussian mixture model (GMM) to provide a probabilistic description of galaxy classification based on emission lines. The best-fit GMM, selected using several statistical criteria, suggests a solution of around four Gaussian components (GCs), which are capable of explaining up to 97 per cent of the data variance. Using elements of information theory, we compare each GC to its respective astronomical counterpart, and we demonstrate the potential of our methodology to recover and unravel different objects inside the wilderness of astronomical datasets without losing the ability to convey physically interpretable results. The probabilistic classifications from the GMM analysis are publicly available.
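As a rough illustration of what a probabilistic classification means, the sketch below computes the posterior membership probabilities of a point under a two-dimensional Gaussian mixture. The weights, means and covariances are hypothetical, hand-picked values (and only two components are used for brevity); they are not the fitted parameters from the paper, where they would instead come from fitting the mixture to the data.

```python
import math


def log_gauss2d(x, mean, cov):
    """Log-density of a 2-D Gaussian with full covariance [[a, b], [b, d]]."""
    (a, b), (_, d) = cov
    det = a * d - b * b
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # Squared Mahalanobis distance via the closed-form inverse of a 2x2 matrix
    maha = (d * dx * dx - 2.0 * b * dx * dy + a * dy * dy) / det
    return -0.5 * maha - math.log(2.0 * math.pi) - 0.5 * math.log(det)


def membership_probs(x, weights, means, covs):
    """Posterior probability that x belongs to each mixture component."""
    logs = [math.log(w) + log_gauss2d(x, m, c)
            for w, m, c in zip(weights, means, covs)]
    top = max(logs)  # subtract the maximum for numerical stability
    unnorm = [math.exp(v - top) for v in logs]
    total = sum(unnorm)
    return [u / total for u in unnorm]


# Hypothetical mixture parameters (assumptions, for illustration only)
WEIGHTS = [0.6, 0.4]
MEANS = [(-0.6, -0.3), (0.0, 0.5)]
COVS = [[[0.04, 0.01], [0.01, 0.06]],
        [[0.05, -0.01], [-0.01, 0.05]]]

# A point lying between the two components gets a graded answer
# instead of a single hard label.
probs = membership_probs((-0.3, 0.1), WEIGHTS, MEANS, COVS)
```

Each galaxy is then described by a vector of probabilities that sums to one, so objects in the transition region between classes are flagged as ambiguous rather than forced into a single class.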

Full citation: de Souza et al. 2017, MNRAS, Volume 472, Issue 3, pp. 2808-2822

This project is a result of COIN Residence Program #3 – Budapest, Hungary/2016.

Dawn of Stars

Dawn of Stars tells the story of how stars are formed. Most stars are born in groups which are truly