Gaussian Mixture Models

Project 9:

A probabilistic approach to emission-line galaxy classification

Classification of objects has long been recognized as a major driver in the natural sciences, from the taxonomic classification of species and the anthropological variation of cultures (e.g. Stocking 1968) to the vast diversity of galaxy shapes (De Vaucouleurs 1959). Empirical classifications are powerful triggers for new theories, an archetypal example being the Linnaean classification of organisms (Linnaeus 1758), which later inspired Darwin’s renowned theory of common descent (Darwin 1859).

In the context of extragalactic astrophysics, various classification schemes have been proposed to help ascertain the main drivers regulating galaxy evolution; this task becomes imperative in the face of the deluge of information gathered by current (e.g. Sloan Digital Sky Survey, York et al. 2000; Zhang & Zhao 2015) and upcoming (e.g. Large Synoptic Survey Telescope, Ivezic et al. 2009) large-scale sky surveys.

Emission lines, in particular, are powerful diagnostics for distinguishing galaxies according to the source of their ionizing radiation; the most widely used emission-line diagnostic is the Baldwin-Phillips-Terlevich (BPT) diagram. A characteristic shared by most of these diagrams, and by the majority of standard classification systems in astronomy, is a sharp division between classes, with boundaries that are more often than not defined by eye or fitted without allowing for a smooth transition between objects.
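To make the idea of a sharp boundary concrete, here is a minimal Python sketch of a hard BPT classification. It assumes the commonly used demarcation curves of Kauffmann et al. (2003) and Kewley et al. (2001) on the log([N II]/Hα) versus log([O III]/Hβ) plane; these curves and their coefficients are not given on this page and are quoted here as an assumption, so check them against the original papers before relying on them. The function names are hypothetical.

```python
def kauffmann03(x):
    """Assumed Kauffmann et al. (2003) empirical star-forming boundary, y(x)."""
    return 0.61 / (x - 0.05) + 1.3


def kewley01(x):
    """Assumed Kewley et al. (2001) theoretical maximum-starburst line, y(x)."""
    return 0.61 / (x - 0.47) + 1.19


def bpt_class(x, y):
    """Hard label from x = log([N II]/Halpha), y = log([O III]/Hbeta).

    The x-checks come first so that the curves are never evaluated at their
    asymptotes (x = 0.05 and x = 0.47), which would divide by zero.
    """
    if x < 0.05 and y < kauffmann03(x):
        return "star-forming"
    if x < 0.47 and y < kewley01(x):
        return "composite"
    return "AGN"


# Two galaxies only 0.02 dex apart in log([O III]/Hbeta) receive different labels.
for y in (0.18, 0.20):
    print(y, bpt_class(-0.5, y))
# 0.18 star-forming
# 0.20 composite
```

The abrupt label change between two nearly identical objects is exactly the behaviour that a probabilistic classification is meant to soften.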

We use a Gaussian mixture model (GMM) to provide a probabilistic classification of galaxies based on their emission lines. According to several statistical criteria, the best-fitting GMM has around four Gaussian components (GCs), which together explain up to 97 per cent of the data variance. Using elements of information theory, we compare each GC with its respective astronomical counterpart, and we demonstrate the potential of our methodology to recover and disentangle different populations of objects within complex astronomical datasets while still conveying physically interpretable results. The probabilistic classifications from the GMM analysis are publicly available.
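The general technique can be sketched in a few dozen lines of NumPy. This is an illustration under stated assumptions, not the authors' pipeline (see the paper and tutorial for that): it fits full-covariance Gaussian mixtures by expectation-maximization, selects the number of components with the Bayesian information criterion (BIC, one example of a statistical criterion; the paper uses several), and returns per-object membership probabilities instead of hard labels. The input is synthetic two-dimensional data standing in for emission-line ratios, and all function names are hypothetical.

```python
import numpy as np


def log_gaussian(X, mean, cov):
    """Log-density of a multivariate normal N(mean, cov) at each row of X."""
    d = X.shape[1]
    L = np.linalg.cholesky(cov)
    # Solve L z = (x - mean) so that sum(z**2) is the squared Mahalanobis distance.
    z = np.linalg.solve(L, (X - mean).T)
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * (d * np.log(2.0 * np.pi) + log_det + np.sum(z**2, axis=0))


def e_step(X, weights, means, covs):
    """Return soft memberships (responsibilities) and the total log-likelihood."""
    log_p = np.column_stack([np.log(w) + log_gaussian(X, m, c)
                             for w, m, c in zip(weights, means, covs)])
    top = log_p.max(axis=1, keepdims=True)  # log-sum-exp for numerical stability
    log_norm = top[:, 0] + np.log(np.exp(log_p - top).sum(axis=1))
    return np.exp(log_p - log_norm[:, None]), log_norm.sum()


def fit_gmm(X, k, n_iter=300, tol=1e-8, n_init=5, seed=0, reg=1e-6):
    """Full-covariance GMM fitted by EM; keeps the best of n_init random starts."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    best = None
    for _ in range(n_init):
        means = X[rng.choice(n, size=k, replace=False)]
        covs = np.array([np.cov(X, rowvar=False) + reg * np.eye(d)] * k)
        weights = np.full(k, 1.0 / k)
        prev_ll = -np.inf
        for _ in range(n_iter):
            resp, ll = e_step(X, weights, means, covs)
            if ll - prev_ll < tol:
                break
            prev_ll = ll
            # M-step: update weights, means and covariances from the soft memberships.
            nk = np.maximum(resp.sum(axis=0), 1e-10)
            weights = nk / n
            means = (resp.T @ X) / nk[:, None]
            for j in range(k):
                diff = X - means[j]
                covs[j] = (resp[:, j, None] * diff).T @ diff / nk[j] + reg * np.eye(d)
        resp, ll = e_step(X, weights, means, covs)
        if best is None or ll > best[4]:
            best = (weights, means, covs, resp, ll)
    return best


def bic(ll, n, k, d):
    """BIC = -2 log L + (number of free parameters) * log n; lower is better."""
    n_params = (k - 1) + k * d + k * d * (d + 1) // 2
    return -2.0 * ll + n_params * np.log(n)


rng = np.random.default_rng(42)
# Synthetic 2-D points from two blobs (illustrative only, not SDSS data).
X = np.vstack([rng.normal([-0.5, -0.3], 0.1, size=(200, 2)),
               rng.normal([0.3, 0.6], 0.1, size=(200, 2))])
scores = {k: bic(fit_gmm(X, k)[4], len(X), k, X.shape[1]) for k in range(1, 5)}
best_k = min(scores, key=scores.get)
weights, means, covs, resp, ll = fit_gmm(X, best_k)
print(best_k, resp[:3].round(3))  # each row: membership probabilities of one object
```

Each row of `resp` sums to one, so an object lying between two components is reported as, say, 60/40 rather than forced into a single class. In practice a maintained implementation such as scikit-learn's `GaussianMixture` would normally replace the hand-written EM loop.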

Full citation: de Souza et al., 2017, MNRAS, Volume 472, Issue 3, pp. 2808-2822

This project is a result of COIN Residence Program #3 (Budapest, Hungary, 2016).

Pre-print – Bibtex – Catalogue – Tutorial

BPT diagram, with galaxy points from the SDSS and SEAGal datasets.

Team:
Rafael S. de Souza, U. North Carolina (USA)
Maria Luiza L. Dantas, U. Sao Paulo (Brazil)
Marcos V. C. Duarte, U. Sao Paulo (Brazil)
Eric Feigelson, Pennsylvania State U. (USA)
Madhura Killedar, Burnet Institute (Australia)
Pierre-Yves Lablanche, African Institute for Mathematical Sciences (South Africa)
Ricardo Vilalta, U. Houston (USA)
Alberto Krone-Martins, U. Lisbon (Portugal)
Robert Beck, ELTE (Hungary)
Fabian Gieseke, U. Copenhagen (Denmark)

