Gaussian Mixture Models

A probabilistic approach to emission-line galaxy classification

Classification of objects has long been recognized as a major driver in the natural sciences, from the taxonomic classification of species and the anthropological variation of cultures (e.g. Stocking 1968) to the vast diversity of galaxy shapes (De Vaucouleurs 1959). Empirical classifications are powerful triggers for novel theories, an archetypal example being the Linnaean classification of organisms (Linnaeus 1758), which subsequently inspired Darwin’s renowned theory of common descent (Darwin 1859).

In the context of extragalactic astrophysics, various classification schemes have been proposed to help ascertain the main drivers regulating galaxy evolution; this task becomes imperative in the face of the deluge of information gathered by current (e.g. Sloan Digital Sky Survey, York et al. 2000; Zhang & Zhao 2015) and upcoming (e.g. Large Synoptic Survey Telescope, Ivezic et al. 2009) large-scale sky surveys.

Notably, emission lines are powerful diagnostics for differentiating galaxies according to their ionization power source; the most widely used emission-line diagnostic is the Baldwin-Phillips-Terlevich (BPT) diagram. A common characteristic of most such diagrams, and of the majority of standard classification systems in astronomy, is the sharp division between classes: boundaries are more often than not defined by eye or fitted without accounting for a smooth transition between objects.

We invoke a Gaussian mixture model (GMM) to provide a probabilistic description of galaxy classification based on emission lines. The best-fit GMM, selected using several statistical criteria, suggests a solution of around four Gaussian components (GCs), which are capable of explaining up to 97 per cent of the data variance. Using elements of information theory, we compare each GC to its respective astronomical counterpart, and we demonstrate the potential of our methodology to recover and unravel different objects inside the wilderness of astronomical datasets without losing the ability to convey physically interpretable results. The probabilistic classifications from the GMM analysis are publicly available.
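To make the idea of a probabilistic classification concrete, the following minimal sketch computes the membership probability of a galaxy in each Gaussian component, given a mixture that has already been fitted. Everything specific here is an assumption for illustration: the two-dimensional feature plane, the four sets of weights, means and covariances, and the function names are hypothetical and are not the fitted values or the code from the paper.

```python
import math


def gaussian_pdf_2d(x, mean, cov):
    """Density of a bivariate normal with a full 2x2 covariance matrix."""
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    (a, b), (c, d) = cov
    det = a * d - b * c
    # Quadratic form (x - mean)^T cov^{-1} (x - mean), using the 2x2 inverse.
    quad = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))


def membership_probabilities(x, weights, means, covs):
    """Posterior probability that point x belongs to each mixture component."""
    weighted = [w * gaussian_pdf_2d(x, m, c)
                for w, m, c in zip(weights, means, covs)]
    total = sum(weighted)
    return [v / total for v in weighted]


# Hypothetical four-component mixture in an illustrative 2D line-ratio plane.
# These numbers are made up for the example; they are NOT the paper's results.
weights = [0.4, 0.3, 0.2, 0.1]
means = [(-0.6, -0.3), (-0.3, 0.2), (0.0, 0.6), (0.1, 0.0)]
covs = [
    [[0.02, 0.0], [0.0, 0.04]],
    [[0.03, 0.01], [0.01, 0.03]],
    [[0.02, -0.005], [-0.005, 0.02]],
    [[0.05, 0.0], [0.0, 0.05]],
]

# A point lying between the first two components receives shared membership
# instead of being forced onto one side of a hard boundary.
galaxy = (-0.45, -0.05)
probs = membership_probabilities(galaxy, weights, means, covs)
for k, p in enumerate(probs, start=1):
    print(f"GC{k}: {p:.3f}")
```

The probabilities sum to one for every object, so a galaxy near the overlap of two components is described by its split membership, whereas a sharp dividing line would assign it entirely to one class.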

Full citation: de Souza et al., 2017, MNRAS, Volume 472, Issue 3, pp. 2808-2822

This project is a result of COIN Residence Program #3 (Budapest, Hungary, 2016).
