# PROJECTS

### 12 - Incompleteness of nearby cluster population

We report the discovery of 41 new stellar clusters. This represents an increase of at least 20% over the previously known open cluster (OC) population in this volume of the Milky Way. We also report the clear identification of NGC 886, an object previously considered an asterism. This letter challenges the previous claim of a near-complete sample of open clusters up to 1.8 kpc. Our results show that this claim requires revision and that a complete census of nearby open clusters has yet to be achieved.

### 11 - Active Learning for Supernova Photometric Classification

Active Learning is a class of algorithms that aims to minimize labeling costs by identifying a few, carefully chosen, objects which have high potential in improving a given machine learning classifier. In this project, we show how Active Learning can be used as a tool for optimizing the construction of spectroscopic samples for supernova photometric classification.
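The selection step can be sketched with uncertainty sampling, one common query strategy. The text above does not say which strategy the project uses, so the strategy, the function names and the toy probabilities below are all assumptions made for illustration.

```python
import math


def entropy(probs):
    """Shannon entropy of one object's predicted class probabilities."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_queries(pool_probs, n_queries=1):
    """Return indices of the unlabeled objects the classifier is least sure about."""
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: entropy(pool_probs[i]),
        reverse=True,
    )
    return ranked[:n_queries]


# Hypothetical classifier output, [P(Ia), P(non-Ia)], for four unlabeled light curves.
pool_probs = [[0.98, 0.02], [0.55, 0.45], [0.10, 0.90], [0.70, 0.30]]

# Objects to send for spectroscopic follow-up (the expensive labeling step).
queries = select_queries(pool_probs, n_queries=2)
```

In a full loop, the queried objects would be labeled, added to the training sample, and the classifier retrained before the next selection.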

### 10 - Integrated Nested Laplace Approximation (INLA)

We introduce a novel technique to model integral field spectroscopy (IFS) datasets, which treats the observed galaxy properties as manifestations of an unobserved Gaussian Markov random field. The method is computationally efficient, resilient to the presence of low-signal-to-noise regions, and uses an alternative to Markov Chain Monte Carlo for fast Bayesian inference - the Integrated Nested Laplace Approximation. The proposed Bayesian approach enables the creation of synthetic images, the recovery of areas with bad pixels, and increased power to detect structures in datasets subject to substantial noise and/or sparse sampling.

### 9 - Gaussian Mixture Models

Here, we show how to use a Gaussian mixture model (GMM) to jointly analyse two traditional emission-line classification schemes of galaxy ionization sources: the Baldwin-Phillips-Terlevich (BPT) and the WHAN diagrams, using spectroscopic data from the Sloan Digital Sky Survey Data Release 7 and SEAGal/STARLIGHT datasets. We apply a GMM to empirically define classes of galaxies in a three-dimensional space spanned by the log [OIII]/H-beta, log [NII]/H-alpha, and log EW(H-alpha) optical parameters. We demonstrate the potential of our methodology to recover and disentangle different classes of objects in large astronomical datasets while still conveying physically interpretable results, making it a valuable tool for exploiting ever-larger astronomical surveys.
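As a rough illustration of fitting a GMM in a three-dimensional space, the sketch below uses scikit-learn's `GaussianMixture` on synthetic points. The data, the group centres and the choice of two components are invented for the example and are not taken from the project.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Synthetic stand-ins for (log [OIII]/H-beta, log [NII]/H-alpha, log EW(H-alpha)).
# These values are made up; they are not SDSS measurements.
group_a = rng.normal(loc=[-0.5, -0.5, 1.5], scale=0.1, size=(200, 3))
group_b = rng.normal(loc=[0.5, 0.0, 0.3], scale=0.1, size=(200, 3))
X = np.vstack([group_a, group_b])

# Two components is an assumption for this toy data set.
gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0)
gmm.fit(X)

labels = gmm.predict(X)            # hard class assignment per object
membership = gmm.predict_proba(X)  # soft membership probability per component
```

The soft memberships are what distinguish a mixture model from a hard partition: objects near a class boundary receive intermediate probabilities instead of a forced label.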

### 8 - Representativeness in Machine Learning applications for photometric redshifts

We present two galaxy catalogues built to enable a more demanding and realistic test of photo-z methods. We demonstrate the potential of these catalogues by submitting them to the scrutiny of different photo-z methods, including machine learning (ML) and template fitting approaches. Our catalogues represent the first controlled environment allowing a straightforward implementation of such tests.

### 7 - Hierarchical Bayesian Models

We developed a hierarchical Bayesian model to investigate how the presence of Seyfert activity in galaxies relates to their environment. In elliptical galaxies, our analysis indicates a strong correlation of Seyfert-AGN activity with the cluster-centric distance, and a weaker correlation with the mass of the host. In spiral galaxies these trends do not appear, suggesting that the link between Seyfert activity and the properties of spiral galaxies is independent of the environment.

### 6 - Dimensionality Reduction And Clustering for Unsupervised Learning in Astronomy (DRACULA)

DRACULA classifies objects using dimensionality reduction and clustering. The code has an easy interface and can be applied to separate several types of objects. It is based on tools developed in scikit-learn; its Deep Learning option additionally requires the H2O package. We show how it can be used to identify sub-classes of type Ia supernovae.

### 5 - Approximate Bayesian Computation (CosmoABC)

Approximate Bayesian Computation (ABC) enables parameter inference for complex physical systems in cases where the true likelihood function is unknown, unavailable, or computationally too expensive. Here we present COSMOABC, a Python ABC sampler featuring a Population Monte Carlo variation of the original ABC algorithm, which uses an adaptive importance sampling scheme.
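The core idea of ABC can be sketched with the plain rejection algorithm, which is simpler than the Population Monte Carlo variant described above and is not the CosmoABC API. The toy Gaussian model, the flat prior, the distance function and the tolerance are all assumptions chosen for illustration.

```python
import random
import statistics

random.seed(42)

# Toy "observed" data: 50 draws from a Gaussian whose mean (3.0) we pretend not to know.
observed = [random.gauss(3.0, 1.0) for _ in range(50)]
obs_mean = statistics.fmean(observed)


def simulate(mu, n=50):
    """Forward model: generate a mock data set for a trial parameter value."""
    return [random.gauss(mu, 1.0) for _ in range(n)]


def distance(sim):
    """Compare simulation and observation through a summary statistic (the mean)."""
    return abs(statistics.fmean(sim) - obs_mean)


def abc_rejection(n_draws=2000, epsilon=0.2):
    """Keep prior draws whose simulations land within epsilon of the observation."""
    accepted = []
    for _ in range(n_draws):
        mu = random.uniform(0.0, 6.0)  # draw from a flat prior
        if distance(simulate(mu)) < epsilon:
            accepted.append(mu)
    return accepted


# The accepted values approximate the posterior without ever evaluating a likelihood.
posterior = abc_rejection()
```

A Population Monte Carlo scheme would replace the single fixed tolerance with a sequence of shrinking tolerances, reweighting the accepted particles at each step.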

### 4 - Analysis of Multi-dimensional Astronomical DAta sets (AMADA)

AMADA allows an iterative exploration and information retrieval of high-dimensional data sets. This is done by performing a hierarchical clustering analysis for different choices of correlation matrices and by doing a principal components analysis on the original data. Additionally, AMADA provides a set of modern visualisation data-mining diagnostics. The user can switch between them using the different tabs.

### 3 - GLM III: Negative Binomial Regression

We elucidate the potential of the class of GLMs which handles count data. The size of a galaxy's globular cluster (GC) population (N_GC) is a long-standing puzzle in the astronomical literature. It falls in the category of count data analysis, yet it is usually modelled as if it were a continuous response variable. We have developed a Bayesian negative binomial regression model to study the connection between N_GC and the host galaxy properties.
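To make the count-data point concrete, the sketch below evaluates a negative binomial likelihood with a log link and checks its variance numerically. It covers only the likelihood, not the Bayesian model itself, and the coefficients, predictor value and dispersion are hypothetical.

```python
import math


def neg_binomial_logpmf(y, mu, k):
    """Log-probability of count y for mean mu and dispersion (size) parameter k."""
    return (
        math.lgamma(y + k) - math.lgamma(k) - math.lgamma(y + 1)
        + k * math.log(k / (k + mu))
        + y * math.log(mu / (k + mu))
    )


def expected_count(x, b0, b1):
    """Log link: the mean count is always positive, whatever the predictor value."""
    return math.exp(b0 + b1 * x)


# Hypothetical coefficients and predictor value (e.g. a scaled host-galaxy property).
mu = expected_count(x=1.0, b0=1.0, b1=1.3)
k = 2.0

# Numerical check of the first two moments over a wide range of counts.
pmf = [math.exp(neg_binomial_logpmf(y, mu, k)) for y in range(1000)]
mean = sum(y * p for y, p in enumerate(pmf))
variance = sum((y - mean) ** 2 * p for y, p in enumerate(pmf))
```

The variance equals `mu + mu**2 / k`, which exceeds the Poisson variance `mu`. This extra term is what lets the model absorb the overdispersion typical of count data.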

### 2 - GLM II: Gamma Regression

We present a gamma regression model as a fast alternative method for estimating the photometric redshifts of galaxies from their multi-wavelength photometry. Using the gamma family with a log link function we predict redshifts from the PHoto-z Accuracy Testing simulated catalogue and a subset of the Sloan Digital Sky Survey from Data Release 10.
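A minimal sketch of gamma regression with a log link, fitted by iteratively reweighted least squares on simulated data, is given below. The single predictor, the coefficient values and the gamma shape are assumptions; the project itself uses multi-band photometry and real catalogues.

```python
import math
import random


def fit_gamma_log_link(x, y, n_iter=25):
    """Fit E[y] = exp(b0 + b1 * x) for gamma-distributed y by Fisher scoring.

    For the gamma family with a log link the IRLS weights are constant, so each
    iteration reduces to ordinary least squares on a working response.
    """
    n = len(x)
    x_mean = sum(x) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    eta = [math.log(yi) for yi in y]  # start from the log of the data
    b0 = b1 = 0.0
    for _ in range(n_iter):
        # Working response: eta + (y - mu) / mu, with mu = exp(eta).
        z = [e + yi * math.exp(-e) - 1.0 for e, yi in zip(eta, y)]
        z_mean = sum(z) / n
        b1 = sum((xi - x_mean) * (zi - z_mean) for xi, zi in zip(x, z)) / sxx
        b0 = z_mean - b1 * x_mean
        eta = [b0 + b1 * xi for xi in x]
    return b0, b1


random.seed(1)
true_b0, true_b1, shape = -1.0, 0.8, 50.0  # hypothetical values

# x plays the role of a single photometric feature, y of a positive redshift.
x = [random.uniform(0.0, 2.0) for _ in range(400)]
y = [random.gammavariate(shape, math.exp(true_b0 + true_b1 * xi) / shape) for xi in x]

b0_hat, b1_hat = fit_gamma_log_link(x, y)
```

Because the response is modelled as strictly positive and the mean is `exp(b0 + b1 * x)`, the predicted redshifts can never be negative, which is one reason the gamma family suits this problem.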

### 1 - GLM I: Binomial Regression

We elucidate the potential of the so-called logit and probit regression techniques, from both a maximum likelihood and a Bayesian perspective. As a case in point, we explore the conditions of star formation activity and metal enrichment in primordial minihaloes from cosmological hydro-simulations including detailed chemistry, gas physics, and stellar feedback.
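The two link functions can be compared directly. The sketch below defines the inverse logit and inverse probit and evaluates a Bernoulli log-likelihood under each. The five data points and the coefficients are invented, with `x` standing in for a hypothetical halo property and `y` for whether star formation is present.

```python
import math


def inv_logit(eta):
    """Logistic inverse link: maps any real number to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def inv_probit(eta):
    """Probit inverse link: the standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))


def bernoulli_loglik(y, x, b0, b1, inv_link):
    """Log-likelihood of binary outcomes y given predictor x and a link choice."""
    total = 0.0
    for yi, xi in zip(y, x):
        p = inv_link(b0 + b1 * xi)
        total += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return total


# Invented toy data: y = 1 where star formation is present, 0 otherwise.
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [0, 0, 1, 1, 1]

ll_logit = bernoulli_loglik(y, x, b0=0.5, b1=1.5, inv_link=inv_logit)
ll_probit = bernoulli_loglik(y, x, b0=0.5, b1=1.5, inv_link=inv_probit)
ll_null = bernoulli_loglik(y, x, b0=0.5, b1=0.0, inv_link=inv_logit)
```

Maximising this log-likelihood over `b0` and `b1` gives the maximum likelihood fit, while combining it with priors on the coefficients gives the Bayesian version.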