I am an Assistant Professor in Statistics at the Department of Mathematics, Imperial College London.

My research revolves around developing and exploring stochastic and probabilistic mathematical machinery for statistical inference and machine learning. My current interests are in building theory and methods for computational statistics, generative models, and signal processing. A few highlights are listed under Recent Highlights below.

See the works page for preprints, papers, slides, posters, and other things related to my work. I maintain a research blog called almost stochastic for short notes that might be of interest to other people.

If you are interested in joining, please see this page.

Here are some more links: my Google Scholar, GitHub, LinkedIn.


01/02/2024: Videos of our RSS Workshop on Gradient Flows For Sampling, Inference, and Learning are available on YouTube.

18/10/2023: I am organising a Royal Statistical Society workshop on Gradient Flows For Sampling, Inference, and Learning. See details here.

07/04/2023: After two great challenge weeks and a final workshop, you can see the outcomes of the Turing events on this page.

06/06/2022: I am organizing a workshop at The Alan Turing Institute titled Accelerating generative models and nonconvex optimisation.

30/03/2022: I gave two introductory lectures at the Sequential Monte Carlo (SMC) Masterclass event held in Bristol. Recordings are available here: [lecture 1], [lecture 2], and lecture notes are also available from this link.

Old (some very old) news.

Recent Highlights

Interacting Particle Langevin Algorithm for Maximum Marginal Likelihood Estimation

arXiv preprint, 2023.

ÖDA, F. R. Crucinio, M. Girolami, T. Johnston, S. Sabanis

We study a class of interacting particle systems for implementing a marginal maximum likelihood estimation (MLE) procedure to optimize over the parameters of a latent variable model. We prove nonasymptotic concentration bounds for the optimisation error of the maximum marginal likelihood estimator in terms of the number of particles in the particle system, the number of iterations of the algorithm, and the step-size parameter for the time discretisation analysis.

[paper], [talk]
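To give a feel for the idea, here is a minimal sketch of an interacting particle Langevin scheme on a toy latent variable model. The update rule, the toy model, and all names (`ipla`, `step`, `n_particles`) are assumptions made for illustration and are not taken from the abstract above; see the paper for the actual algorithm and its analysis.

```python
import math
import random


def ipla(y, theta0, n_particles, step, n_steps, rng):
    """Interacting particle Langevin sketch for a toy latent variable model.

    Toy model (an assumption for illustration):
        x ~ N(theta, 1),  y | x ~ N(x, 1),
    so U(theta, x) = (x - theta)^2 / 2 + (y - x)^2 / 2 up to a constant,
    and the marginal likelihood N(y; theta, 2) is maximised at theta = y.
    """
    theta = theta0
    xs = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    thetas = []
    for _ in range(n_steps):
        # Parameter update: gradient in theta averaged over the particles,
        # with injected noise scaled down by the number of particles.
        grad_theta = sum(theta - x for x in xs) / n_particles
        new_theta = (theta - step * grad_theta
                     + math.sqrt(2.0 * step / n_particles) * rng.gauss(0.0, 1.0))
        # Particle updates: one Langevin step each, using the current theta.
        xs = [x - step * ((x - theta) + (x - y))
              + math.sqrt(2.0 * step) * rng.gauss(0.0, 1.0)
              for x in xs]
        theta = new_theta
        thetas.append(theta)
    return thetas


rng = random.Random(1)
thetas = ipla(y=2.0, theta0=-1.0, n_particles=50, step=0.05, n_steps=4000, rng=rng)
tail = thetas[1000:]  # discard the initial transient
theta_hat = sum(tail) / len(tail)
```

In this toy setting the parameter iterates fluctuate around the maximiser `theta = y`, and the fluctuations shrink as `n_particles` grows, which is the qualitative behaviour that the bounds in the abstract quantify.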

Statistical Finite Elements via Langevin Dynamics

SIAM/ASA Journal on Uncertainty Quantification (2022).

ÖDA, C. Duffin, S. Sabanis, M. Girolami

We use Langevin dynamics to solve the statFEM forward problem, studying the utility of the unadjusted Langevin algorithm (ULA), a Metropolis-free Markov chain Monte Carlo sampler, to build a sample-based characterisation of this otherwise intractable measure. Leveraging the theory behind Langevin-based samplers, we provide theoretical guarantees on sampler performance, demonstrating convergence, for both the prior and posterior, in the Kullback-Leibler divergence, and, in Wasserstein-2, with further results on the effect of preconditioning. Numerical experiments are also provided, for both the prior and posterior, to demonstrate the efficacy of the sampler, with a Python package also included.

[paper], [code]
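As a rough illustration of the unadjusted Langevin algorithm mentioned above, the following is a minimal sketch on a one-dimensional Gaussian target. The target, the step size, and the names `ula_sample` and `grad_u` are assumptions chosen for illustration; this is not the statFEM sampler or the accompanying Python package.

```python
import math
import random


def ula_sample(grad_potential, x0, step, n_steps, rng):
    """Run the unadjusted Langevin algorithm and return the whole chain."""
    x = x0
    chain = []
    for _ in range(n_steps):
        # Euler-Maruyama step of the Langevin diffusion; there is no
        # Metropolis accept/reject correction, hence "unadjusted".
        x = x - step * grad_potential(x) + math.sqrt(2.0 * step) * rng.gauss(0.0, 1.0)
        chain.append(x)
    return chain


# Toy target (an assumption for illustration): N(mean=1, var=0.25),
# with potential U(x) = (x - 1)^2 / (2 * 0.25).
def grad_u(x):
    return (x - 1.0) / 0.25


rng = random.Random(0)
chain = ula_sample(grad_u, x0=0.0, step=0.01, n_steps=20000, rng=rng)
kept = chain[2000:]  # discard burn-in
est_mean = sum(kept) / len(kept)
est_var = sum((x - est_mean) ** 2 for x in kept) / len(kept)
```

Because there is no accept/reject step, the chain targets a slightly perturbed distribution whose error is controlled by the step size; for this Gaussian example the stationary variance is a little larger than 0.25.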

Probabilistic sequential matrix factorization

The 24th International Conference on Artificial Intelligence and Statistics (AISTATS), 2021.

ÖDA, G. G. J. van den Burg, T. Damoulas, M. Steel

We introduce the probabilistic sequential matrix factorization (PSMF) method for factorizing time-varying and non-stationary datasets consisting of high-dimensional time-series. In particular, we consider nonlinear Gaussian state-space models where sequential approximate inference results in the factorization of a data matrix into a dictionary and time-varying coefficients with (possibly nonlinear) Markovian dependencies. The assumed Markovian structure on the coefficients enables us to encode temporal dependencies into a low-dimensional feature space.

[paper] [code] [slides]

Convergence rates for optimised adaptive importance samplers

Statistics and Computing volume 31, 12 (2021).

ÖDA, J. Miguez

We investigate an adaptation strategy based on convex optimisation which leads to a class of adaptive samplers. These samplers rely on the iterative minimisation of the \(\chi^2\)-divergence between an exponential family proposal and the target. We prove non-asymptotic error bounds for the mean squared errors (MSEs) of these algorithms, which explicitly depend on the number of iterations and the number of samples together. We also demonstrate explicit links between hyperparameters of these samplers, the number of samples, and the number of iterations.


VarGrad: A Low-Variance Gradient Estimator for Variational Inference

NeurIPS 2020

L. Richter, A. Boustati, N. Nüsken, F. J. R. Ruiz, ÖDA

We analyse the properties of an unbiased gradient estimator of the ELBO for variational inference, based on the score function method with leave-one-out control variates. We show that this gradient estimator can be obtained using a new loss, defined as the variance of the log-ratio between the exact posterior and the variational approximation, which we call the log-variance loss. Under certain conditions, the gradient of the log-variance loss equals the gradient of the (negative) ELBO. We show theoretically that this gradient estimator, which we call VarGrad due to its connection to the log-variance loss, exhibits lower variance than the score function method in certain settings, and that the leave-one-out control variate coefficients are close to the optimal ones.

[paper], [code]

Generalised Bayesian Filtering via Sequential Monte Carlo

NeurIPS 2020

A. Boustati, ÖDA, T. Damoulas, A. M. Johansen

We introduce a framework for inference in general state-space hidden Markov models (HMMs) under likelihood misspecification. In particular, we leverage the loss-theoretic perspective of Generalized Bayesian Inference (GBI) to define generalised filtering recursions in HMMs that can tackle the problem of inference under model misspecification. In doing so, we arrive at principled procedures for robust inference against observation contamination by utilising the β-divergence. Operationalising the proposed framework is made possible via sequential Monte Carlo (SMC) methods, where most standard particle methods, and their associated convergence results, are readily adapted to the new setting.

[paper], [code]

Parallel sequential Monte Carlo for stochastic gradient-free nonconvex optimization

Statistics and Computing volume 30, pages 1645–1663 (2020)

ÖDA, D. Crisan, J. Miguez

We introduce and analyze a parallel sequential Monte Carlo methodology for the numerical solution of optimization problems that involve the minimization of a cost function that consists of the sum of many individual components. The proposed scheme is a stochastic zeroth-order optimization algorithm which demands only the capability to evaluate small subsets of components of the cost function. It can be depicted as a bank of samplers that generate particle approximations of several sequences of probability measures. These measures are constructed in such a way that they have associated probability density functions whose global maxima coincide with the global minima of the original cost function. We provide explicit convergence rates in terms of the number of generated Monte Carlo samples and the dimension of the search space.


Nudging the particle filter

Statistics and Computing, volume 30, pages 305–330 (2020)

ÖDA, J. Miguez

We investigate a new sampling scheme aimed at improving the performance of particle filters whenever (a) there is a significant mismatch between the assumed model dynamics and the actual system, or (b) the posterior probability tends to concentrate in relatively small regions of the state space. The proposed scheme pushes some particles toward specific regions where the likelihood is expected to be high, an operation known as nudging in the geophysics literature. We reinterpret nudging in a form applicable to any particle filtering scheme, as it does not involve any changes in the rest of the algorithm. We prove analytically that nudged particle filters can still attain asymptotic convergence with the same error rates as conventional particle methods. Simple analysis also yields an alternative interpretation of the nudging operation that explains its robustness to model errors.
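
The nudging operation described above can be sketched as one step of a bootstrap particle filter with an extra gradient move applied to a subset of particles. The toy linear-Gaussian model, the gradient-based form of the nudge, and the names `nudged_pf_step`, `n_nudged`, and `nudge_step` are assumptions for illustration only, not the exact scheme analysed in the paper.

```python
import math
import random


def nudged_pf_step(particles, y, rng, n_nudged, nudge_step, obs_std=0.5):
    """One bootstrap particle filter step with a gradient nudging move.

    Toy model (an assumption for illustration):
        x_t = 0.9 * x_{t-1} + N(0, 1),   y_t = x_t + N(0, obs_std^2).
    """
    # 1. Propagate every particle through the assumed dynamics.
    proposed = [0.9 * x + rng.gauss(0.0, 1.0) for x in particles]
    # 2. Nudge a randomly chosen subset toward higher likelihood, using the
    #    gradient of the log-likelihood, (y - x) / obs_std^2.
    for i in rng.sample(range(len(proposed)), n_nudged):
        proposed[i] += nudge_step * (y - proposed[i]) / obs_std ** 2
    # 3. Weight with the likelihood only; the rest of the filter is unchanged.
    log_w = [-0.5 * ((y - x) / obs_std) ** 2 for x in proposed]
    max_lw = max(log_w)
    weights = [math.exp(lw - max_lw) for lw in log_w]
    # 4. Multinomial resampling.
    return rng.choices(proposed, weights=weights, k=len(proposed))


rng = random.Random(0)
prior = [rng.gauss(0.0, 1.0) for _ in range(200)]
posterior = nudged_pf_step(prior, y=3.0, rng=rng, n_nudged=14, nudge_step=0.1)
post_mean = sum(posterior) / len(posterior)
```

Only step 2 differs from a standard bootstrap filter, which reflects the point that nudging can be added to a particle filtering scheme without changing the rest of the algorithm. The number of nudged particles used here is an arbitrary small fraction of the total.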