Dr. Samuel T. Horsfield, SNSF Postdoctoral Fellow

Projects

My current research focuses on identifying the evolutionary mechanisms that underpin adaptation in fungal species. This includes modelling the nucleotide-level evolution of fungal species and developing novel deep-learning methods to identify diversifying regions in fungal pangenomes. I also collaborate on several bacterial genomics projects.

Simulation of fungal pangenomes

I have developed PansimNuc, a simulation framework for eukaryotic pangenomes. It models mutation, selection, recombination, demography, and the gain, loss and mobility of genes and transposable elements (TEs). PansimNuc generates simulated genomes that can be used to benchmark pangenome analysis tools and methods. I will also use it to study how the many neutral and selective forces that shape fungal pangenomes interact. It comes with a suite of helper tools for analysing simulated populations.

Machine-learning approaches to study fungal pangenomes

I am developing machine-learning approaches to identify the diversifying regions of fungal genomes that underpin pathogen adaptation. These approaches use the comprehensive representation of variation encoded in pangenome graphs to call mutations in both core and accessory regions.

AllTheBacteria

I am part of the AllTheBacteria project, which re-assembled more than 2 million bacterial genomes from the INSDC. I led a team that generated the largest bacterial phylogenetic tree constructed to date, and I assembled a collection of ~43 billion gene predictions and intergenic sequences for training deep-learning models.

ALLCAPS - transformer-based pneumococcal serotyping

Together with a talented intern, I am developing a transformer-based model for pneumococcal serotyping from genome sequence data. The model has been trained on a large dataset of pneumococcal genomes and can predict the serotype of a given genome both accurately and rapidly. This work was funded by ISPPD-14.