Dr. Samuel T. Horsfield, PhD, SNSF Postdoctoral Fellow

Tools

GitHub links to open-source software and analysis tools developed as part of my research, which applies pangenomics and machine learning to the study of microbial epidemiology and evolution.

ggCaller
Graph-based gene caller for pangenomes

A pangenome-graph-based gene caller for bacterial genomes.

ggCallaroo
ggCaller + Panaroo + Bakta pipeline

A Snakemake pipeline combining ggCaller, Panaroo, and Bakta for pangenome analysis.

PanBART
Long-context encoder-decoder for pangenomes

A long-context encoder-decoder package for pangenomes.

PansimNuc
Nucleotide-level pangenome simulator

A nucleotide-level pangenome simulator written in Rust.

PopPUNK-mod
Fitting models of pangenome evolution in bacteria

Scalable models of pangenome evolution in bacteria, fitted to PopPUNK data using Approximate Bayesian Computation.

Pansim
Bacterial pangenome simulator

A pangenome simulator for bacterial gene presence/absence.

CELEBRIMBOR
Pangenomes from metagenomes

A tool for pangenome analysis from metagenome-assembled genomes (MAGs).

GNASTy
Graph-based Nanopore Adaptive Sampling Typing

A pangenome-graph-based tool for adaptive sampling in nanopore sequencing.

More

ExpEvoAnalyzer
Experimental evolution analysis workflow

A workflow to analyse experimental evolution data using read alignment, variant calling, and annotation.

orthosynt
Synteny-based orthologue analysis

A Python script to analyse OrthoFinder results and determine orthologues based on synteny.

WTBcluster
Large-scale protein clustering workflow

A Snakemake workflow for clustering billions of proteins.

ATB_tree
AllTheBacteria tree construction

A set of scripts for constructing trees from millions of genomes.

BacCorpusWF
Training data generation for Bacformer v2

Scripts for generating training datasets for Bacformer v2.

unitig_caller
Counting unitigs in a de Bruijn graph

A tool for counting unitigs in a de Bruijn graph for microbial GWAS.

pangenome_LLM
Language models for pangenomics

Scripts for applying large language models to pangenome analysis.

locus_cutter
Target sequence cutting from genomes

A tool for identifying and cutting target sequences from a genome.

adaptive_sampling_scripts
Scripts for adaptive sampling in nanopore sequencing

Scripts to aid the analysis of adaptive sampling experiments in nanopore sequencing.