Abstract
The University of Central Florida and the University of Texas Health Science Center have developed technologies for genetic genealogical discovery in genotype databases:
- Invention 11227: Random Projection for IBD Detection (RaPID) software provides a fast, efficient method for detecting Identical-by-Descent (IBD) segments in a panel of phased haplotypes. While genetic relatedness, usually manifested as IBD segments, is ubiquitous in modern large biobanks, current IBD detection methods are not efficient at such a scale. RaPID achieves time and space complexity linear in the input size and the number of reported IBD segments. In simulations, the researchers showed that RaPID is orders of magnitude faster than existing methods while offering competitive power and accuracy. In UK Biobank, RaPID identified 3,335,807 IBD segments longer than 10 cM among 223,507 male X chromosomes in 11 minutes on a single core. RaPID is adaptable to data from different genotyping platforms with varying marker density and error rates.
- Invention 11570: The invention is a computer-based system and method for indexing, updating, and searching haplotypes for genetic genealogical discovery in genotype databases. It includes a pool of Positional Burrows-Wheeler transform (PBWT) genetic indexes, a haplotype ingestion engine, and a haplotype query engine. The haplotypes of individuals in a genotype database are indexed by a pool of multiple panels, each a PBWT data structure over a subset of markers of the original genotyping panel. Each panel in the pool can be updated by dynamic-PBWT insertion and deletion algorithms. The system identifies ancestral relationships by finding all long Identical-by-Descent (IBD) segments between a query and a panel, in time independent of the number of haplotypes in the panel.
A genetic genealogical query of a haplotype against a database starts by projecting the query onto a subset of panels in the pool, then conducting PBWT long match queries over each panel. The system then aggregates the identified long matches into IBD segments (DNA matches) between the query and the haplotypes in the database. System implementation includes storing the projected panels in persisted random-access media (for example, RAM or SSD).
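The query flow described above (project, match per panel, aggregate) can be illustrated with a minimal Python sketch. It is not the inventors' implementation: a naive linear scan stands in for the PBWT long match query, the projected panels are rebuilt on every call rather than stored as persistent indexes, and all names (`project`, `long_matches`, `query_pool`, `min_len`) as well as the toy data are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Haplotype = List[int]          # one allele (0/1) per marker
Span = Tuple[int, int]         # [start, end) in original marker coordinates


def project(hap: Haplotype, markers: List[int]) -> Haplotype:
    """Keep only the alleles at the given (sorted) marker indices."""
    return [hap[m] for m in markers]


def long_matches(query: Haplotype, panel: List[Haplotype],
                 min_len: int) -> List[Tuple[int, int, int]]:
    """Naive stand-in for a PBWT long match query (hypothetical helper).

    Returns (haplotype index, start, end) for every maximal run of at least
    min_len identical sites between the query and a panel haplotype.
    """
    found = []
    for idx, hap in enumerate(panel):
        start = 0
        for site in range(len(query) + 1):
            if site == len(query) or query[site] != hap[site]:
                if site - start >= min_len:
                    found.append((idx, start, site))
                start = site + 1
    return found


def query_pool(query: Haplotype, database: List[Haplotype],
               pool: List[List[int]], min_len: int) -> Dict[int, List[Span]]:
    """Project the query onto each panel, match, then merge spans per haplotype."""
    per_hap: Dict[int, List[Span]] = {}
    for markers in pool:
        # A real system would keep this projected panel as a prebuilt index.
        sub_panel = [project(h, markers) for h in database]
        sub_query = project(query, markers)
        for idx, s, e in long_matches(sub_query, sub_panel, min_len):
            # Map projected coordinates back to original marker positions.
            per_hap.setdefault(idx, []).append((markers[s], markers[e - 1] + 1))

    segments: Dict[int, List[Span]] = {}
    for idx, spans in per_hap.items():
        merged: List[Span] = []
        for s, e in sorted(spans):
            if merged and s <= merged[-1][1]:
                merged[-1] = (merged[-1][0], max(merged[-1][1], e))
            else:
                merged.append((s, e))
        segments[idx] = merged
    return segments


# Toy data (assumed): three database haplotypes, two panels over
# interleaved marker subsets, and a query haplotype.
database = [
    [0, 1, 0, 1, 1, 0, 1, 0],
    [1, 1, 1, 0, 0, 1, 0, 1],
    [0, 1, 0, 1, 1, 0, 0, 0],
]
query = [0, 1, 0, 1, 1, 0, 1, 1]
pool = [[0, 2, 4, 6], [1, 3, 5, 7]]

segments = query_pool(query, database, pool, min_len=3)
print(segments)  # {0: [(0, 7)], 2: [(0, 6)]}
```

In this toy run, haplotypes 0 and 2 each match the query in both projected panels, and the per-panel matches merge into one segment per haplotype. Haplotype 1 produces no match of the required length in either panel and is therefore not reported.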
Partnering Opportunity
The research team is looking for partners to develop the technology further for commercialization.
Stage of Development
Prototype available.
Benefit
- Efficient, accurate, and cost-effective
- Supports dynamic updates
- Enables fast online query searches
Market Application
DNA testing
Publications
- Efficient haplotype matching between a query and a panel for genealogical search, Bioinformatics, Volume 35, Issue 14, July 2019, Pages i233–i241
- RaPID: ultra-fast, powerful, and accurate detection of segments identical by descent (IBD) in biobank-scale cohorts, Genome Biology 20, 143 (2019)