New High-Throughput Sequencing and Genotyping Method for Breeding Purposes
The present invention describes a new method of genotyping for use in plant breeding and research. It is based on sequencing partial regions of the genome (shallow depth) and filling in the missing data by imputing sequence data based on haplotype libraries. Software-assisted imputation leads to increased read coverage for the haplotype, allowing parallel genotyping of a very large number of variants for a very large number of individuals. Thus, this method outperforms array genotyping in terms of cost, simplicity, performance, and accuracy. Since the cost per data point is several times lower, the method has the potential to replace array-based genotyping in the next few years and possibly become the "gold standard" in plant breeding.
Complex but economically important traits such as yield or pest resistance are difficult to breed for because they are influenced by several genes simultaneously. Effective genotyping is expected to enable sharper selection and a more reliable choice of suitable crossing partners. Currently, high-throughput genotyping is mainly performed using single nucleotide polymorphism (SNP) arrays. They are relatively easy to use and usually produce robust genotype calls with relatively few errors. As a result, they are commonly used for diversity analysis, genomic selection, and genome-wide association studies. However, the technology has limitations: array design is complex and costly, arrays cannot type de novo polymorphisms, the set of markers is fixed once the array is designed, and the cost of genotyping increases substantially with the number of SNPs on the array. In addition, array markers are usually chosen from conserved regions of the genome, so by design they provide little information on structural variants.
The new genotyping method sequences the genome at a very shallow depth (1x instead of 30x) and fills in the missing data by imputing sequence data based on haplotype blocks. The key aspect of the invention is the use of a haplotype library, i.e., a collection of start and stop coordinates of genomic blocks, as the basis for genotype imputation. The procedure determines which individuals carry the same haplotype allele within a block and then merges their sequencing reads. This artificially increases the read coverage within the haplotype block, so that a missing data point can be imputed for all individuals sharing the same haplotype. Genotyping accuracy is therefore comparable to that obtained at high read depth, although only low-coverage sequencing is required.
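The read-merging idea described above can be illustrated with a minimal sketch. It is not the inventors' implementation: it assumes haploid lines, a single haplotype block, block allele assignments that are already known (for example from a haplotype library), and a simple majority vote over the pooled reads. All names (`impute_by_haplotype`, `reads`, `block_allele`, the line and haplotype labels) are hypothetical and chosen for illustration only.

```python
from collections import Counter, defaultdict


def impute_by_haplotype(reads, block_allele):
    """Pool low-coverage reads across individuals sharing a haplotype allele.

    reads:        {individual: {position: [observed bases]}} within one block
    block_allele: {individual: haplotype allele carried in that block}
    Returns {individual: {position: called base}}.
    Assumption: haploid individuals, so one base per position is expected.
    """
    # Step 1: merge the reads of all individuals carrying the same allele.
    pooled = defaultdict(lambda: defaultdict(Counter))
    for individual, per_position in reads.items():
        allele = block_allele[individual]
        for position, bases in per_position.items():
            pooled[allele][position].update(bases)

    # Step 2: call one consensus base per position from the pooled reads
    # (plain majority vote; a real caller would model sequencing errors).
    consensus = {
        allele: {pos: counts.most_common(1)[0][0] for pos, counts in per_pos.items()}
        for allele, per_pos in pooled.items()
    }

    # Step 3: every carrier of the allele receives the pooled consensus,
    # including positions that its own reads never covered.
    return {
        individual: dict(consensus.get(allele, {}))
        for individual, allele in block_allele.items()
    }


# Toy data: three lines sequenced at very low coverage inside one block.
reads = {
    "line1": {101: ["A"], 205: ["G"]},
    "line2": {101: ["A"], 310: ["T"]},
    "line3": {101: ["C"], 205: ["T"]},
}
block_allele = {"line1": "H1", "line2": "H1", "line3": "H2"}

genotypes = impute_by_haplotype(reads, block_allele)
print(genotypes["line1"])
```

In this toy example, line1 and line2 share the hypothetical allele H1, so line1 obtains a call at position 310 that only line2's reads covered, and line2 obtains one at position 205. Line3 carries a different allele and is called from its own reads alone.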
- Greatly increased read coverage
- Information on new structural variants
- Very easy implementation
The invention allows parallel genotyping of a very large number of variants for a large number of individuals sequenced at low coverage (e.g., less than 1x) and is therefore of particular interest for plant breeding. Although the method has so far been applied only to haploid plant genomes, the same approach is equally conceivable for non-haploid organisms by considering the respective haplotypes separately after appropriate phasing.
- Plant Breeding
- Animal Breeding
The invention is ready for application.
Pook, Torsten et al.: Improving imputation quality in BEAGLE for crop and livestock data. Genes, Genomes, Genetics 10(1), pp. 177-188
Pook, Torsten et al.: HaploBlocker: Creation of subgroup specific haplotype blocks and libraries. Genetics, pp. 1045-1061
Manager Patents and Licenses
Tel.: +49 551 30724 156