Biotechnology-Aquaculture Interface:

The Site of Maximum Impact Workshop

Quantitative Genetics and Genomics: Lessons from Dairy Cattle

Curtis P. Van Tassell, Tad S. Sonstegard, Melissa S. Ashwell, and Erin E. Connor

USDA-ARS

Animal Improvement Programs Laboratory and

Gene Evaluation and Mapping Laboratory

Building 200, Room 2, Beltsville, MD 20705

curtvt@aipl.arsusda.gov

Abstract

A number of livestock species have had substantial successes in genetic improvement. In particular, the dairy cattle industry has a well-established infrastructure that has resulted in impressive rates of genetic improvement. These results have been achieved primarily through quantitative genetics programs. Extensive data have been collected and used to characterize traits and to calculate predicted genetic values. Trait characteristics were used to design breeding programs, and predicted genetic values have been used as a basis for selection. To accelerate this program in dairy cattle, quantitative trait locus (QTL) mapping efforts have been initiated by a number of organizations internationally. The use of artificial insemination has resulted in very large half-sib families useful for QTL mapping using a "daughter design" and large numbers of sires with numerous sons, each having large numbers of offspring, that facilitate the use of a "granddaughter design." These large half-sib families are similar to family structures in aquaculture populations. To date, a number of QTL have been identified and are being further characterized for use in marker-assisted selection. Work is progressing rapidly towards identification of additional QTL using a new resource population. Many of the tools and techniques used in these efforts are directly applicable to aquaculture.

Key Words - quantitative genetics, predicted breeding values, quantitative trait locus

Introduction

Implementation of a formal milk-recording program started in 1908 for gathering management information on dairy cattle. It quickly became apparent that these data were useful for other purposes as well. Using these data, USDA calculated the first sire evaluation in 1936. The Animal Improvement Programs Laboratory currently calculates national genetic evaluations four times each year for nearly 23 million dairy cattle using approximately 68 million records. The first commercial artificial insemination (AI) organization was started in 1938, and approximately 70% of the 9 million cows are currently bred using AI sires. The current genetic trend for milk yield is approximately 260 lb/yr or nearly 20% of an additive genetic standard deviation per year. Part of the success in dairy cattle genetic improvement has been the extensive characterization of variation – including additive genetic variation, dominance genetic variation, and permanent environmental variation. In addition, statistical model improvements have been implemented to reduce residual variation in an effort to increase effective heritability. As an example, the most recent effort has been to evaluate milk, fat, and protein yields based on daily milk records rather than on a total lactation basis. This project has required building a database of over 220 million observations on 26.5 million lactations. The ultimate goal is to develop a genetic evaluation system built on data with reduced environmental variation that will further accelerate the rate of genetic improvement in the industry.

The extensive use of AI has resulted in very large half-sib families useful for QTL mapping using a daughter design (Figure 1) and large numbers of sires with numerous sons each with large numbers of offspring that facilitate the use of a granddaughter design (Figure 2) (Weller et al., 1990). Elite bulls have tens of thousands of daughters with recorded production and dozens of sons. These large half-sib families are similar to family structures in aquaculture populations, although in fish populations there are likely to be nested families of full sibs.

Progress in dairy cattle genomics has depended on a dense map of genetic markers developed by a number of groups, including the USDA-ARS U.S. Meat Animal Research Center (Kappes et al., 1997). To date, nearly 1700 publicly available microsatellite markers have been placed on the MARC linkage map. A number of QTL have been identified using these genetic markers, and work is progressing rapidly on identification of further QTL. One major limitation has been recording of relevant phenotypes. Because yield traits have been routinely collected for management purposes, high rates of genetic progress have been achieved for those traits. However, improvement for other traits has languished because of a lack of field data. Industry cooperators have increasingly identified these traits as a high priority for application of genetic markers for several reasons. Some of these traits are lowly heritable, and therefore require a large number of observations for accurate genetic prediction. Many of the traits are also difficult or expensive to collect. Identification of QTL followed by marker-assisted selection is well suited for these situations.

Methods and Materials

Quantitative Genetics

Prediction of genetic values has been a fundamental challenge for animal breeders for generations. Statistical methodology had traditionally dealt only with fixed effects, which limited the ability to utilize information on genetically related animals. Dr. C. R. Henderson made a major breakthrough in the development of the statistical framework (best linear unbiased prediction, BLUP) to predict genetic values accounting for relationships among animals (Henderson, 1975, 1976). To obtain predicted genetic values, a complex system of equations is constructed (often called Henderson's mixed model equations in his honor) that requires knowledge of variance components (at least heritability) for a trait. Estimation of these parameters has been a major research effort in animal breeding for over 30 years. Increased computing power has made analysis using more sophisticated procedures possible, including restricted maximum likelihood (REML; Patterson and Thompson, 1971) and Bayesian analysis. Furthermore, a number of software packages have been developed to do genetic prediction and parameter estimation using state-of-the-art methods. These packages generally allow for very complex models, missing data, and joint analysis of multiple traits. Examples include MTDFREML (http://aipl.arsusda.gov/curtvt/mtdfreml.html), VCE (http://www.tzv.fal.de/~eg/), MTGSAM (http://aipl.arsusda.gov/curtvt/mtgsam.html), and MTDFS (ftp://nce.ads.uga.edu/pub/ignacy/).

QTL Mapping

For the duration of this discussion, it will be assumed that the best phenotype for a given trait is a predicted breeding value or "daughter yield deviation" (DYD; VanRaden and Wiggans, 1991). Predicted genetic merit is often available or can be calculated using phenotypes, known pedigree information, and genetic parameters. The use of DYD is ideal because the regression associated with genetic prediction is removed, yet the phenotypes are still adjusted for systematic effects (e.g., environmental effects, age, etc.), reducing residual variance and increasing the power of statistical tests. To account for selection of mates, one-half of the dam predicted genetic merit should be subtracted from the DYD or predicted genetic merit for each offspring. In addition, it will be assumed that the traits of economic importance are quantitative in nature. There are exceptions to this assumption (e.g., litter size in pigs, birthing difficulty in cattle, twinning rate in cattle or sheep), but in most cases these traits are ideally analyzed with a threshold model, where an underlying normal distribution is assumed and resulting predicted genetic merit or DYD values are then on a continuous scale.

Two major strategies have been used to identify QTL in dairy cattle: genome-wide scans and candidate gene approaches. The genome-wide scan is a systematic approach to investigating the complete genome for evidence of regions associated with the trait under investigation. Usually, a set of genetic markers that are well distributed throughout the genome is selected. After marker genotypes are generated, these data are combined with phenotypic data and analyzed for marker effects within each heterozygous grandsire family using single-trait analysis implemented in a statistical package such as SAS. A possible statistical model is:

$$Y_{ijkl} = S_{ij} + M_{ijk} + e_{ijkl},$$

where $Y_{ijkl}$ is the adjusted DYD, predicted genetic merit, or phenotype for trait $i$ of offspring $l$ of sire $j$ that inherited marker allele $k$; $S_{ij}$ is the effect of sire $j$ for trait $i$ (excluding effects linked to the marker investigated); $M_{ijk}$ is the effect of marker allele $k$ of sire $j$ on trait $i$; and $e_{ijkl}$ is the random residual. Both within-family and across-family analyses can be conducted. Within-family analyses correspond to a single degree of freedom contrast of $M$ effects for the two alleles segregating in sire $j$, while the across-family analysis tests for evidence of different $M$ effects across families. Note that marker effects are nested within sire family, so no assumptions are made regarding phase of QTL allelic effects in the across-family analysis. In this analysis, a significant marker effect is taken to indicate the presence of one or more closely linked QTL. See Van Tassell et al. (2000) for an example with dairy QTL mapping. This analysis can be extended to include a dam effect nested within sire to account for full-sib families.
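The nested model above can be illustrated with a minimal sketch for a single trait and a single marker. It is not the SAS analysis used in the project; the function name, the record layout, and the 0/1 allele coding are assumptions made for illustration. Fitting sire plus marker within sire is equivalent to fitting one mean per sire-by-allele cell, so the sums of squares can be computed directly from cell means. The within-family F ratios here use the residual mean square pooled over all families, which is one possible choice among several.

```python
from statistics import mean


def nested_marker_analysis(records):
    """Fit y = sire + marker-within-sire + e using cell means.

    records: iterable of (sire_id, allele, y), where allele is 0 or 1 and
    marks which of the sire's two marker alleles the offspring inherited.
    Every sire needs at least one offspring per allele (hypothetical layout).
    """
    cells = {}
    for sire, allele, y in records:
        cells.setdefault(sire, {0: [], 1: []})[allele].append(y)

    per_sire = {}
    ss_resid, df_resid = 0.0, 0
    for sire, by_allele in cells.items():
        g0, g1 = by_allele[0], by_allele[1]
        m0, m1 = mean(g0), mean(g1)
        n0, n1 = len(g0), len(g1)
        # Single degree of freedom sum of squares for the allele contrast.
        ss_marker = (m1 - m0) ** 2 * n0 * n1 / (n0 + n1)
        per_sire[sire] = (m1 - m0, ss_marker)
        # Residual variation is the variation within each sire-allele cell.
        ss_resid += sum((y - m0) ** 2 for y in g0)
        ss_resid += sum((y - m1) ** 2 for y in g1)
        df_resid += n0 + n1 - 2

    mse = ss_resid / df_resid
    within = {
        sire: {"contrast": contrast, "F": ss / mse}
        for sire, (contrast, ss) in per_sire.items()
    }
    # Across-family test: marker effects pooled over sires, one df per sire.
    n_sires = len(per_sire)
    across_f = (sum(ss for _, ss in per_sire.values()) / n_sires) / mse
    return within, across_f, (n_sires, df_resid)
```

The function returns F ratios and their degrees of freedom rather than p-values; those would come from an F distribution or from permutation. Because the contrast is squared within each sire before pooling, the across-family statistic does not depend on which allele is coded 0 or 1 in a given family, which mirrors the point about phase made above.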

Two main approaches are used in implementing a genome scan: complete genotyping and selective genotyping. The method of choice depends on a number of factors: the number of traits of economic importance, the cost of phenotyping and genotyping, and the population size available for study. A selective genotyping strategy is used when relatively few traits are of interest, phenotyping is relatively cheap, and large populations are available. Under this scenario, only the individuals with the most extreme phenotypes in a family are genotyped, and marker allele frequencies are compared between the high and low groups to test for association with phenotypic differences. Some researchers have used DNA pooling as another route to identify allele frequency differences: equal amounts of DNA from each animal in a group are combined, and genotype intensities are used to infer differences in allele frequencies between groups. Based on results from selective genotyping, additional animals can be genotyped in the family where the effect has been identified, or in other families as independent verification. Because of the need for validation, it is essential that phenotypes and DNA be collected whenever possible.
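The selective genotyping idea can be sketched in two steps: choose the phenotypic tails of a family, then compare how often each sire allele appears in the two tails. The function names, the tail fraction, and the 0/1 allele coding below are assumptions for illustration, and the 2x2 chi-square is only one simple way to test the frequency difference.

```python
def select_tails(family, fraction=0.2):
    """Return (low_ids, high_ids) for the phenotypic extremes of one family.

    family: list of (animal_id, phenotype) pairs; fraction is the share of
    the family taken from each tail (a hypothetical default).
    """
    ranked = sorted(family, key=lambda rec: rec[1])
    k = max(1, int(len(ranked) * fraction))
    low_ids = [animal for animal, _ in ranked[:k]]
    high_ids = [animal for animal, _ in ranked[-k:]]
    return low_ids, high_ids


def tail_allele_chi2(low_alleles, high_alleles):
    """One degree of freedom chi-square for a 2x2 table of tail by allele.

    Each list holds 0/1 codes for the sire allele inherited by the
    genotyped offspring in that tail.
    """
    a = sum(high_alleles)             # high tail, allele 1
    b = len(high_alleles) - a         # high tail, allele 0
    c = sum(low_alleles)              # low tail, allele 1
    d = len(low_alleles) - c          # low tail, allele 0
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0                    # an empty margin carries no information
    return n * (a * d - b * c) ** 2 / denom
```

Only the animals returned by `select_tails` need to be genotyped, which is where the cost saving comes from. Animals in the middle of the distribution still have phenotypes on record, so they remain available for the follow-up genotyping described above.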

In situations where there are a number of traits of interest, it is often easiest to completely genotype the animals in a family and then analyze all traits using genotypic data to identify marker-phenotype associations. This is particularly true when traits are uncorrelated or negatively correlated, as there will be little commonality in extreme phenotype groups. This is the approach we have taken in the dairy cattle QTL mapping project because of the large number of traits of economic importance and the limited family size.

Candidate gene approaches have been used to test specific genes for association with phenotypic differences. Examples include growth hormone and its receptor for milk production and growth rate, major histocompatibility complex genes with disease resistance, and casein for milk production. In many cases these types of studies are unsuccessful, possibly reflecting our insufficient understanding of the complexities of the physiology of complex traits. Once regions have been identified based on genome scan data, positional candidate genes may be identified. Comparative map information can be used in this region to refine searches for the causative effect. It is important to remember that based on results from the human genome project, even a "small" region of the genome may contain a large number of genes, many without known function. Fortunately, it is generally unnecessary to identify the exact cause of genetic differences to utilize QTL in marker-assisted selection (MAS) programs, as loss of efficiency is extremely small when informative markers closely flank a QTL. Understanding the basic biology of complex traits, however, is certainly furthered by identifying the exact cause of phenotypic differences.

The results from marker association analyses simply identify regions of the genome where QTL may exist. Because each marker is analyzed individually, the location and magnitude of the QTL effects are confounded; a QTL with small allelic differences close to a marker and a QTL with large allelic differences further from that marker can generate the same statistical evidence or "signal." Interval analysis allows this confounding to be broken and estimates location and allelic differences for a QTL when a number of markers in a region have been genotyped (Haley and Knott, 1992). This step is essential in characterizing the QTL before implementing a MAS program. This process should be extended to additional families so that a large fraction of the alleles segregating in the population can be characterized before implementing MAS.
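The confounding described above follows from the fact that, in a half-sib family, the expected difference between offspring inheriting alternative marker alleles is (1 - 2r) times the QTL allelic difference, where r is the recombination rate between marker and QTL. The sketch below illustrates this and shows why a second marker on the other side of the QTL separates position from effect. It assumes the Haldane mapping function (no interference) and noise-free expected contrasts; it is a method-of-moments illustration, not the regression procedure of Haley and Knott (1992), and all function names are hypothetical.

```python
import math


def haldane_r(d_morgans):
    """Recombination rate for a map distance in Morgans (Haldane function)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_morgans))


def expected_marker_contrast(effect, d_morgans):
    """Expected marker-allele contrast for a QTL d_morgans from the marker."""
    return (1.0 - 2.0 * haldane_r(d_morgans)) * effect


def locate_from_flanking(c_left, c_right, interval_morgans):
    """Recover QTL position and effect from two flanking marker contrasts.

    Assumes the QTL lies between the markers and that both contrasts are
    nonzero with the same sign. Position is measured from the left marker.
    """
    position = interval_morgans / 2.0 + math.log(c_right / c_left) / 4.0
    effect = math.sqrt(c_left * c_right) * math.exp(interval_morgans)
    return position, effect


# A small QTL close to the marker and a larger QTL further away give the
# same single-marker signal (units of the effect are arbitrary).
near = expected_marker_contrast(100.0, 0.05)
far = expected_marker_contrast(100.0 * math.exp(0.4), 0.25)

# With markers 20 cM apart and the QTL 5 cM from the left marker, the two
# contrasts together identify both position and effect.
left = expected_marker_contrast(100.0, 0.05)
right = expected_marker_contrast(100.0, 0.15)
position, effect = locate_from_flanking(left, right, 0.20)
```

With real data the contrasts are estimated with error, so interval analysis fits the position and effect statistically across all markers in the region rather than solving for them exactly as done here.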

Functional Genomics

Functional genomics provides an unprecedented window into basic biology. There is a fundamental shift in objectives when changing from QTL mapping to functional genomics. In QTL mapping, the goal is typically to accelerate genetic improvement. In functional genomics, the objective becomes identifying genes for some purpose that may or may not include genetic improvement; other purposes include developing pharmaceuticals or therapeutics, utilizing transgenics, or understanding the basic biology.

Our group has initiated a large sequencing effort with the goal of generating a large number of expressed sequence tags (EST). To date, over 16,500 sequences have been generated from a cDNA library derived from the mammary gland sampled at a number of physiological states. This resource will be used to develop expression arrays that will be extremely useful in providing fundamental insight into the physiology and function of the bovine mammary gland.

This project highlights the importance of a team approach to this highly complex and expensive research. The team in our laboratory includes a number of scientists. A physiologist is essential to the success of this experiment: in identifying appropriate physiological states to sample so that a complete cross section of expressed genes may be sampled, in understanding and characterizing genes expressed in the gland, and in designing experiments to utilize expression arrays resulting from this effort. A molecular geneticist is a critical contributor to ensure that appropriate methodologies are followed for RNA extraction, to manage creation of the cDNA library, and, most importantly, to oversee and optimize the process of high-throughput sequencing. A quantitative geneticist is essential to develop statistical tools to analyze sequence information and expression data. A bioinformatician brings important tools to such a complex project, such as database development, data processing (vector masking, quality scoring, etc.), and similarity searching (e.g., BLAST).

Results

Genetic variation has been very well characterized for production and conformation traits in dairy cattle. Additive genetic, dominance genetic, and permanent environmental fractions of variance have been estimated for most economically important traits. Genetic evaluations for these traits are routinely calculated (four times per year for most traits). These results are eagerly anticipated by AI organizations and the producers that use them in making selection decisions.

To date, a total of 155 microsatellite markers have been genotyped in eight large Holstein sire families. Effects of marker alleles were analyzed for 38 traits, including traits for milk production, health, and conformation. Permutation tests were used to calculate empirical trait-wise error rates. A total of 25 significant within-family marker-trait relationships were identified, and 29 significant relationships were identified across families.
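A permutation test for a trait-wise threshold can be sketched as follows. The details of the scheme used in the project are not given here, so this is a generic version under stated assumptions: one sire family, marker alleles coded 0/1, phenotypes shuffled among offspring, and the absolute difference in allele means used as the test statistic in place of a t or F statistic. The function names and defaults are hypothetical.

```python
import random


def abs_contrast(alleles, phenotypes):
    """Absolute difference in mean phenotype between the two allele groups."""
    g0 = [y for a, y in zip(alleles, phenotypes) if a == 0]
    g1 = [y for a, y in zip(alleles, phenotypes) if a == 1]
    return abs(sum(g1) / len(g1) - sum(g0) / len(g0))


def traitwise_threshold(marker_alleles, phenotypes, n_perm=1000,
                        alpha=0.05, seed=0):
    """Empirical threshold for one trait over all markers in one family.

    marker_alleles: list of per-marker lists of 0/1 codes, all in the same
    offspring order as phenotypes; each marker needs both alleles present.
    """
    rng = random.Random(seed)          # fixed seed so runs are repeatable
    shuffled = list(phenotypes)
    maxima = []
    for _ in range(n_perm):
        # Shuffling phenotypes breaks any marker-trait association while
        # keeping the correlation among markers intact.
        rng.shuffle(shuffled)
        maxima.append(max(abs_contrast(m, shuffled) for m in marker_alleles))
    maxima.sort()
    index = min(n_perm - 1, int((1.0 - alpha) * n_perm))
    return maxima[index]
```

Taking the maximum over markers within each permutation is what makes the threshold trait-wise: an observed statistic that exceeds it is unusual relative to the best result expected by chance anywhere among the markers tested for that trait.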

Two regions identified in previous results from the genome scan project have been selected as targets for fine mapping. The first is a region on chromosome 27 where a QTL appears to impact "dairy form," a trait that can be envisioned as the "fatness" of a cow. This trait is known to be highly (and adversely) correlated with milk production, i.e., as milk production increases, the fat deposition is typically reduced. Dairy form is also known to be associated with several metabolic diseases, so as fat deposition decreases, the rate of metabolic disorders increases. Our results from one family investigated indicate that the QTL identified may allow change in dairy form without altering milk production, i.e., to possibly reduce disease incidence without altering productivity. Fine mapping of that QTL is continuing with genotyping being done on an extended family (sons, grandsons, and great-grandsons of the original sire). The second region is on chromosome 6, where previous results indicated a QTL affecting protein percentage of milk. Similar fine mapping efforts are proceeding on that chromosome in two related families.

Conclusions

Quantitative genetics has been tremendously effective in creating the modern dairy cow. Genetic improvement in dairy cattle is approaching the level thought theoretically possible based on population structures, selection intensity, and trait characteristics. This result has been possible because of the level of trait characterization that has been completed by geneticists working in the field. Genomics will help accelerate that rate of genetic improvement to new levels, especially for lowly heritable traits.

 

Recommendations

Short-term (1-3 years):

1. Identify or develop resource populations for QTL mapping in the species of interest.

2. Collect phenotype and pedigree records for traits of economic interest for relevant populations. Carefully characterize the components of variation to determine the utility of genomics in advancing genetic improvement. Begin quantitative selection.

3. Assemble researchers into teams appropriate for species and traits of interest. Critical team members include people with expertise in quantitative genetics and statistics, molecular biology, physiology, and bioinformatics.

4. Identify available resources for QTL mapping (genetic markers, linkage map, etc.). Identify areas where additional resources are needed and target those areas.

Medium- to long-term (3+ years):

5. Conduct genome scans for traits of economic importance, especially for those where quantitative selection will be least effective (lowly heritable traits) and traits with greatest economic impact.

6. Develop or identify resources for fine mapping, possibly including EST sequences, physical map (BAC library), and comparative map information.

7. Fine-map identified QTL and incorporate marker-assisted selection into existing selection programs.

8. Investigate feasibility of EST project and associated gene expression studies.

9. Investigate feasibility of sequencing genome.

References

Haley, C. S. and S. A. Knott. 1992. A simple regression method for mapping quantitative trait loci in line crosses using flanking markers. Heredity 69:315-324.

Henderson, C. R. 1975. Best linear unbiased estimation and prediction under a selection model. Biometrics 31:423-447.

Henderson, C. R. 1976. A simple method for computing the inverse of a numerator relationship matrix used in prediction of breeding values. Biometrics 32:69-83.

Kappes, S. M., J. W. Keele, R. T. Stone, R. A. McGraw, T. S. Sonstegard, T. P. Smith, N. L. Lopez-Corrales, and C. W. Beattie. 1997. A second-generation linkage map of the bovine genome. Genome Res. 7:235-249.

Patterson, H. D. and R. Thompson. 1971. Recovery of inter-block information when block sizes are unequal. Biometrika 58:545-554.

VanRaden, P. M. and G. R. Wiggans. 1991. Derivation, calculation, and use of national animal model information. J. Dairy Sci. 74:2737-2746.

Van Tassell, C. P., M. S. Ashwell, and T. S. Sonstegard. 2000. Detection of putative loci affecting milk, health, and conformation traits in a US Holstein population using 105 microsatellite markers. J. Dairy Sci. 83:1865-1872.

Weller, J. I., Y. Kashi, and M. Soller. 1990. Power of daughter and granddaughter designs for determining linkage between marker loci and quantitative trait loci in dairy cattle. J. Dairy Sci. 73:2525-2537.

Figure 1. Daughter ("progeny") design with the color of the animal indicating the allele inherited from the heterozygous parent. DNA is collected on parents and progeny, and phenotypes are collected on progeny.

Figure 2. Granddaughter design with the color of the animal indicating the allele inherited from the heterozygous sire, with black grandprogeny inheriting neither of the grandpaternal alleles. DNA is collected only on sire and progeny, and phenotypes are collected on grandprogeny.