[佳學基因 Genetic Testing] Applications of GWAS in genetic testing and gene decoding
Genome-wide association studies
Genome-wide association (GWA) studies scan the entire genome of a species for associations between up to millions of SNPs and a given trait of interest. Notably, the trait of interest can be virtually any sort of phenotype ascribed to the population, be it qualitative (e.g. disease status) or quantitative (e.g. height). Essentially, given p SNPs and n samples or individuals, a GWA analysis fits p independent univariate linear models, each based on n samples, using the genotype of each SNP as the predictor of the trait of interest. The significance of association (P-value) in each of the p tests is determined from the coefficient estimate $\beta$ of the corresponding SNP (technically speaking, the significance of association is $P(\beta \mid H_0: \beta = 0)$). Note that because these tests are independent and quite numerous, there is a great computational advantage in setting up a parallelized GWA analysis (as we will do shortly). Because so many tests are performed, it is necessary to adjust the resulting P-values using multiple hypothesis testing methods such as the Bonferroni correction or the Benjamini-Hochberg false discovery rate (FDR) procedure. GWA studies are now commonplace in the genetics of many different species.
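The per-SNP testing and P-value adjustment described above can be sketched in a few lines. The tutorial itself relies on R scripts; the block below is only an illustrative Python sketch on simulated data, not the analysis used later. All function names (`snp_association`, `bonferroni`, `benjamini_hochberg`, `simulate`), the minor allele frequency of 0.3, the effect size and the choice of SNP 0 as the causal marker are assumptions made for the example. The P-value uses a normal approximation to the t distribution, which is only reasonable for large n.

```python
import math
import random


def snp_association(genotype, trait):
    """Fit trait = a + b * genotype by least squares.

    Returns (b, two-sided P-value for H0: b = 0). The P-value uses a
    normal approximation to the t distribution (assumes large n).
    """
    n = len(genotype)
    mean_x = sum(genotype) / n
    mean_y = sum(trait) / n
    sxx = sum((x - mean_x) ** 2 for x in genotype)
    if sxx == 0:  # monomorphic SNP: slope cannot be estimated
        return 0.0, 1.0
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(genotype, trait))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    rss = sum((y - alpha - beta * x) ** 2 for x, y in zip(genotype, trait))
    se = math.sqrt(rss / (n - 2) / sxx)
    if se == 0:  # perfect fit, degenerate case
        return beta, (1.0 if beta == 0 else 0.0)
    z = beta / se
    return beta, math.erfc(abs(z) / math.sqrt(2))


def bonferroni(pvalues):
    """Multiply each P-value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P-values (FDR control)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def simulate(n=316, p=200, causal=0, effect=1.0, maf=0.3, seed=1):
    """Simulated genotypes (0/1/2 minor alleles) and a quantitative trait."""
    rng = random.Random(seed)
    genotypes = [
        [int(rng.random() < maf) + int(rng.random() < maf) for _ in range(n)]
        for _ in range(p)
    ]
    trait = [effect * genotypes[causal][i] + rng.gauss(0, 1) for i in range(n)]
    return genotypes, trait


genotypes, trait = simulate()
# One independent univariate model per SNP.
results = [snp_association(snp, trait) for snp in genotypes]
pvals = [p for _, p in results]
p_bonf = bonferroni(pvals)
p_bh = benjamini_hochberg(pvals)
top = min(range(len(pvals)), key=lambda j: pvals[j])
print(f"top SNP index: {top}, beta: {results[top][0]:.3f}, "
      f"Bonferroni P: {p_bonf[top]:.3g}, BH P: {p_bh[top]:.3g}")
```

Because each call to `snp_association` depends only on one SNP and the trait, the list comprehension over `genotypes` could in principle be distributed across workers, which is the computational advantage the paragraph refers to.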
Association mapping vs. linkage mapping
Too often, people cannot tell the difference between association mapping and linkage mapping, or quantitative trait locus (QTL) mapping. Albeit conceptually similar, they are actually opposite in their workings. One of the key differences between the two is that association mapping relies on high-density SNP genotyping of unrelated individuals, whereas linkage mapping relies on the segregation of substantially fewer markers in controlled breeding experiments. Unsurprisingly, QTL mapping is seldom conducted in humans. Importantly, association mapping gives you point associations in the genome, whereas linkage mapping gives you QTL, i.e. chromosomal regions.
The present tutorial covers fundamental aspects to consider when conducting a GWA analysis, from the pre-processing of genotype and phenotype data to the interpretation of results. We will use a mixed population of 316 Chinese, Indian and Malay individuals that was recently characterized using high-throughput SNP-chip sequencing, transcriptomics and lipidomics (Saw et al., 2017). More specifically, we will search for associations between the >2.5 million SNP markers and cholesterol levels. Finally, we will explore the vicinity of candidate SNPs using the UCSC Genome Browser in order to gain functional insights. The methodology shown here is largely based on the tutorial outlined in Reed et al., 2015. The R scripts and some of the data can be found in my repository, but you will still need to download the omics data from here. Please follow the instructions in the repo.
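To give a sense of what multiple testing means at this scale, here is a rough worked example. It assumes a family-wise significance level of $\alpha = 0.05$ (a conventional choice, not one stated in the sources above) and takes $m = 2.5 \times 10^{6}$ as a round figure for the number of tests; the number of SNPs actually tested after pre-processing may well differ. Under a Bonferroni correction, a single SNP would need an unadjusted P-value below

$$
\frac{\alpha}{m} = \frac{0.05}{2.5 \times 10^{6}} = 2 \times 10^{-8}
$$

to be declared significant.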
(Editor: 佳學基因)