All required data files are located in the GWAS_HW4.zip folder available on Sakai.
CB1908 is a new chemotherapeutic agent developed by EndC PharmaCorp. Unfortunately, one of the drug's major side effects is thrombocytopenia (low platelet counts), which can lead to dangerous internal bleeding. Platelet counts have been measured in 922 individuals of European ancestry after treatment with CB1908. Perform a GWAS on the platelet count phenotype. Genotypes are provided as PLINK binary files (bed/bim/fam, genome build hg19), and phenotypes are in the CB1908.PLT.counts.txt file.
Answer the following questions about your analysis and embed relevant code and plots:
- (1 pt) Is the CB1908 platelet count (“PLT”) phenotype normally distributed? If not, how did you adjust the phenotype prior to running your analysis? Show histograms and statistical tests to support your answer.
- (1 pt) Embed a screenshot of the PLINK command you used to run your GWAS below and explain what each option in the command does.
- (2 pts) Make a Q-Q plot and a Manhattan plot of your results. Embed the plots into this document and write a figure legend that explains what is represented in your plots.
- (1 pt) How many SNPs are genome-wide significant? List the SNPs here and explain how you defined genome-wide significance.
- (2 pts) What is known about your top SNP? Is it located in or near a gene? Have nearby genes been implicated in other GWAS? Cite databases and/or journal articles. Embed a LocusZoom plot to show the genomic context of your top SNP.
- (2 pts) Make a plot and figure legend of your top hit’s genotype vs. phenotype. Is the minor allele associated with increased or decreased platelet counts? Based on this result, are patients with the minor allele more likely or less likely to have thrombocytopenia?
- (1 pt) What analyses and experiments would you do next to follow up on your findings?
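Hint for the phenotype-adjustment question: if the PLT values turn out to be skewed, one common option (not the only acceptable one) is a rank-based inverse normal transformation. The sketch below is illustrative only. It uses Python's standard library, the function name rank_inverse_normal and the example values are made up for this hint, and the Blom offset (c = 3/8) is an assumed choice. It is not taken from the course scripts, and you should still justify whichever adjustment you use with histograms and a normality test.

```python
from statistics import NormalDist


def rank_inverse_normal(values, c=0.375):
    """Rank-based inverse normal transform.

    Ties receive their average rank; c = 3/8 is the Blom offset
    (an assumed default, other offsets are also used in practice).
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied values starting at sorted position i.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank for the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Map each rank to a quantile of the standard normal distribution.
    return [std_normal.inv_cdf((r - c) / (n - 2 * c + 1)) for r in ranks]


# Made-up, right-skewed example values (NOT from CB1908.PLT.counts.txt).
example_plt = [120.0, 135.0, 150.0, 150.0, 180.0, 240.0, 410.0]
transformed = rank_inverse_normal(example_plt)
```

The transformed values keep the original ordering of individuals but follow an approximately standard normal distribution, which is why the transformed phenotype is often used as the outcome in a linear-regression GWAS.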
BONUS (1 pt) Edit the qqmanplot.r code or R commands to change the colors of the points in the Manhattan plot and to remove the suggestive black line at p = 10^-4. Show your adjusted plot and code or R commands below.