Two research projects conducted with Middlebury undergraduate students were published in the fall of 2019! The paper “DECA: scalable XHMM exome copy-number variant calling with ADAM and Apache Spark” describes a distributed reimplementation of the XHMM CNV calling algorithm. DECA began as the Middlebury College Computer Science senior thesis project of Davin Chia ’16.5 and was further developed by Middlebury CS student Forrest Wallace ’17 in collaboration with Frank Nothaft at the UC Berkeley AMPLab (now at Databricks). The paper “MySeq: privacy-protecting browser-based personal Genome analysis for genomics education and exploration” describes a web application for personal genome analysis that can efficiently query whole-genome-scale VCF files. MySeq was designed for educational use and has been used in the CSCI1007 “Practical Analysis of a Personal Genome” winter-term course at Middlebury College. MySeq was developed with Middlebury CS students Leo McElroy ’18 and Laura Chang ’20.
The PAPG team reported the impacts of incorporating personal whole genome sequencing into “Practical Analysis of Your Personal Genome” (PAPG), a novel laboratory-style medical genomics course in which students have the opportunity to sequence their own genomes. The paper, “Impacts of incorporating personal genome sequencing into graduate genomics education: a longitudinal study over three course years” in BMC Medical Genomics, describes student attitudes towards genome sequencing, decision-making, psychological wellbeing, genomics knowledge and pedagogical engagement across the 2013-2015 iterations of the course (a total of 59 students). This paper expands upon previous reports on the initial 19 PAPG students.
DECA (Distributed Exome CNV Analyzer) 0.20 was released today as part of the ADAM 0.23 release. DECA is a distributed implementation of the XHMM CNV calling algorithm using ADAM and Apache Spark that achieves an order of magnitude speedup on multicore workstations and Hadoop clusters. Starting from the target read depths, DECA can call CNVs in all 2535 1000 Genomes Phase 3 exome samples in 31 minutes or less. Starting from the aligned BAM files, DECA can call CNVs in the Phase 3 exome cohort in less than 5 hours on a 56-node Hadoop cluster. Interested users can find more in the DECA documentation and the Hacker News discussion.
Our Practical Analysis of Your Personal Genome (PAPG) laboratory-style genomics course was recently described in an AAMCNews article, “Medical Students Learn From Analyzing Their Own Genetic Makeup”.
The HealthSeq team reported the initial outcomes data from their longitudinal study of ostensibly healthy adults who underwent whole genome sequencing (WGS). The paper, “Psychological and Behavioural Impact of Returning Personal Results from Whole-Genome Sequencing: The HealthSeq Project” in the European Journal of Human Genetics, describes outcomes one week and six months after participants received their genomic results. This paper complements earlier publications on participants’ motivations, concerns and preferences and on the impact of genomic counseling on informed decision-making by HealthSeq participants.