DECA (Distributed Exome CNV Analyzer) 0.20 was released today as part of the ADAM 0.23 release. DECA is a distributed implementation of the XHMM CNV calling algorithm, built on ADAM and Apache Spark, that achieves an order-of-magnitude speedup on multicore workstations and Hadoop clusters. Starting from the target read depths, DECA can call CNVs in all 2535 1000 Genomes Phase 3 exome samples in 31 minutes or less. Starting from the aligned BAM files, DECA can call CNVs in the Phase 3 exome cohort in less than 5 hours on a 56-node Hadoop cluster. Interested users can find more in the DECA documentation and the Hacker News discussion.
DECA began as the Middlebury College Computer Science undergraduate senior thesis project of Davin Chia ’16.5 and was further developed by Middlebury CS student Forrest Wallace ’17 in collaboration with Frank Nothaft at the UC Berkeley AMPLab.