KMI: A Domain Specific Library for Extreme-Scale DNA Sequence Analysis
We propose the K-mer Matching Interface (KMI), a domain-specific library for extreme-scale DNA sequence analysis. It is designed to scale to very large sizes: petabytes of data across hundreds of thousands of commodity servers. The goal of KMI is to extract a common set of low-level operations needed for k-mer matching so that they can be implemented and optimized on a wide variety of HPC hardware architectures. We define a programming model for overlap computation in genome assembly and genome mapping algorithms.
The programming model is easy to use, and provides both flexibility and portability for high-level DNA sequence analysis applications using KMI. We provide an efficient and scalable implementation of KMI for distributed-memory systems. It is capable of effectively storing, indexing and searching tens of billions of DNA sequences through a set of well-defined APIs. Experiments show that the query throughput of our distributed library on 2.98 TB of random strings reaches 5.66 × 10^8 queries/s on 16,384 cores.
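To make the idea of storing, indexing and searching k-mers concrete, here is a minimal single-process sketch in Python. It is not KMI's API: the names `KmerIndex`, `kmers` and `owner` are hypothetical, and the CRC32-based partitioning is only an assumption standing in for however a distributed-memory implementation assigns k-mers to processes.

```python
import zlib
from collections import defaultdict


def kmers(seq, k):
    """Yield (offset, k-mer) for every length-k window of seq."""
    for i in range(len(seq) - k + 1):
        yield i, seq[i:i + k]


def owner(kmer, num_parts):
    """Pick a partition for a k-mer with a deterministic hash (assumed scheme)."""
    return zlib.crc32(kmer.encode("ascii")) % num_parts


class KmerIndex:
    """Hypothetical k-mer index; each partition stands in for one process."""

    def __init__(self, k, num_parts):
        self.k = k
        self.num_parts = num_parts
        self.parts = [defaultdict(list) for _ in range(num_parts)]

    def add(self, seq_id, seq):
        # Route every k-mer of the sequence to the partition that owns it.
        for offset, kmer in kmers(seq, self.k):
            self.parts[owner(kmer, self.num_parts)][kmer].append((seq_id, offset))

    def query(self, kmer):
        # Only the owning partition needs to be consulted for an exact match.
        part = self.parts[owner(kmer, self.num_parts)]
        return list(part.get(kmer, []))


index = KmerIndex(k=4, num_parts=8)
index.add("read0", "ACGTACGT")
index.add("read1", "TACGTT")
hits = index.query("TACG")  # occurrences of "TACG" as (sequence id, offset)
```

In this sketch a query touches exactly one partition, which is the property that would let lookups proceed independently across processes; the two reads share the k-mer `TACG`, so `hits` reports a candidate overlap between them.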
Huiwei Lu is currently a Ph.D. student at the Institute of Computing Technology, Chinese Academy of Sciences, and will join MCS, Argonne National Laboratory, as a postdoc on August 5, 2013.