Abstract: This talk will explore the interplay between DNA and big data.
In the first part, I will talk about storing data in DNA. DNA is an attractive medium for storing digital information, and I will report our method for recording information using a strategy called DNA Fountain. Using this approach, we stored 2.14 × 10⁶ bytes in DNA oligonucleotides and perfectly retrieved the information from a sequencing coverage equivalent to a single tile of Illumina sequencing. We also explored the limits of our architecture in terms of bytes per molecule and obtained perfect retrieval at a density of 215 petabytes per gram of DNA, orders of magnitude higher than other storage devices. Finally, we developed a new storage architecture called the "DNA of things" that turns everyday objects, such as glasses, into storage devices.
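The fountain idea can be illustrated with a minimal Python sketch. It assumes a Luby-transform-style encoder: the file is cut into segments, each "droplet" XORs a pseudo-randomly chosen subset of segments determined by a 4-byte seed stored alongside the payload, bytes are mapped to nucleotides at 2 bits per base, and droplets whose sequences contain long homopolymers or skewed GC content are discarded rather than edited. The segment size, the ideal soliton degree distribution, the screening thresholds, and every function name here are illustrative assumptions, not the implementation reported in the talk.

```python
import random
from typing import Optional

# Illustrative parameters (hypothetical, not taken from the talk).
SEGMENT_BYTES = 4          # payload bytes per droplet
BASES = "ACGT"             # 2 bits per nucleotide: 00->A, 01->C, 10->G, 11->T


def split_segments(data: bytes, size: int = SEGMENT_BYTES) -> list:
    """Zero-pad the input and cut it into equal-length segments."""
    padded = data + b"\x00" * (-len(data) % size)
    return [padded[i:i + size] for i in range(0, len(padded), size)]


def soliton_degree(rng: random.Random, k: int) -> int:
    """Sample a degree from the ideal soliton distribution:
    P(1) = 1/k and P(d) = 1/(d(d-1)) for d = 2..k."""
    u, cumulative = rng.random(), 1.0 / k
    if u < cumulative:
        return 1
    for d in range(2, k + 1):
        cumulative += 1.0 / (d * (d - 1))
        if u < cumulative:
            return d
    return k  # guard against floating-point round-off


def segment_indices(seed: int, k: int) -> list:
    """Re-derive, from the seed alone, which segments a droplet XORs."""
    rng = random.Random(seed)
    return rng.sample(range(k), soliton_degree(rng, k))


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def to_dna(raw: bytes) -> str:
    """Map each byte to four nucleotides, most significant bits first."""
    return "".join(BASES[(byte >> s) & 0b11] for byte in raw for s in (6, 4, 2, 0))


def from_dna(oligo: str) -> bytes:
    """Inverse of to_dna: pack every four nucleotides back into one byte."""
    v = [BASES.index(base) for base in oligo]
    return bytes(v[i] << 6 | v[i + 1] << 4 | v[i + 2] << 2 | v[i + 3]
                 for i in range(0, len(v), 4))


def passes_screen(oligo: str, max_run: int = 3, gc=(0.45, 0.55)) -> bool:
    """Reject oligos with long homopolymer runs or unbalanced GC content."""
    longest = run = 1
    for prev, cur in zip(oligo, oligo[1:]):
        run = run + 1 if cur == prev else 1
        longest = max(longest, run)
    gc_frac = sum(base in "GC" for base in oligo) / len(oligo)
    return longest <= max_run and gc[0] <= gc_frac <= gc[1]


def encode(data: bytes, n_oligos: int, master_seed: int = 0) -> list:
    """Emit screened droplets: a 4-byte seed followed by the XORed payload."""
    segments = split_segments(data)
    rng = random.Random(master_seed)
    oligos = []
    while len(oligos) < n_oligos:
        seed = rng.getrandbits(32)
        payload = bytes(SEGMENT_BYTES)
        for i in segment_indices(seed, len(segments)):
            payload = xor(payload, segments[i])
        oligo = to_dna(seed.to_bytes(4, "big") + payload)
        if passes_screen(oligo):  # failing droplets are simply discarded
            oligos.append(oligo)
    return oligos


def decode(oligos: list, k: int, length: int) -> Optional[bytes]:
    """Peeling decoder: repeatedly resolve droplets with one unknown segment."""
    droplets = []
    for oligo in oligos:
        raw = from_dna(oligo)
        droplets.append((segment_indices(int.from_bytes(raw[:4], "big"), k), raw[4:]))
    known = {}
    progress = True
    while progress and len(known) < k:
        progress = False
        for indices, payload in droplets:
            unknown = [i for i in indices if i not in known]
            if len(unknown) == 1:
                for i in indices:
                    if i in known:
                        payload = xor(payload, known[i])
                known[unknown[0]] = payload
                progress = True
    if len(known) < k:
        return None  # too few droplets to finish peeling
    return b"".join(known[i] for i in range(k))[:length]


if __name__ == "__main__":
    message = b"DNA Fountain sketch!"
    oligos = encode(message, n_oligos=80)
    print(decode(oligos, len(split_segments(message)), len(message)))
```

Because each droplet carries the seed that regenerates its segment subset, the decoder needs no separate index; losing oligos, or discarding screened ones, only means that more droplets must be read before peeling completes.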
In the second part, I will present a new study on the limits of genetic privacy. Consumer genomics databases have reached the scale of millions of individuals. Recently, law enforcement authorities have exploited some of these databases to identify suspects via distant familial relatives. Using genomic data from 1.28 million individuals tested with consumer genomics, we investigated the power of this technique. We project that over 60% of searches for individuals of European descent will result in a third-cousin or closer match, which can allow their identification using demographic identifiers. Moreover, the technique could implicate nearly any US individual of European descent in the near future. We demonstrate that the technique can also identify research participants of a public sequencing project. Based on these results, we propose a potential mitigation strategy and discuss policy implications for human subjects research.