Big Data, Big Metadata, Big Contexts
The increasing affordability and availability of data storage and processing power have already revolutionized what we store and how well we can discover it. A third revolution, in how well we can scale and automate data interpretation and management, awaits. Where the earlier revolutions were enabled by advances in our ability to handle bits and descriptive information (metadata), respectively, the next one will require advances in how well we can manage contexts.
To explain the relevance of context and explore some of the emerging theory and practice of managing it, this talk will introduce the Sustainable Environment Actionable Data (SEAD) DataNet effort and its concepts of active and social data curation, and will recount some of the discussion that led to the inclusion of context-aware features in the W3C Provenance standard. A case will be made that context is at the heart of many of the thorniest issues related to ‘data complexity’, is central to automating the most labor-intensive aspects of scientific research, and will itself become a significant driver of data-intensive computing.
Dr. Myers is a Research Investigator with the University of Michigan’s School of Information. He received his B.A. in Physics from Cornell University (1985) and his Ph.D. in Chemistry from the University of California at Berkeley (1993). He has nearly two decades of experience in the development and deployment of cyberinfrastructure for research, education, and industrial application and has participated in the planning and execution of multiple large community cyberinfrastructure projects for NSF, ONR, and DOE.
Dr. Myers has been active in the development of provenance and content management standards and has led efforts to design and develop data-intensive hardware and software systems. As a co-PI on the Sustainable Environment Actionable Data (SEAD) DataNet project, Dr. Myers is currently developing a scalable mechanism for the long-term preservation of scientific research results based on semantic web and cloud technologies and leveraging active and social curation approaches to increase data re-use and lower lifecycle costs.