Seminar | Data Science and Learning

Ophidia: A High-Performance Data Analytics Framework for eScience

DSL Seminar

Abstract: The Ophidia project is a research effort on big data analytics that addresses scientific data analysis challenges in the climate change domain. Ophidia provides declarative, server-side, and parallel data analysis. It pairs this with an internal storage model that handles multidimensional data efficiently and a hierarchical data organization for managing large data volumes (“datacubes”). The project builds on high-performance database management and OLAP systems to manage large scientific datasets.

The Ophidia analytics platform provides several data operators to manipulate datacubes, as well as array-based primitives to analyze large scientific data arrays. Metadata management is also supported. The server front-end exposes several interfaces to address interoperability requirements (e.g., OGC-WPS). From a programmatic point of view, a Python module (PyOphidia) makes it straightforward to integrate Ophidia into Python-based environments and applications (e.g., Jupyter Notebooks). The system also offers a bash-like CLI with a complete set of commands.
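To make the idea of array-based primitives applied to a datacube more concrete, here is a toy illustration in plain Python. It is not Ophidia's implementation and not the PyOphidia API. It assumes a layout in which explicit dimensions (here latitude and longitude) index the rows, while an implicit dimension (time) is stored as an array in each row. The rows are split into fragments, and a reduction primitive is applied to every row's array, with one worker per fragment. The names `fragment`, `reduce_fragment`, and `reduce_cube` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

# Hypothetical toy layout (an assumption, not Ophidia's storage format):
# explicit dimensions (lat, lon) are the row keys, and the implicit
# dimension (time) is stored as an array of values in each row.
Key = Tuple[float, float]
Cube = Dict[Key, List[float]]
Row = Tuple[Key, List[float]]


def fragment(cube: Cube, n: int) -> List[List[Row]]:
    """Split the rows of a datacube into n fragments (round-robin)."""
    frags: List[List[Row]] = [[] for _ in range(n)]
    for i, row in enumerate(sorted(cube.items())):
        frags[i % n].append(row)
    return frags


def reduce_fragment(frag: List[Row], primitive: Callable[[List[float]], float]) -> List[Tuple[Key, float]]:
    """Apply an array-based primitive to the implicit array of every row."""
    return [(key, primitive(values)) for key, values in frag]


def reduce_cube(cube: Cube, primitive: Callable[[List[float]], float], n_fragments: int = 2) -> Dict[Key, float]:
    """Reduce the implicit dimension of every row, one worker per fragment."""
    frags = fragment(cube, n_fragments)
    with ThreadPoolExecutor(max_workers=n_fragments) as pool:
        parts = pool.map(lambda f: reduce_fragment(f, primitive), frags)
    # Merge the per-fragment results back into a single reduced cube.
    return {key: value for part in parts for key, value in part}


if __name__ == "__main__":
    # Illustrative, made-up temperature values at four grid points.
    cube = {
        (40.0, 10.0): [281.0, 283.5, 279.2],
        (40.0, 12.5): [285.1, 284.0, 286.3],
        (42.5, 10.0): [278.4, 277.9, 280.1],
        (42.5, 12.5): [282.2, 283.8, 281.7],
    }
    print(reduce_cube(cube, max))
```

Because each fragment is reduced independently, the fragments could just as well live on different servers. That is the kind of server-side parallelism the abstract describes, here reduced to threads in a single process.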

A key point of the talk will be the workflow capabilities offered by Ophidia. The framework stack includes an internal workflow management system that orchestrates and optimizes the execution of multiple scientific data analytics and visualization tasks. Specific macros are also available to implement loops and, when iterations are data-independent, to run them in parallel. Interactive workflows have also been implemented, and a graphical user interface supports real-time monitoring of workflow execution. Some real workflows implemented at CMCC in the context of EU projects will also be presented.
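The general idea of a loop macro whose iterations run in parallel when they share no data can be sketched in plain Python. This is a hypothetical illustration, not Ophidia's workflow language or engine. `Task`, `for_each`, and `run` are invented names, and the scheduler simply runs, in successive waves, every task whose dependencies have completed.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Task:
    """Hypothetical workflow task; func receives the results of finished tasks."""
    name: str
    func: Callable[[Dict[str, Any]], Any]
    deps: List[str] = field(default_factory=list)


def for_each(prefix: str, items: List[Any], body: Callable[[Any, Dict[str, Any]], Any],
             deps: List[str]) -> List[Task]:
    """Hypothetical loop macro: expand one task per item. The iterations do not
    depend on each other, so the scheduler is free to run them in parallel."""
    return [Task(f"{prefix}[{i}]", lambda res, item=item: body(item, res), list(deps))
            for i, item in enumerate(items)]


def run(tasks: List[Task], max_workers: int = 4) -> Dict[str, Any]:
    """Execute tasks in dependency order; all tasks whose dependencies are
    satisfied are submitted together and run concurrently."""
    pending = {t.name: t for t in tasks}
    results: Dict[str, Any] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while pending:
            ready = [t for t in pending.values() if all(d in results for d in t.deps)]
            if not ready:
                raise ValueError("cycle or missing dependency in workflow")
            snapshot = dict(results)  # read-only view for this wave
            futures = {t.name: pool.submit(t.func, snapshot) for t in ready}
            for name, fut in futures.items():
                results[name] = fut.result()
                del pending[name]
    return results


if __name__ == "__main__":
    years = [2000, 2001, 2002]
    # Illustrative, made-up data: one short series per year.
    data = {2000: [1.0, 3.0], 2001: [2.5, 0.5], 2002: [4.0, 1.5]}
    loop_names = [f"yearly_max[{i}]" for i in range(len(years))]
    tasks = [Task("load", lambda res: data)]
    tasks += for_each("yearly_max", years, lambda y, res: max(res["load"][y]), deps=["load"])
    tasks.append(Task("summary", lambda res: max(res[n] for n in loop_names), deps=loop_names))
    print(run(tasks)["summary"])
```

Here the three `yearly_max[i]` tasks depend only on `load`, so they are submitted in the same wave. `summary` waits until all three have finished.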

Bio: Sandro Fiore, Ph.D., is a senior scientist and head of the Data Science and Learning Research Team of the Advanced Scientific Computing Division at the Euro-Mediterranean Center on Climate Change (CMCC). His work focuses on high-performance data management, big data analytics and mining, large-scale data warehouses, array databases, and in-memory analytics.