Cappello received his Ph.D. from the University of Paris XI in 1994 and joined CNRS, the French National Center for Scientific Research. In 2003, he joined INRIA, where he holds the position of permanent senior researcher. He initiated the Grid’5000 project in 2003 and served as Director of Grid’5000 (https://www.grid5000.fr) during its design, implementation, and production phases from 2003 to 2008. Grid’5000 is still in use today. It has supported hundreds of researchers in their experiments in parallel and distributed computing and has contributed to more than 1,500 research publications. In 2009, Cappello also became a visiting research professor at the University of Illinois. With Marc Snir, he created the Joint Laboratory on Petascale Computing (JLPC), which in 2014 evolved into the Joint Laboratory on Extreme Scale Computing (JLESC: https://jlesc.github.io). JLESC brings together seven of the most prominent research and production centers in supercomputing: NCSA, Inria, ANL, BSC, JSC, Riken CCS and UTK. Over his ten-year tenure as director of the JLPC and JLESC, Cappello helped hundreds of researchers and students share their research and collaborate to explore the frontiers of supercomputing. Starting in 2008, as a member of the executive committee of the International Exascale Software Project, he led the roadmap and strategy efforts for projects related to resilience at extreme scale.
In 2016, Cappello became the director of two Exascale Computing Project (ECP: https://www.exascaleproject.org/) software projects, one on resilience and one on lossy compression of scientific data, both aimed at helping Exascale applications run efficiently on Exascale systems. Over his 25-year research career, Cappello has directed the development of several high-impact software packages, including XtremWeb, one of the first Desktop Grid systems; the MPICH-V fault-tolerant MPI library; the VeloC multilevel checkpointing environment (https://github.com/ECP-VeloC); the SZ lossy compressor for scientific data (https://github.com/disheng222/SZ); the Fault Tolerance Interface (FTI) library (https://github.com/leobago/fti); and the Z-Checker tool for assessing the errors introduced by lossy compressors (https://github.com/CODARcode/Z-checker).
He is an IEEE Fellow and the recipient of the 2018 IEEE TCPP Outstanding Service award.
High performance parallel and distributed computing
Resilience and fault tolerance at extreme scale