Algorithms for Coping with Silent Errors

Yves Robert, ENS Lyon and University of Tennessee, Knoxville
May 30, 2014 10:30AM to 11:30AM
Building 240, Room 4301
Silent errors have become a major problem for large-scale distributed systems. Detection is hard, and correction is even harder. This talk presents generic algorithms to achieve both detection and correction of silent errors by coupling verification mechanisms and checkpointing protocols. Application-specific techniques will also be investigated for sparse numerical linear algebra.

About the Presenter:
Yves Robert received his PhD degree from Institut National Polytechnique de Grenoble. He is currently a full professor in the Computer Science Laboratory LIP at ENS Lyon. He is the author of 7 books, 130+ papers published in international journals, and 200+ papers published in international conferences. He is the editor of 11 book proceedings and 13 journal special issues. He has advised 26 PhD theses. His main research interests are scheduling techniques and resilient algorithms for large-scale platforms.

Yves Robert has served on many editorial boards, including IEEE TPDS. He was the program chair of HiPC 2006 in Bangalore, IPDPS 2008 in Miami, ISPDC 2009 in Lisbon, ICPP 2013 in Lyon and HiPC 2013 in Bangalore. He is a Fellow of the IEEE. He was elected a Senior Member of Institut Universitaire de France in 2007 and renewed in 2012. He was awarded the 2014 IEEE TCSC Award for Excellence in Scalable Computing. He has held a Visiting Scientist position at the University of Tennessee, Knoxville since 2011.