Building Smart Memories and Cloud Services with Derecho
Abstract: The Derecho platform was created to enable a new form of high-performance computing for Internet-of-Things applications, which often require some form of dynamic optimization that runs online (using either classic HPC methods or machine-learning components). In this talk, I suggest that as we scale them up, such applications deserve to be viewed as a new kind of smart memory. I use this term to refer to a customizable service designed to accept high-bandwidth data pipelines from sources, able to apply machine-learning tools to analyze and understand received content, and offering ways to query the resulting knowledge base with minimal delay. Such services would also need to scale out while maintaining their rapid responsiveness and strong consistency.
Good examples that will be familiar to many high-performance computing (HPC) researchers include smart power grid systems that operate at the bulk level, balancing load and supply within large regions, and smart buildings that optimize power management for homeowners and landlords. What is particularly interesting about these examples is that they also illustrate a second need: gluing the smart memory "front ends" to more classic HPC back-end systems. Doing so would let us draw on the immense body of HPC tools for power systems analysis and optimization without reimplementing everything in a new model.
Derecho, which is now fully implemented, leverages persistent memory and RDMA to solve this problem with exceptional performance and scalability, much as MPI uses RDMA in standard HPC settings, but here with much more of a focus on real-time response delays when the smart service receives new input from the outside world. Smart memory could be a major new opportunity for the HPC community, but our early experiments with the Derecho library are revealing a number of areas of the standard infrastructure that may need to evolve before this becomes a widely used option.
Bio: Ken Birman is the N. Rama Rao Professor of Computer Science at Cornell University. An ACM Fellow and the winner of the IEEE Tsutomu Kanai Award, Ken has written 3 textbooks and published more than 150 papers in prestigious journals and conferences. Software he developed operated the New York Stock Exchange for more than a decade without trading disruptions, and plays central roles in the French Air Traffic Control System and the U.S. Navy AEGIS warship. Other technologies from his group found their way into IBM’s WebSphere product, Amazon’s EC2 and S3 systems, Microsoft’s cluster management solutions, and the U.S. Northeast bulk power grid. His Vsync system has become a widely used teaching tool for students learning to create secure, strongly consistent, and scalable cloud computing solutions. Derecho is intended for demanding settings such as the smart power grid, smart highways and homes, and scalable vision systems.