High-performance modeling and simulation are playing a driving role in decision making and prediction. For time-critical emergency support applications such as severe weather prediction, flood modeling, and influenza modeling, late results can be useless. Computer models must be run and the data analyzed while their predictions can still be applied. These on-demand large-scale computations can't wait endlessly in a job queue for supercomputer resources to become available. Neither can the community keep multimillion-dollar infrastructures idle until required by urgent computation. A specialized infrastructure is needed to provide computing resources quickly, automatically, and reliably. SPRUCE is a system to support urgent or event-driven computing on both traditional supercomputers and distributed Grids. Scientists are provided with transferable Right-of-Way tokens with varying urgency levels. During an emergency, a token has to be activated at the SPRUCE portal, and jobs can then request urgent access. Local policies dictate the response, which may include providing "next-to-run" status or immediately preempting other jobs. Additional components under development include a mechanism that periodically tests applications in Warm-Standby mode to ensure readiness, and an automated Advisor that helps find the best resource to submit to, based on deadline, queue status, site policy, and warm-standby history.
Supporting urgent or event-driven computing on both traditional supercomputers and distributed grids
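
The token workflow described in the abstract (activate a Right-of-Way token, then let local policy decide how an urgent job is treated) might be sketched as follows. This is only an illustration under stated assumptions: the names `Urgency`, `Response`, `Token`, and `respond`, the three urgency levels, and the dictionary-based policy are all hypothetical and are not taken from the SPRUCE software or its interfaces.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Dict


class Urgency(IntEnum):
    # Hypothetical levels; the text only says that urgency levels vary.
    LOW = 1
    MEDIUM = 2
    HIGH = 3


class Response(IntEnum):
    # Possible site responses; the last two are the ones named in the text.
    NORMAL_QUEUE = 0
    NEXT_TO_RUN = 1
    PREEMPT = 2


@dataclass
class Token:
    """Illustrative stand-in for a transferable Right-of-Way token."""
    token_id: str
    urgency: Urgency
    activated: bool = False

    def activate(self) -> None:
        # Assumption: activation is a simple state change made at the portal.
        self.activated = True


def respond(token: Token, policy: Dict[Urgency, Response]) -> Response:
    """Return the site's response to a job submitted under this token.

    Assumption: a site policy is a plain mapping from urgency level to
    response, and a job with an inactive token gets ordinary queueing.
    """
    if not token.activated:
        return Response.NORMAL_QUEUE
    return policy.get(token.urgency, Response.NORMAL_QUEUE)


if __name__ == "__main__":
    # Hypothetical policy for one site.
    site_policy = {
        Urgency.MEDIUM: Response.NEXT_TO_RUN,
        Urgency.HIGH: Response.PREEMPT,
    }
    token = Token("demo-token", Urgency.HIGH)
    print(respond(token, site_policy).name)  # inactive token
    token.activate()
    print(respond(token, site_policy).name)  # activated token
```

Keeping the policy as data local to each site mirrors the statement that local policies dictate the response: the same token can lead to different treatment on different resources.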