Researchers frequently rely on large-scale and domain-specific workflows to conduct their science. These workflows may integrate a variety of independent software functions and external applications. However, developing and executing such workflows can be difficult, requiring complex orchestration and management of applications and data as well as customization for specific execution environments.
Parsl (Parallel Scripting Library), a Python library for programming and executing data-oriented workflows in parallel, addresses these problems. Developers simply annotate a Python script with Parsl directives; Parsl manages the execution of the script on clusters, clouds, grids, and other resources. Parsl orchestrates required data movement and manages the execution of Python functions and external applications in parallel.