An Engineering Approach to Long Haul Data Transfers
File transfers over long fat networks (LFNs) pose several performance challenges when pairing multiple sites equipped with high-performance storage and networking hardware. In particular, it is difficult to achieve high data throughput while simultaneously accessing federated storage system components, multiple high-performance network interfaces, complicated intra-node communication networks, and high-latency wide area networks.
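As a rough illustration of why such paths are hard to fill, the sketch below computes the bandwidth-delay product, the amount of data that must be in flight to keep a link busy, and the throughput ceiling imposed by a fixed window. The link speed, round-trip time, and window size are illustrative assumptions rather than figures from this work, and the function names are hypothetical.

```python
def bandwidth_delay_product_bytes(bandwidth_bps: float, rtt_seconds: float) -> float:
    """Bytes that must be in flight to keep a path of this bandwidth and RTT full."""
    return bandwidth_bps * rtt_seconds / 8.0


def window_limited_throughput_bps(window_bytes: float, rtt_seconds: float) -> float:
    """Upper bound on throughput for a single stream limited to one window per RTT."""
    return window_bytes * 8.0 / rtt_seconds


if __name__ == "__main__":
    # Assumed values for illustration only: a 10 Gb/s path with a 100 ms RTT.
    link_bps = 10e9
    rtt = 0.100

    bdp = bandwidth_delay_product_bytes(link_bps, rtt)
    print(f"Bandwidth-delay product: {bdp / 1e6:.1f} MB")

    # A 64 KiB - 1 window (the largest TCP window without window scaling).
    ceiling = window_limited_throughput_bps(65535, rtt)
    print(f"Throughput ceiling with a 65535-byte window: {ceiling / 1e6:.2f} Mb/s")
```

Under these assumptions a single stream needs on the order of a hundred megabytes in flight, which is one reason the storage, host, and network stages all have to be tuned together.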
Although performance studies of wide area networks have been a frequent topic of interest, these analyses have tended to focus on network latency characteristics and on peak throughput measured with network traffic generators. In this talk, we instead focus on an end-to-end long-distance networking analysis that includes reading large data sets from a source file system based on enterprise storage hardware and committing the data to a remote destination file system built from similar enterprise hardware. We begin with a discussion of results related to scheduling storage system access to achieve peak read and write throughput over long-haul networks, leveraging multiple network interfaces to improve host-level performance, and alleviating intra-host bottlenecks.
Our engineering approach then extends to the development of analytical models that use differential regression to interpolate and extrapolate data transfer results from existing real and emulated networks, yielding an effective modeling technique for predicting performance at various physical connection distances. Finally, we describe extensions to our differential regression technique that take advantage of refinements in our storage profiling models.