End-to-end Optimization of Data Transfer Throughput over Wide-area High-speed Networks

Project Details

Project Lead
Tevfik Kosar 
Project Manager
Tevfik Kosar 
Project Members
Esma Yildirim, Engin Arslan, Jangyoung Kim, Ismail Alan, Kemal Guner, MD S Q Zulkar Nine  
Institution
University at Buffalo, Computer Science and Engineering  
Discipline
Computer Science (401) 

Abstract

Data produced by large-scale scientific applications has reached petabyte and even exabyte scales, while improvements in high-performance optical networking technology, which today supports up to 100 Gbps, allow transfer speeds of multiple gigabits per second. The widely used transport layer protocols (e.g., TCP) were not originally designed to cope with the capacity and speed of the networks now available to the scientific community. Many alternative transport layer protocols have been designed for high-speed networks, but they have failed to replace the existing ones. Moreover, to reach transfer speeds of 100 Gbps, end-system capacities must be taken into account in addition to protocol improvements. End-systems have evolved from single desktop computers into complex, massively parallel multi-node clusters, supercomputers, and multi-disk parallel storage systems. Additional levels of parallelism, using multiple CPUs and parallel disk access, must be combined with network protocol optimization to achieve high data transfer throughput. In this project, we develop application-level models and algorithms that perform network and end-system optimization to utilize the multi-Gbps bandwidth of high-speed networks with existing transport protocols, without any changes to the OS kernel. We claim that users should not have to change their existing protocols and tools to achieve high data transfer speeds. We achieve this at the application level via 'end-to-end data-flow parallelism', in which parallel streams and stripes utilize multiple CPUs and disks. It is very important to utilize the network capacity without unduly affecting the existing traffic; our prediction models can detect that level and use only the minimal number of end-system resources needed to reach it.
We use very little information about the network and end-systems, relying on previous transfer records or immediate samplings. We keep the sampling overhead minimal through special techniques, so that the overall gain in throughput far outweighs the overhead incurred. We want to verify the feasibility and accuracy of the proposed prediction models by comparing them against actual TCP data transfers over wide-area networks, and we would like to use several FutureGrid resources over the wide area to validate our models.

Intellectual Merit

We will implement application-level models and algorithms to predict the best combination of protocol parameters for optimal end-to-end network throughput (i.e., number of parallel data streams, TCP buffer size, and I/O block size); integrate disk and CPU speed parameters into the performance model to predict the optimal combination of disks (data striping) and CPUs (parallelism) for the best end-to-end performance; and develop an estimation service for advanced bandwidth reservations and provisioning. The developed models and algorithms will be implemented as a standalone service as well as used in interaction with external data scheduling and management tools such as Stork, SRM, and Globus Online.
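To illustrate the kind of prediction involved, the sketch below fits a parallel-stream throughput curve of the form Th(n) = n / sqrt(a*n^2 + b*n + c) to three sample transfers and derives the stream count at which the curve peaks (setting the derivative to zero gives n = -2c/b). The curve form follows the style of parallel-stream models in the literature; the specific coefficients, sample points, and function names here are illustrative assumptions, not the project's actual implementation.

```python
import math

def fit_quadratic(samples):
    """Fit a*n^2 + b*n + c = (n/T)^2 from three (streams, throughput) samples."""
    (n1, t1), (n2, t2), (n3, t3) = samples
    A = [[n1 * n1, n1, 1.0], [n2 * n2, n2, 1.0], [n3 * n3, n3, 1.0]]
    y = [(n1 / t1) ** 2, (n2 / t2) ** 2, (n3 / t3) ** 2]
    # Solve the 3x3 linear system with Cramer's rule.
    det = lambda m: (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
                     - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
                     + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))
    d = det(A)
    def replace_col(j):
        return [[y[i] if k == j else A[i][k] for k in range(3)] for i in range(3)]
    a, b, c = (det(replace_col(j)) / d for j in range(3))
    return a, b, c

def throughput(n, a, b, c):
    """Modeled throughput with n parallel streams."""
    return n / math.sqrt(a * n * n + b * n + c)

def optimal_streams(a, b, c):
    """d(Th)/dn = 0  =>  b*n/2 + c = 0  =>  n = -2c/b (valid when b < 0)."""
    return -2.0 * c / b
```

Three quick samples (e.g., at 1, 2, and 4 streams) suffice to determine the three coefficients, after which the peak can be computed analytically rather than probed exhaustively.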

Broader Impacts

Developed models and algorithms will be made available via a web portal and supported by a proactive training program to ensure impact across all science communities dealing with large amounts of data. We will be collaborating with TeraGrid, OSG, and DOSAR to make the services developed within this proposal available to a wide range of researchers across the nation as well as to the international community.

Scale of Use

We will need 1-2 nodes on each cluster for 1-2 days per experiment. We expect to perform 3-4 experiments per month.

Results

1 PRELIMINARY RESULTS

The end-to-end (E2E) data transfer tool is implemented using the models and algorithms proposed in our previous study [1]. The tool is able to find the actual data transfer capacity of a high-speed network through optimization. End-system capacities are taken into account, and additional nodes of a cluster are used when necessary. The underlying transfer protocol is GridFTP. The preliminary results presented in this report show the throughput gain between the clusters sierra (SDSC) and india (IU). 16 nodes are allocated for the transfer, and only memory-to-memory transfer tests are conducted. In each case, the data sampling size is increased over the range 2GB-16GB.

Figure 1 shows the throughput measurements of the data transfers performed by the E2E data transfer tool. In the left figure, GridFTP parallel stream transfers are conducted with exponentially increasing stream counts. When 8 streams are used, the tool's algorithm decides that both source and destination node capacities have been reached, since the NICs on the nodes are only 1 Gbps. The optimal stream number is then calculated based on our prediction model; the predicted number in this case is 4. Using 4 streams on each node, the node count is increased exponentially, as shown in the figure to the right. On the x-axis, each label represents the stream number per stripe, stripe number per source node, number of source nodes, stripe number per destination node, and number of destination nodes (e.g., 4stm-1str-1n-1str-1n). After 16 nodes, the sampling stops because the throughput starts to decrease. The highest throughput obtained is around 3 Gbps, and the optimal stripe number calculated in this case is 6.
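The sampling strategy described above can be sketched as a simple doubling search: keep doubling the parallelism level (streams, then nodes) until a sample shows the throughput decreasing, then stop. The `measure` callback below is a placeholder for a short sampling transfer; in the real tool each sample would be a GridFTP memory-to-memory test.

```python
def exponential_search(measure, start=1, limit=64):
    """Double the parallelism level until measured throughput stops improving.

    `measure(level)` is a caller-supplied sampling function (hypothetical
    here); it returns the observed throughput at that parallelism level.
    Returns the best level found and its throughput.
    """
    best_level, best_tput = start, measure(start)
    level = start * 2
    while level <= limit:
        tput = measure(level)
        if tput <= best_tput:
            break  # throughput started to decrease: stop sampling
        best_level, best_tput = level, tput
        level *= 2
    return best_level, best_tput
```

Because the level doubles each step, only O(log n) sampling transfers are needed before the search stops, keeping the sampling overhead small relative to the actual transfer.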

In Figure 2, the algorithm uses similar settings for a 4GB sampling size; however, the throughput results obtained with stripes are higher, since the sampling size is increased. The highest throughput is around 4 Gbps and the optimal stripe number is 7. This value reaches around 6 Gbps for the 8GB and 16GB sampling sizes (Figures 3 and 4, respectively). Without any optimization, plain GridFTP achieves only 240 Mbps, while our tool increases this value to 6 Gbps.
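As a back-of-the-envelope check, the reported figures correspond to a 25x improvement over the unoptimized baseline:

```python
baseline_mbps = 240        # unoptimized GridFTP transfer speed
optimized_mbps = 6 * 1000  # 6 Gbps achieved with the E2E tool
speedup = optimized_mbps / baseline_mbps  # 25.0
```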

REFERENCES

[1] E. Yildirim and T. Kosar, "End-to-End Data-flow Parallelism for Throughput Optimization in High-speed Networks," Proc. NDM'11 (in conjunction with SC'11).


Fig. 1. Sierra-India, Memory-to-Memory Transfers, 2GB sampling size    
Fig. 2. Sierra-India, Memory-to-Memory Transfers, 4GB sampling size    
Fig. 3. Sierra-India, Memory-to-Memory Transfers, 8GB sampling size    
Fig. 4. Sierra-India, Memory-to-Memory Transfers, 16GB sampling size