Streaming in the Clouds
In recent years, Big Data has become an important aspect of scientific discovery, a shift referred to as the Fourth Paradigm. Across the wide spectrum of applications and acquisition methods, those that will generate the largest amounts of data fall into the category of streaming data, e.g., networks of sensors, observatories, telescopes, or experiments such as the CERN LHC. As the amount of acquired information grows and data sources become increasingly geographically distributed, it becomes important to process the data in scalable and efficient ways. Cloud computing presents an interesting option for a scalable processing platform. However, the question arises of how best to use cloud computing capabilities for geographically distributed stream processing. In this work, we explore and analyze different approaches to streaming data to the cloud and evaluate them in the context of multiple cloud offerings, including Microsoft Azure and FutureGrid's Nimbus and OpenStack installations. We show, using an ATLAS application, that the right approach to streaming data can improve average data rates by a factor of three.
Use of FutureSystems
FutureGrid is used to run the virtual machines in which the stream processing is performed. The purpose is to understand how streaming data can be processed in cloud environments.
Scale of Use
The number of VMs used is on the order of tens, up to a hundred. Because the goal is to understand how Big Data streaming is supported at large scale, scalability in terms of the number of nodes/VMs is important.