SSD performance benchmarking
Solid-state drives (SSDs) are becoming cheaper and more common in data centers, and we believe this trend will continue. Current 6 Gb/s SATA III NAND-based SSDs deliver better random I/O performance than traditional hard-disk drives (HDDs). The goal of this project is to understand how big data technologies can benefit from SSDs. As a first effort, we are benchmarking Apache Hadoop. A key component of Apache Hadoop is the Hadoop Distributed File System (HDFS), a distributed file system that provides high-throughput access to application data. We will first compare HDFS I/O throughput (Mbps) on SSDs and HDDs. Next, we will investigate the impact of SSDs on virtualization.
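To make the throughput comparison concrete, the following is a minimal sketch of how a sequential-write throughput figure in Mbps could be computed. It is an illustration only, not the project's benchmark: it writes to a local directory rather than to HDFS, and the function names (throughput_mbps, timed_sequential_write) and the default 1 MiB block size are assumptions introduced here.

```python
import os
import tempfile
import time


def throughput_mbps(num_bytes: int, seconds: float) -> float:
    """Return throughput in megabits per second (1 Mb = 10**6 bits)."""
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return num_bytes * 8 / 1e6 / seconds


def timed_sequential_write(directory: str, total_bytes: int,
                           block_size: int = 1 << 20) -> float:
    """Write total_bytes to a temporary file in `directory`; return Mbps.

    Hypothetical helper: it measures the local file system that backs
    `directory`, not HDFS itself.
    """
    block = b"\0" * block_size
    written = 0
    with tempfile.NamedTemporaryFile(dir=directory) as f:
        start = time.perf_counter()
        while written < total_bytes:
            chunk = block[: min(block_size, total_bytes - written)]
            f.write(chunk)
            written += len(chunk)
        f.flush()
        os.fsync(f.fileno())  # push data to the device before stopping the clock
        elapsed = time.perf_counter() - start
    return throughput_mbps(written, elapsed)
```

Running timed_sequential_write once against a directory on an SSD and once against a directory on an HDD, with the same total_bytes, would give two roughly comparable figures. A real comparison would also need repeated runs and a file size large enough to limit caching effects.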
Use of FutureSystems
Lima is a FutureGrid cluster at SDSC that consists of 8 nodes equipped with 480 GB SATA SSDs. We will use Lima to conduct this research.
Scale of Use
We expect to use the Lima cluster for the next few months.