SSD performance benchmarking

Project Details

Project Lead
Sameer Tilak 
Project Manager
Sameer Tilak 
Institution
UCSD, Calit2
Discipline
Computer Science (401) 

Abstract

Solid-state drives (SSDs) are becoming cheaper and more common in data centers, and we believe this trend will continue. Current 6 Gb/s SATA III NAND-based SSDs deliver better random I/O performance than traditional hard-disk drives (HDDs). The goal of this project is to understand how big data technologies can benefit from SSDs. As a first effort, we are benchmarking Apache Hadoop. A key component of Apache Hadoop is the Hadoop Distributed File System (HDFS), a distributed file system that provides high-throughput access to application data. We will first compare HDFS I/O throughput (Mbps) for SSDs and HDDs. Next, we will investigate the impact of SSDs on virtualization.
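As a rough illustration of the kind of throughput comparison described above, the following Python sketch times a sequential write into a directory and reports the result in Mbps. It is not the project's benchmark harness: the function names, the 64 MiB default size, and the mount points in the usage note are all assumptions made for illustration, and it exercises a local file system rather than HDFS itself.

```python
import os
import tempfile
import time


def throughput_mbps(num_bytes, seconds):
    """Convert a byte count and an elapsed time into megabits per second."""
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return (num_bytes * 8) / (seconds * 1_000_000)


def measure_write_mbps(directory, total_bytes=64 * 1024 * 1024,
                       block_size=1024 * 1024):
    """Time a sequential write of total_bytes into directory; return Mbps.

    The sizes are illustrative defaults, not values from the project.
    """
    block = os.urandom(block_size)  # incompressible data to write
    written = 0
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            while written < total_bytes:
                chunk = block[: min(block_size, total_bytes - written)]
                f.write(chunk)
                written += len(chunk)
            # Push data to the device so the page cache does not hide its cost.
            f.flush()
            os.fsync(f.fileno())
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)  # always clean up the scratch file
    return throughput_mbps(written, elapsed)
```

Calling `measure_write_mbps` once on a directory backed by an SSD and once on a directory backed by an HDD (for example, hypothetical mount points such as `/mnt/ssd` and `/mnt/hdd`) gives two numbers that can be compared directly. For HDFS itself, Hadoop's bundled TestDFSIO benchmark is likely a better fit than a local-disk sketch like this one.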

Intellectual Merit

This project will help us understand how big data technologies and virtualization can benefit from SSDs.

Broader Impacts

Solid-state (flash) drives are becoming cheaper and more common in data centers, and we believe this trend will continue. By 2020, the quantity of electronically stored data will reach 35 trillion gigabytes (35 zettabytes). Quantifying the impact of SSDs on virtualization and big data technologies will help us improve the performance and energy efficiency of data centers.

Scale of Use

We will use the Lima cluster for the next few months.