Hyeontaek Lim

Resource-Efficient Data-Intensive System Designs for High-Performance and Capacity Degree Type: Ph.D. in Computer Science
Advisor(s): David Andersen
Graduated: December 2015

Abstract:

Data-intensive systems are a critical building block for today's large-scale Internet services. These systems have enabled high throughput and capacity, reaching billions of requests per second for trillions of items in a single storage cluster. However, many systems are surprisingly inefficient; for instance, memcached, a widely-used in-memory key-value store system, handles 1–2 million requests per second on a modern server node, whereas an optimized software system could achieve over 70 million requests per second using the same hardware. Reducing such inefficiencies can improve the cost effectiveness of the systems significantly.

This dissertation shows that by leveraging modern hardware and exploiting workload characteristics, data-intensive storage systems that process a large amount of fine-grained data can achieve an order of magnitude higher performance and capacity than prior systems that are built for generic hardware and workloads. As examples, we present SILT and MICA, which are resource-efficient key-value stores for flash and memory. SILT provides flash-speed query processing and 5.7X higher capacity than the previous state-of-the-art system. It employs new memory-efficient indexing schemes including ECT that requires only 2.5 bits per item in memory, and a system cost model built upon new accurate and fast analytic primitives to find workload specific system configurations. MICA offers 4X higher throughput over the network than previous in-memory key-value store systems by performing efficient parallel request processing on multi-core processors and low-overhead request direction with modern network interface cards, and by using new key-value data structures designed for specific workload types.

Thesis Committee:
David G. Andersen (Chair)
Michael Kaminsky (Intel Labs)
Andrew Pavlo
Eddie Kohler (Harvard University)

Frank Pfenning, Head, Computer Science Department
Andrew W. Moore, Dean, School of Computer Science

Keywords:
Log Structure, Hash Table, Indexing, Concurrent Data Structure, I/O, Remote Procedure Call

CMU-CS-15-132.pdf (3.8 MB) ( 138 pages)
Copyright Notice