Parallel Data Laboratory Talk - Matt Butrovich
July 24, 2024, 12:00pm - 1:00pm
Location: Virtual Presentation - ET - Remote Access - Zoom
Speaker: MATT BUTROVICH, Post-Doctoral Researcher, Computer Science Department, Carnegie Mellon University
https://mattbutrovi.ch/

On Embedding Database Management System Logic in Operating Systems via Restricted Programming Environments

The rise in computer storage and network performance means that disk I/O and network communication are often no longer bottlenecks in database management systems (DBMSs). Instead, the overheads associated with operating system (OS) services (e.g., system calls, thread scheduling, and data movement from kernel-space) limit query processing responsiveness. User-space applications can elide these overheads with a kernel-bypass design. However, extracting benefits from kernel-bypass frameworks is challenging, and the libraries are incompatible with standard deployment and debugging tools.

This talk presents an alternative in user-bypass: a design that extends OS behavior for DBMS-specific features, including observability, networking, and query execution. Historically, DBMS developers have avoided kernel extensions for safety and security reasons, but recent improvements in OS extensibility present new opportunities. With user-bypass, developers write safe, event-driven programs to push DBMS logic into the kernel and avoid user-space overheads. There are two ways to invoke user-bypass logic. (1) When a DBMS in user-space invokes these programs, user-bypass provides behavior similar to a new OS system call, albeit without kernel modifications. (2) In contrast, when an OS thread or interrupt triggers these programs in kernel-space, user-bypass inserts DBMS logic into the kernel stack.

First, we present a framework that employs user-bypass to collect training data for self-driving DBMSs efficiently.
User-bypass programs reduce the number of round trips to kernel-space to retrieve performance counters and other system metrics.

Next, we present a database proxy that applies user-bypass to support features like connection pooling and workload replication while reducing data copying and user-space thread scheduling. User-bypass programs embed DBMS network protocol logic in multiple layers of the OS network stack, applying DBMS proxy logic in a kernel-space fast path.

Lastly, we present an embedded DBMS for future user-bypass applications. We discuss the design decisions, environment challenges, and performance characteristics of a DBMS that offers ACID transactions over multi-versioned data in kernel-space. We also explore applications of this user-bypass DBMS and compare them to modern user-space systems.

The techniques proposed in this talk show user-bypass benefits across multiple DBMS design disciplines and provide a template for future DBMS and OS co-design.

Matt Butrovich is a recent Ph.D. graduate (now postdoctoral researcher) from the Computer Science Department at Carnegie Mellon University, researching database management systems (DBMSs). He explores operating system and DBMS co-design opportunities via safe kernel extension mechanisms like eBPF. He is incredibly fortunate to be advised by Andy Pavlo, and is a member of the Database Group (CMU-DB), Parallel Data Lab (PDL), and Systems, Networking, and Performance (SNAP) Lab.

Zoom Participation. See announcement.
Event Website: https://pdl.cmu.edu/talk-series/2024/072424.shtml