How do you manage petabytes to exabytes of data in a single coherent storage system? Can you do it using purely open source software running on commodity hardware? How do you manage data distribution, performance, and fault tolerance in a highly dynamic data center environment where node and disk failures and cluster expansion are ongoing background activities?
Ceph is a scalable distributed storage system offering object, block device, and POSIX file system interfaces. This talk will provide an overview of the architecture and then focus on the specific problem of distributing management of a file/directory hierarchy across many nodes in a reliable, adaptive, and performant fashion.
Biography:
Sage graduated from HMC in 2000 and completed his PhD at UC Santa Cruz in 2007, where Ceph was the basis for his thesis research. Since then he has continued working on the system with the aim of creating an open source distributed storage system that actually works and doesn’t cost a fortune.