Computing performance growth has affected nearly every aspect of our lives, from social interactions to business and agriculture. In scientific discovery, both simulation and data analysis on more powerful computers will help us understand the impacts and develop mitigations for climate change, design new medical treatments and community responses, and revolutionize the scientific process. But single processor clock speed scaling ended a decade ago, and transistor sizes will approach atomic scales in the next decade. Future computing system designs will be constrained by power density and total system energy, and the two performance approaches are likely to be parallelism and specialization.
Data movement already dominates running time and energy costs, making communication cost reduction the primary optimization criteria for compilers and programmers. This requires not only new algorithms and programming models, but new ways of thinking about computation to minimize and hide communication, expose fine-grained parallelism, and manage new hardware features. These changes will affect the theoretical models of computing, the analysis of performance, the design of algorithms, and the practice of programming in fundamental ways.
In this talk I will discuss prior work an open problems in optimizing communication costs through overlap, caching and aggregation. Bandwidth reduction often requires more substantial algorithmic transformations, although some techniques, such as loop tiling, are amenable compiler analysis or “autotuning” techniques. More sophisticated algorithmic ideas for communication avoidance have arisen in the communication-optimal algorithms, including the “.5D algorithms.” These ideas are applicable to many domains, from scientific computations to database operations. I will end by describing some recent work that uses advanced algorithms, programming models, and systems to attack previously unsolvable problems in science, namely the assembly of large complex genome datasets.