Topic: Pace Matching Data Access: A data access method on removing memory-wall effect
Dr. Xian-He Sun is a University Distinguished Professor of Computer Science in the Department of Computer Science at the Illinois Institute of Technology (IIT). He is the director of the Scalable Computing Software laboratory at IIT and a guest faculty member in the Mathematics and Computer Science Division at the Argonne National Laboratory. Before joining IIT, he worked at DoE Ames National Laboratory, at ICASE, NASA Langley Research Center, and at Louisiana State University, Baton Rouge, and was an ASEE fellow at Navy Research Laboratories. Dr. Sun is an IEEE fellow and is known for his memory-bounded speedup model, also called Sun-Ni’s Law, for scalable computing. His research interests include data-intensive high-performance computing, memory and I/O systems, software systems for big data applications, and performance evaluation and optimization. He has over 250 publications and 5 patents in these areas. He is a former IEEE CS distinguished speaker, a former vice chair of the IEEE Technical Committee on Scalable Computing, and the past chair of the Computer Science Department at IIT, and he has served and is serving on the editorial boards of leading professional journals in the field of parallel processing. More information about Dr. Sun can be found at his web site www.cs.iit.edu/~sun/.
Computing has changed from compute-centric to data-centric, and data access has become the main performance concern of computing. In this talk we introduce a series of fundamental results and their associated mechanisms to reevaluate memory systems, including I/O systems, and to conduct data-centric memory optimizations. We first present the Concurrent-AMAT (C-AMAT) data access model, which unifies the impact of data locality, concurrency, and overlapping. Then, we introduce the pace matching data-transfer design methodology to optimize memory system performance. Based on the pace matching design, a memory-computing hierarchy is built to generate and transfer the final results, and to mask the performance gap between computing and data transfer. C-AMAT is used to calculate the data transfer request/supply ratio at each memory layer, and a global control algorithm, named layered performance matching (LPM), is developed to match the data transfer at each memory layer and thus match the overall performance between the CPU and the underlying memory system. The holistic pace-matching optimization is very different from conventional locality-based system optimization, and can reduce memory-wall effects to a minimum. Experimental testing confirms the theoretical findings, with a 150x reduction in memory stall time. We will present the concept of pace matching data transfer, the design of C-AMAT and LPM, and some experimental case studies. We will also discuss optimization and research issues related to pace matching data transfer and to memory systems in general.
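As a rough illustration of the quantities named in the abstract, the sketch below compares conventional AMAT with C-AMAT and derives a request/supply ratio for one memory layer. It is a minimal sketch, not the speaker's implementation. The C-AMAT expression follows the form commonly given in the published C-AMAT work (hit time divided by hit concurrency, plus pure miss rate times pure average miss penalty divided by pure miss concurrency). The function names, the parameter names, and the choice to model a layer's supply rate as 1 / C-AMAT are assumptions made here for illustration.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Conventional average memory access time, in cycles per access."""
    return hit_time + miss_rate * miss_penalty


def c_amat(hit_time, hit_concurrency,
           pure_miss_rate, pure_miss_penalty, miss_concurrency):
    """Concurrent AMAT, in cycles per access.

    A "pure" miss is a miss cycle with no overlapping hit activity.
    With both concurrency values equal to 1 and no overlap, this
    reduces to the conventional AMAT formula.
    """
    hit_term = hit_time / hit_concurrency
    miss_term = pure_miss_rate * pure_miss_penalty / miss_concurrency
    return hit_term + miss_term


def request_supply_ratio(requests_per_cycle, c_amat_cycles):
    """Hypothetical per-layer matching ratio (an assumption of this sketch).

    The layer is assumed to supply 1 / C-AMAT accesses per cycle.
    A ratio above 1 means the layer is asked for more than it delivers.
    """
    supply_per_cycle = 1.0 / c_amat_cycles
    return requests_per_cycle / supply_per_cycle


if __name__ == "__main__":
    # Illustrative numbers only, not measurements from the talk.
    sequential = amat(hit_time=2, miss_rate=0.1, miss_penalty=100)
    concurrent = c_amat(hit_time=2, hit_concurrency=2,
                        pure_miss_rate=0.05, pure_miss_penalty=100,
                        miss_concurrency=4)
    print("AMAT:", sequential)
    print("C-AMAT:", concurrent)
    print("request/supply:", request_supply_ratio(0.4, concurrent))
```

In this reading, a layered matching scheme would tune each layer (for example by raising concurrency or lowering the pure miss rate) until the ratio at every layer is close to or below 1, so that no layer stalls the one above it.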