Workshop 5
Uniform and Rigorous Benchmarking of Machine Learning ICs and Systems
Organizer: Naresh Shanbhag (University of Illinois at Urbana-Champaign)
Artificial Intelligence is transforming our society in diverse ways. Integrated circuits (ICs) and systems, both hardware and software, form the core of this transformation. As societal-scale systems such as health care, transportation, environmental/climate monitoring, energy, and education become AI-enabled, it is critical that the machine learning (ML) ICs and systems our community delivers for large-scale deployment are not just energy-efficient and high-throughput but also accurate, robust, secure, and predictable. A proper and rigorous benchmarking methodology for ML ICs is a compelling and current need: it would enable our community to measure and gauge progress in this exciting area and to deliver ML systems with guarantees on their behavior, efficiency, and performance. Benchmarking of ML ICs and systems is made challenging by the unique data-centric, cross-domain nature of the AI application space, the statistical nature of system metrics, the enormous energy and latency cost of data movement, the strong connection between circuit-level and system-level accuracy and robustness, the lack of standardized workloads and test set-ups to allow fair comparison, and many other factors.
This workshop brings together five experts in circuits, architectures, and systems who will discuss the challenges and potential steps towards formulating a uniform and rigorous benchmarking methodology for ML ICs and systems. It is hoped that this workshop will catalyze standardization activities in the IC design community on this important topic.
About Naresh Shanbhag
Naresh R. Shanbhag (F’06) is the Jack Kilby Professor of Electrical and Computer Engineering at the University of Illinois at Urbana-Champaign. He received his Ph.D. degree in Electrical Engineering from the University of Minnesota (1993). From 1993 to 1995 he worked at AT&T Bell Laboratories in Murray Hill, where he led the design of high-speed transceiver chip-sets for very high-speed digital subscriber line (VDSL), before joining the University of Illinois at Urbana-Champaign in August 1995. His research focuses on the design of energy-efficient systems for machine learning, communications, and signal processing, spanning algorithms, architectures, and integrated circuits. He has more than 200 publications in this area, holds thirteen US patents, and is a co-author of two books and multiple book chapters. Dr. Shanbhag received the 2018 SIA/SRC University Researcher Award, became an IEEE Fellow in 2006, and has received multiple best paper awards, including the IEEE Solid-State Circuits Society’s Best Paper Award in 2006. In 2000, he co-founded and served as the Chief Technology Officer of Intersymbol Communications, Inc., which introduced mixed-signal ICs for electronic dispersion compensation of OC-192 optical links and was acquired by Finisar Corporation in 2007. From 2013 to 2017, he was the founding Director of the Systems On Nanoscale Information fabriCs (SONIC) Center, a five-year multi-university center funded by DARPA and SRC under the STARnet program.
Presentations
- 1. MLPerf Tiny: Benchmarking Ultra-Low-Power Machine Learning Systems
Abstract:
Advances in ultra-low-power tiny machine learning (TinyML) systems allow for a whole new category of intelligent applications at the endpoint. However, the lack of a widely accepted and easily reproducible benchmark for these systems hinders their development and adoption. To address this need, we introduce MLPerf Tiny, the industry's first benchmark suite for ultra-low-power tiny machine learning systems. Initial benchmarks include keyword detection, visual wake words, image classification, and anomaly detection. The suite meets the needs of the community and is the result of a joint effort by more than fifty organizations from industry and academia. MLPerf Tiny analyzes the accuracy, latency, and energy consumption of machine learning inference so that tradeoffs across systems can be fairly evaluated. In addition, MLPerf Tiny adopts a modular design that allows benchmark submitters to demonstrate the benefits of their solutions, regardless of where they fall in the ML deployment stack, in a consistent and repeatable manner. Despite all this effort, there is yet to be an end in sight for the "tiny" class of MLPerf benchmarks. With the rise of analog, asynchronous, event-based, probabilistic, and neuromorphic solutions, new benchmarks and metrics are needed, and current approaches require improvement. To this end, the talk also discusses the challenges and opportunities for developing new benchmarks and how they may interact with existing tasks in MLPerf Tiny.
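To make the tradeoff evaluation concrete, the sketch below shows one way per-inference latency, energy, and accuracy measurements might be reduced to reportable figures. It is a minimal illustration only: the function and field names are hypothetical and are not part of the actual MLPerf Tiny reference harness.

```python
# Minimal sketch of aggregating per-inference latency/energy measurements,
# in the spirit of a tiny-ML benchmark run. Names and structure are
# illustrative only, not the MLPerf Tiny reference implementation.
from dataclasses import dataclass
from statistics import median

@dataclass
class RunResult:
    latency_ms: float   # wall-clock time for one inference
    energy_uj: float    # energy drawn during that inference (microjoules)
    correct: bool       # whether the prediction matched the label

def summarize(runs: list[RunResult]) -> dict:
    """Reduce raw per-inference measurements to reportable metrics."""
    med_energy = median(r.energy_uj for r in runs)
    return {
        "median_latency_ms": median(r.latency_ms for r in runs),
        "median_energy_uj": med_energy,
        "accuracy": sum(r.correct for r in runs) / len(runs),
        "inferences_per_joule": 1e6 / med_energy,
    }

# Example: three hypothetical keyword-spotting inferences on a microcontroller.
runs = [RunResult(42.1, 910.0, True),
        RunResult(41.8, 905.5, True),
        RunResult(43.0, 930.2, False)]
print(summarize(runs))
```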
- 2. Proper Benchmarking of In-Memory Computing Architectures
Abstract:
In-memory computing (IMC) architectures have emerged as a compelling platform for implementing energy-efficient machine learning (ML) systems. However, the energy efficiency gains provided by IMC designs today appear to be leveling off, and it is not clear what the limiting factors are. The conceptual complexity of IMCs, combined with the absence of a rigorous benchmarking methodology, makes it difficult to gauge progress and identify bottlenecks in this exciting field. We present a benchmarking methodology for IMCs comprising: a compositional view of IMCs that enables one to parse an IMC design into its canonical components; a set of benchmarking metrics to quantify its performance, efficiency, and accuracy; and a strategy for analyzing the reported IMC data and metrics. We apply the proposed benchmarking methodology to an extensive database of IMC metrics extracted from IC designs published since 2018, in order to infer and comprehend trends in this area. While the full-stack nature of IMCs provides numerous opportunities to device researchers, analog/mixed-signal designers, architects, and system and algorithm designers, it also presents a formidable challenge in quantifying progress in the field. This talk will conclude with a series of recommendations for the benchmarking of IMCs.
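As a rough illustration of the compositional idea, the sketch below parses an IMC macro into a few canonical components and rolls their per-MAC energies up into an efficiency figure. All component names and energy values are hypothetical placeholders, not data from the talk or from published designs, and the precision-normalization convention shown is only one of several used in the literature.

```python
# Illustrative sketch: decompose an in-memory computing (IMC) macro into
# canonical components, then roll up energy per MAC and a precision-
# normalized efficiency figure. All names and numbers are hypothetical.
from dataclasses import dataclass

@dataclass
class Component:
    name: str
    energy_fj_per_mac: float  # energy attributed to this component per MAC (fJ)

def energy_per_mac_fj(components: list[Component]) -> float:
    return sum(c.energy_fj_per_mac for c in components)

def tops_per_watt(e_mac_fj: float) -> float:
    # 1 MAC = 2 ops; TOPS/W is equivalent to ops per picojoule.
    return 2.0 / (e_mac_fj * 1e-3)

def normalized_tops_per_watt(tops_w: float, bx: int, bw: int) -> float:
    # One common (but debated) convention: scale by input/weight bit widths
    # so designs at different precisions can be compared on a 1b-1b basis.
    return tops_w * bx * bw

macro = [
    Component("bitcell array + wordline drivers", 15.0),
    Component("bitline / capacitive accumulation", 10.0),
    Component("ADC (column readout)", 25.0),
    Component("digital accumulation + control", 8.0),
]

e_mac = energy_per_mac_fj(macro)
eff = tops_per_watt(e_mac)
print(f"energy/MAC = {e_mac:.1f} fJ, {eff:.1f} TOPS/W, "
      f"1b-normalized = {normalized_tops_per_watt(eff, bx=4, bw=4):.0f} TOPS/W")
```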
- 3. Characterizing and Assessing In-Memory Computing Processors
Abstract:
In-memory computing (IMC) is emerging as a critical architecture for simultaneously addressing the compute and data-movement bottlenecks limiting today’s AI processors. However, IMC introduces fundamental tradeoffs that are not typically of primary concern in conventional architectures. While the promise of IMC lies in advancing the key system-level metrics of ultimate concern for end-to-end executions (decisions per second per Watt, decisions per second per millimeter), ongoing research at the technology, circuit, and architecture levels makes it important to develop a set of intuitions and metrics for more directed and granular assessment of IMC designs. This talk starts by exploring the fundamental device, circuit, and architectural tradeoffs and considerations for using IMC towards scalable AI computations, with the aim of investigating salient metrics and design approaches at each of these levels. It then examines a number of presented IMC designs to develop intuitions and suggest approaches to characterizing the metrics, further examining the metric levels achieved by recent and typical designs. The objective is to initiate community-wide discussion around essential metrics, intuitions, and characterization methodologies.
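To show how such end-to-end, decision-level metrics might be computed, the sketch below sums compute, DRAM, and host contributions into one energy budget before normalizing by power and area. Every number is a made-up placeholder, the area normalization is assumed to be per mm², and nothing here is prescribed by the talk itself.

```python
# Sketch of end-to-end, system-level metric computation for one "decision"
# (one complete inference). All numbers below are hypothetical; the point is
# that data movement and host overheads enter the totals, not just the
# compute macro itself.

def system_metrics(e_compute_uj, e_dram_uj, e_host_uj, latency_ms, area_mm2):
    """Return decisions/s/W and decisions/s/mm^2 for one end-to-end inference."""
    e_total_j = (e_compute_uj + e_dram_uj + e_host_uj) * 1e-6
    t_s = latency_ms * 1e-3
    decisions_per_s = 1.0 / t_s
    avg_power_w = e_total_j / t_s
    return {
        "decisions_per_s_per_W": decisions_per_s / avg_power_w,  # == 1 / e_total_j
        "decisions_per_s_per_mm2": decisions_per_s / area_mm2,
    }

# Hypothetical example: compute energy is small, but DRAM traffic and host
# pre/post-processing dominate the end-to-end budget.
print(system_metrics(e_compute_uj=120.0, e_dram_uj=800.0, e_host_uj=150.0,
                     latency_ms=4.0, area_mm2=12.5))
```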
- 4. Benchmarking Novel AI Accelerators: Striving to Be Both Fair and Comprehensive
Abstract:
With the rise of custom silicon chips for AI acceleration, fair and comprehensive benchmarking of hardware innovations has become increasingly important. Yet being both fair AND comprehensive is not at all easy (thus the present workshop). Interesting innovations may get introduced and demonstrated at one level (circuit- or architecture-level), but the actual resulting benefits really ought to be assessed at a much higher level (system- or application-level), and this may not be immediately practical or feasible. Costs that are common to many accelerator approaches, such as the energy needed to load the next set of model weights into scratchpad memory, are frequently ignored for simplicity. Yet this greatly complicates the fair assessment of alternative approaches that completely avoid these costs. After an overview of benchmarking strategies at different abstraction levels, I discuss the best practices and pitfalls to be avoided that I have learned from my time on the ISSCC/ML subcommittee and as a researcher working on nonvolatile-memory-based AI accelerators.
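A toy sketch of the accounting point above, with entirely hypothetical numbers: once the energy to stage model weights into scratchpad is amortized into the per-inference cost, the ranking between a conventional accelerator and one that keeps weights resident (e.g., in nonvolatile memory) can flip.

```python
# Toy illustration (hypothetical numbers): comparing two accelerators with
# and without the cost of loading weights into on-chip scratchpad. An
# approach that keeps weights resident avoids the per-reload energy entirely.

def energy_per_inference_uj(e_mac_total_uj, e_weight_load_uj, inferences_per_load):
    """Amortize the weight-load cost over the inferences run between reloads."""
    return e_mac_total_uj + e_weight_load_uj / inferences_per_load

# Accelerator A: very efficient MACs, but must reload weights every 8 inferences.
a = energy_per_inference_uj(e_mac_total_uj=50.0, e_weight_load_uj=2000.0,
                            inferences_per_load=8)
# Accelerator B: less efficient MACs, but weights stay resident (no reload cost).
b = energy_per_inference_uj(e_mac_total_uj=120.0, e_weight_load_uj=0.0,
                            inferences_per_load=1)

print(f"A: {a:.0f} uJ/inference (looks like 50 uJ if weight loads are ignored)")
print(f"B: {b:.0f} uJ/inference")
```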
- 5. Challenges in Designing and Evaluating Neural Processing Units
Abstract:
Machine learning (ML) processors continue their fast-paced evolution, including innovations in flexible acceleration for inferencing. To cover a wide range of applications and product domains, flexibility in supporting various kinds of neural networks, precisions, and NN ops is increasingly important. On the other hand, the mobile platforms used in such settings have limited computing resources, power, and memory bandwidth. This puts even higher pressure on inference engines to be more efficient in terms of area and energy. The computing performance of NPUs has been increasing much faster than that of what surrounds an NPU, such as the host CPU and DRAM bandwidth. Thus, software and overheads such as data transfer (internal and external) and host time take up a large portion of the total latency, especially for small neural networks. NPU designs therefore increase data reuse in their memory hierarchies or reduce CPU load rather than simply adding more compute. Various factors affect NPU performance; examples include thermal and power constraints as well as CPU host performance, DMA speed, and the specifics of the interconnect. Thus, the evaluation methodology should comprehensively account for networks with various characteristics (e.g., whether it covers both memory-intensive and compute-intensive networks). From the viewpoint of NPU performance, SW-HW co-optimization is very important. While the types of workloads are diversifying (e.g., transformers, graph neural networks), NPU hardware has been designed to account for this diversification (e.g., heterogeneous NPU cores). Some optimizations (e.g., mixed precision, approximate computing, compression, and pruning) may sacrifice accuracy, and thus iso-accuracy constraints should be maintained for fair evaluations. Loss of accuracy will need to be discussed in the overall context of design and evaluation even if it may have little impact in the application scenario.
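A minimal sketch of the latency decomposition described above, with invented timings and a made-up peak compute rate: for small networks, fixed host and DMA overheads dominate end-to-end latency, so the fraction of time spent on actual NPU compute falls sharply.

```python
# Sketch (hypothetical timings): decompose end-to-end NPU inference latency
# into host, DMA, and compute components, and show how fixed overheads
# dominate for small networks.

def end_to_end_latency_ms(macs, peak_macs_per_s,
                          host_overhead_ms=0.40, dma_in_out_ms=0.60):
    compute_ms = macs / peak_macs_per_s * 1e3
    total_ms = host_overhead_ms + dma_in_out_ms + compute_ms
    return total_ms, compute_ms / total_ms  # (latency, fraction spent computing)

PEAK = 8e12  # 8 TMAC/s peak, a made-up NPU figure

for name, macs in [("small keyword model", 2e6),
                   ("mid-size vision model", 600e6),
                   ("large transformer block", 20e9)]:
    total, frac = end_to_end_latency_ms(macs, PEAK)
    print(f"{name:24s}: {total:6.2f} ms total, compute fraction = {frac:.1%}")
```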