Workshop 5


Uniform and Rigorous Benchmarking of Machine Learning ICs and Systems

Organizer: Naresh Shanbhag (University of Illinois at Urbana-Champaign)

Artificial Intelligence is transforming our society in diverse ways as we speak. Integrated circuits (ICs) and systems, both hardware and software, form the core of this transformation. As societal-scale systems such as health care, transportation, environmental/climate monitoring, energy, education, and others become AI-enabled, it is critical that the machine learning (ML) ICs and systems our community delivers for large-scale deployment are not just energy-efficient and high-throughput but also accurate, robust, secure, and predictable. A proper and rigorous benchmarking methodology for ML ICs is a compelling and current need: it would enable our community to measure and gauge progress in this exciting area and to deliver ML systems with guarantees on their behavior, efficiency, and performance. Benchmarking of ML ICs and systems is made challenging by the unique data-centric, cross-domain nature of the AI application space, the statistical nature of system metrics, the enormous energy and latency cost of data movement, the strong connection between circuit-level and system-level accuracy and robustness, the lack of standardized workloads and test set-ups to allow fair comparison, and many other factors.
This workshop brings together five experts in circuits, architectures, and systems who will discuss the challenges and potential steps towards formulating a uniform and rigorous benchmarking methodology for ML ICs and systems. It is hoped that this workshop will catalyze standardization activities in the IC design community on this important topic.

About Naresh Shanbhag

Naresh R. Shanbhag (F’06) is the Jack Kilby Professor of Electrical and Computer Engineering at the University of Illinois at Urbana-Champaign. He received his Ph.D. degree in Electrical Engineering from the University of Minnesota in 1993. From 1993 to 1995, he worked at AT&T Bell Laboratories in Murray Hill, where he led the design of high-speed transceiver chip-sets for very high-speed digital subscriber line (VDSL), before joining the University of Illinois at Urbana-Champaign in August 1995. His research focuses on the design of energy-efficient systems for machine learning, communications, and signal processing, spanning algorithms, architectures, and integrated circuits. He has more than 200 publications in this area, holds thirteen US patents, and is a co-author of two books and multiple book chapters. Dr. Shanbhag received the 2018 SIA/SRC University Researcher Award, became an IEEE Fellow in 2006, and has received multiple best paper awards, including the IEEE Solid-State Circuits Society’s Best Paper Award in 2006. In 2000, he co-founded and served as the Chief Technology Officer of Intersymbol Communications, Inc., which introduced mixed-signal ICs for electronic dispersion compensation of OC-192 optical links and was acquired by Finisar Corporation in 2007. From 2013 to 2017, he was the founding Director of the Systems On Nanoscale Information fabriCs (SONIC) Center, a five-year multi-university center funded by DARPA and SRC under the STARnet program.

Presentations

1. MLPerf Tiny: Benchmarking Ultra-Low-Power Machine Learning Systems

Abstract:
Advances in ultra-low-power tiny machine learning (TinyML) systems allow for a whole new category of intelligent applications at the endpoint. However, the lack of a widely accepted and easily reproducible benchmark for these systems hinders their development and adoption. To address this need, we introduce MLPerf Tiny, the industry's first benchmark suite for ultra-low-power tiny machine learning systems. Initial benchmarks include keyword detection, visual wake words, image classification, and anomaly detection. The suite meets the needs of the community and is the result of a joint effort by more than fifty organizations from industry and academia. MLPerf Tiny measures the accuracy, latency, and energy consumption of machine learning inference so that tradeoffs across systems can be fairly evaluated. In addition, MLPerf Tiny adopts a modular design that allows benchmark submitters to demonstrate the benefits of their solutions, regardless of where they fall in the ML deployment stack, in a consistent and repeatable manner. Despite all this effort, there is no end in sight for the "tiny" class of MLPerf benchmarks: with the rise of analog, asynchronous, event-based, probabilistic, and neuromorphic solutions, new benchmarks and metrics are needed, and current approaches require improvement. To this end, the talk also discusses the challenges and opportunities in developing new benchmarks and how they may interact with existing tasks in MLPerf Tiny.
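To make the three metrics concrete, the sketch below shows the kind of per-inference accounting such a benchmark harness performs; the model interface, the energy-counter callback, and the reported statistics are illustrative assumptions, not the MLPerf Tiny reference code.

import time

def run_benchmark(model, samples, labels, read_energy_uj):
    # model: callable that runs one on-device inference; read_energy_uj: callable
    # returning the current cumulative energy reading in microjoules (assumed).
    correct, latencies, energies = 0, [], []
    for x, y in zip(samples, labels):
        e0 = read_energy_uj()                      # energy counter before inference
        t0 = time.perf_counter()
        pred = model(x)                            # single inference
        latencies.append(time.perf_counter() - t0)
        energies.append(read_energy_uj() - e0)
        correct += int(pred == y)
    return {
        "accuracy": correct / len(samples),
        "median_latency_ms": 1e3 * sorted(latencies)[len(latencies) // 2],
        "energy_per_inference_uj": sum(energies) / len(energies),
    }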

2. Proper Benchmarking of In-Memory Computing Architectures

Abstract:
In-memory computing (IMC) architectures have emerged as a compelling platform for implementing energy-efficient machine learning (ML) systems. However, today, the energy-efficiency gains provided by IMC designs seem to be leveling off, and it is not clear what the limiting factors are. The conceptual complexity of IMCs, combined with the absence of a rigorous benchmarking methodology, makes it difficult to gauge progress and identify bottlenecks in this exciting field. We present a benchmarking methodology for IMCs comprising a compositional view of IMCs that enables one to parse an IMC design into its canonical components, a set of benchmarking metrics to quantify its performance, efficiency, and accuracy, and a strategy for analyzing the reported IMC data and metrics. We apply the proposed benchmarking methodology to an extensive database of IMC metrics extracted from IC designs published since 2018 in order to infer and comprehend trends in this area. While the full-stack nature of IMCs provides numerous opportunities to device researchers, analog/mixed-signal designers, architects, and system and algorithm designers, it also presents a formidable challenge in quantifying progress in the field. This talk will conclude with a series of recommendations for the benchmarking of IMCs.
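As a rough illustration of the canonical metrics such a methodology tabulates per design, the sketch below computes throughput, energy efficiency, compute density, and task accuracy for a hypothetical IMC macro; the field names and example values are assumptions for illustration only.

from dataclasses import dataclass

@dataclass
class IMCMacro:
    ops_per_cycle: float   # MAC operations completed per clock cycle
    clock_hz: float        # operating frequency
    power_w: float         # total macro power at that operating point
    area_mm2: float        # macro area
    task_accuracy: float   # end-task accuracy including analog/quantization noise

    def tops(self):
        return 2 * self.ops_per_cycle * self.clock_hz / 1e12  # 1 MAC = 2 ops

    def tops_per_watt(self):
        return self.tops() / self.power_w

    def tops_per_mm2(self):
        return self.tops() / self.area_mm2

macro = IMCMacro(ops_per_cycle=4096, clock_hz=200e6, power_w=0.05,
                 area_mm2=0.5, task_accuracy=0.91)
print(macro.tops_per_watt(), macro.tops_per_mm2(), macro.task_accuracy)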

 

3. Characterizing and Assessing In-Memory Computing Processors

Abstract:
In-memory computing (IMC) is emerging as a critical architecture for simultaneously addressing the compute and data-movement bottlenecks limiting today’s AI processors. However, IMC introduces fundamental tradeoffs that are not typically of primary concern in conventional architectures. While the promise of IMC lies in advancing the key system-level metrics of ultimate concern for end-to-end executions (decisions per second per Watt, decisions per second per square millimeter), ongoing research at the technology, circuit, and architecture levels makes it important to develop a set of intuitions and metrics for more directed and granular assessment of IMC designs. This talk starts by exploring the fundamental device, circuit, and architectural tradeoffs and considerations for using IMC towards scalable AI computations, with the aim of investigating salient metrics and design approaches at each of these levels. It then examines a number of previously presented IMC designs to develop intuitions and suggest approaches to characterizing the metrics, further examining the metric levels achieved by recent and typical designs. The objective is to initiate community-wide discussion around essential metrics, intuitions, and characterization methodologies.
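A minimal sketch of the gap between macro-level efficiency and the end-to-end metric of decisions per second per Watt, assuming hypothetical power numbers for data movement and the host:

def decisions_per_second_per_watt(inferences_per_s, compute_power_w,
                                  movement_power_w, host_power_w):
    # The end-to-end metric charges the whole platform, not just the compute macro.
    total_power_w = compute_power_w + movement_power_w + host_power_w
    return inferences_per_s / total_power_w

# A design that looks efficient in isolation...
print(decisions_per_second_per_watt(1000, compute_power_w=0.05,
                                    movement_power_w=0.0, host_power_w=0.0))
# ...looks very different once data movement and host power are included.
print(decisions_per_second_per_watt(1000, compute_power_w=0.05,
                                    movement_power_w=0.20, host_power_w=0.25))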

4. Benchmarking Novel AI Accelerators: Striving to be Both Fair and Comprehensive

Abstract:
With the rise of custom silicon chips for AI acceleration, fair and comprehensive benchmarking of hardware innovations has become increasingly important. Yet being both fair AND comprehensive is not at all easy (hence the present workshop). Interesting innovations may be introduced and demonstrated at one level (circuit or architecture), but the resulting benefits really ought to be assessed at a much higher level (system or application), and this may not be immediately practical or feasible. Costs that are common to many accelerator approaches, such as the energy needed to load the next set of model weights into scratchpad memory, are frequently ignored for simplicity. Yet this greatly complicates the fair assessment of alternative approaches that avoid these costs entirely. After an overview of benchmarking strategies at different abstraction levels, I discuss the best practices and pitfalls to be avoided that I have learned from my time on the ISSCC ML subcommittee and as a researcher working on nonvolatile-memory-based AI accelerators.
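The weight-loading point can be made concrete with a back-of-the-envelope calculation; the energy numbers below are hypothetical and only illustrate how charging (or ignoring) this cost can flip a comparison:

def energy_per_inference_uj(compute_uj, weight_bytes, load_uj_per_byte,
                            reloads_per_inference):
    # Charge both the compute energy and any per-inference weight reloads.
    return compute_uj + weight_bytes * load_uj_per_byte * reloads_per_inference

# Design A reloads 2 MB of weights into scratchpad for every inference.
design_a = energy_per_inference_uj(compute_uj=50, weight_bytes=2_000_000,
                                   load_uj_per_byte=0.001, reloads_per_inference=1)
# Design B (e.g., weights resident in nonvolatile memory) avoids the reload.
design_b = energy_per_inference_uj(compute_uj=80, weight_bytes=2_000_000,
                                   load_uj_per_byte=0.001, reloads_per_inference=0)
print(design_a, design_b)  # the ranking flips once loading cost is charged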

5. Challenges in Designing and Evaluating Neural Processing Units

Abstract:
Machine learning (ML) processors continue their fast-paced evolution, including innovations in flexible acceleration for inference. To cover a wide range of applications and product domains, flexibility in supporting various kinds of neural networks, precisions, and NN operations is increasingly important. On the other hand, the mobile platforms used in such settings have limited computing resources, power, and memory bandwidth, which puts even more pressure on the inference engines to be efficient in terms of area and energy. The computing performance of NPUs has been increasing much faster than that of what surrounds an NPU, such as the host CPU and DRAM bandwidth. As a result, software and other overheads such as data transfer (internal and external) and host time take up a large portion of the total latency, especially for small neural networks. NPU designs therefore increase data reuse in their memory hierarchies or reduce the CPU load rather than simply adding more compute. Various factors affect NPU performance, including thermal and power constraints as well as host CPU performance, DMA speed, and the specifics of the interconnect. Thus, the evaluation methodology should comprehensively account for networks with various characteristics (e.g., it should cover both memory-intensive and compute-intensive networks). From the viewpoint of NPU performance, SW-HW co-optimization is very important. While the types of workloads are diversifying (e.g., transformers, graph neural networks), NPU hardware has been designed to account for this diversification (e.g., heterogeneous NPU cores). Some optimizations (e.g., mixed precision, approximate computing, compression, and pruning) may sacrifice accuracy, and thus iso-accuracy constraints should be maintained for fair evaluations. Loss of accuracy will need to be discussed in the overall context of design and evaluation, even if it has little impact in the application scenario.
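As a toy illustration of why data-transfer and host overheads dominate for small networks, the sketch below decomposes end-to-end latency into host, DMA, and NPU-compute time; all values are assumed for illustration only:

def end_to_end_latency_ms(host_ms, dma_ms, npu_compute_ms):
    # Returns total latency and the fraction actually spent in NPU compute.
    total = host_ms + dma_ms + npu_compute_ms
    return total, npu_compute_ms / total

print(end_to_end_latency_ms(host_ms=0.5, dma_ms=0.4, npu_compute_ms=0.3))   # small network: overheads dominate
print(end_to_end_latency_ms(host_ms=0.5, dma_ms=0.8, npu_compute_ms=20.0))  # large network: compute dominates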