Short Course 2 (Joint) | Symposia on VLSI Technology and Circuits

* If you encounter menus do not work upon clicking, delete your browser's cache.

Enabling a Future of Even More Powerful Computing

Moderators: Kentaro Yoshioka (Keio Univ.) and Seung H. Kang (Qualcomm)

This short course addresses future directions of technology and circuits for high-performance computers, GPU-based AI accelerator, supercomputer for deep learning, in-memory, neuromorphic, quantum, and quantum-inspired computers.

Live Q&A Session: June 14, 7:00AM-8:30AM (JST)

Acceleration of Tomorrow's Computational Challenges, Gabriel Loh, Advanced Micro Devices: Abstract:
The computing industry has already been facing a diverse set of challenges brought on by the slowing of Moore’s Law, the end of Dennard Scaling, the computational demands of our new age in artificial intelligence, and rapidly evolving and expanding application use cases. However, the demand for additional compute capabilities across mobile, edge, home, office, cloud, and supercomputer platforms will likely accelerate over the coming years. To address these computational demands and challenges through the end of the decade, we will discuss three key trends for future computer systems. The first is a complete pervasiveness of intelligence, although “intelligence” will be expanded beyond the current focus on deep machine learning. The second is viewing all workloads as opportunities for specialization and acceleration. The third is pushing modular design principles to the next level, encompassing both hardware and software, to enable the productive design, implementation, and utilization of these future compute platforms.

About Gabriel Loh

Gabe is a Senior Fellow in AMD Research. He received his Ph.D. and M.S. in computer science from Yale University in 2002 and 1999, respectively, and his B.Eng. in electrical engineering from the Cooper Union in 1998. Gabe was also a tenured associate professor in the College of Computing at the Georgia Institute of Technology, a visiting researcher at Microsoft Research, and a senior researcher at Intel Corporation. He is a Fellow of the ACM and IEEE, recipient of ACM SIGARCH's Maurice Wilkes Award, Hall of Fame member for the MICRO, ISCA, and HPCA conferences, (co-)inventor on over one hundred US patent applications and eighty granted patents, and a recipient of a U.S. National Science Foundation CAREER Award.

3D-Structured Monolithic and Heterogeneous Devices for Post-5G Applications, Yoshihiro Hayashi, Keio University: Abstract:
In upcoming post-5G systems, AI-centric infrastructures will be connected via low-latency RF and photonic networks with local servers and IoT edge devices. These digital innovations will be realized by scaled-down, ultra-low-power semiconductor devices with new functional materials in 3D configurations implemented as either nano-scaled monolithic or macroscopic heterogeneous integration. In this short course, recent 3D monolithic and heterogeneous innovations are reviewed, and their impact will be discussed on the performance and architectural leaps forward toward the computation and communication infrastructures required in post-5G systems.

About Yoshihiro Hayashi

Yoshihiro Hayashi received the B.S., M.S., and Ph.D. degrees in applied chemistry from Keio University in Yokohama, Japan, in 1981, 1983, and 1987, respectively. He was with Central Research Laboratories, NEC Corporation, Kanagawa, Japan, from 1987 to 2010 doing research on ULSI device integration and transferred to Renesas Electronics Corporation engaged in the development of RF-MCUs and healthcare solutions. Currently, he works in Strategy Planning and Collaboration Unit, TIA central office, AIST, focused on next-generation ULSI device platforms. He is also a visiting professor, Science and Technology, Keio University, to chair the system device roadmap committee of Japan (SDRJ). He was a Visiting Researcher with Rutgers University, Piscataway, NJ, from 1985 to 1986, and a Visiting Fellow with Dartmouth College, Hanover, NH, from 1992 to 1993. He is a member of IEEE Electron Devices Society and the Japan Society of Applied Physics.

Accelerated Computing: Latest Advances and Future Challenges, Ben Keller, NVIDIA: Abstract:
With the "free ride" of Moore's Law and Dennard scaling drawing to a close, today's silicon designers must aggressively pursue innovation from devices and circuits to software and systems. This presentation will highlight the full-stack innovations of the NVIDIA A100 datacenter GPU that enable a 20X leap in deep learning performance compared to its predecessor. I will then discuss ongoing efforts in NVIDIA Research to drive continuous innovation in chip design, including package-level integration, the optimization of deep learning inference accelerators, and fine-grained adaptive clocking for aggressive margin reduction.

About Ben Keller

Ben Keller received his Ph.D. in electrical engineering and computer sciences from the University of California, Berkeley in 2017. He is currently a Senior Research Scientist at NVIDIA Corporation in the ASIC & VLSI Research group. He is a recipient of the National Science Foundation Graduate Research Fellowship and the NVIDIA Graduate Fellowship, and is an author on more than 20 conference and journal publications. Ben's research interests include digital clocking and synchronization techniques, fine-grained adaptive voltage scaling, and energy-efficient digital design.

Next-Generation Deep-Learning Accelerators: From Hardware to System, Yakun Sophia Shao, University of California, Berkeley: Abstract:
Machine learning is poised to substantially change society in the next 100 years, just as how electricity transformed the way industries functioned in the past century. In particular, deep learning has been adopted across a wide variety of industries, from computer vision, natural language processing, autonomous driving, to robotic manipulation. Motivated by the high computational requirement of deep learning, there has been a large number of novel deep-learning accelerators proposed in academia and industry to meet the performance and efficiency demands of deep-learning applications.
To this end, this short course will cover various aspects of deep-learning hardware from a system perspective, including deep-learning basics, hardware & software optimizations for deep-learning, system integration, and compiler optimization. In particular, we will discuss challenges and opportunities for next-generation of deep-learning accelerators, with a special focus on system-level implications of designing, integrating, and scheduling of future deep-learning accelerators.

About Yakun Sophia Shao

Professor Yakun Sophia Shao is an Assistant Professor and an SK Hynix Faculty Fellow of Electrical Engineering and Computer Sciences at the University of California, Berkeley. Previously, she was a Senior Research Scientist at NVIDIA and received her Ph.D. degree in 2016 from Harvard University. Her research interests are in the area of computer architecture, with a special focus on domain-specific architecture, deep-learning accelerators, and high-productivity hardware design methodology. Her work has been awarded a Best Paper Award (MICRO 2019), a Research Highlight of Communications of ACM (2021), Top Picks in Computer Architecture (2014), and Honorable Mentions (2019*2). Her Ph.D. dissertation was nominated by Harvard for the ACM Doctoral Dissertation Award. She is a Siebel Scholar, an invited participant at the Rising Stars in EECS Workshop, and a recipient of the IBM Ph.D. Fellowship. Her personal webpage is https://people.eecs.berkeley.edu/~ysshao/.

Hardware for Next Generation AI, Dmitri Nikonov and Amir Khosrowshahi, Intel Labs: Abstract:
The field of machine learning continues to evolve at a rapid pace.　The past few years have seen remarkable advancements across many areas　including computer vision, natural language processing, and reinforcement learning.　Progress is driven by the availability of scalable compute, larger corpora of data,　and novel algorithmic approaches. We survey various neural accelerator chips which enabled this progress (Google TPU, Cerebras, Nervana, etc.) and outline the envelope of their performance. It is determined by the underlying hardware - digital CMOS multipliers. This limits their energy efficiency (TOPS/Watt) and, coupled with the exponentially growing demand for AI computing in the world, leads to an unsustainable consumption of energy. Promising directions for transformative change to address these challenges are (1) beyond CMOS compute, memory, and interconnects and (2) analog neural network architectures. We then review recent research on the use of various beyond CMOS devices (resistive RAM, phase change memory, floating gate, ferroelectric, spintronic, and photonic devices) for neural networks. Also, we touch upon examples of digital and analog arrays for compute-in-memory. We benchmark their experimentally achieved operating time and energy vs. theoretical projections. We project the options likely to achieve significant improvement in energy efficiency.

About Dmitri Nikonov

Dmitri E. Nikonov received M.S. in aeromechanical engineering from the Moscow Institute of physics and technology in 1992 and Ph.D. in physics from Texas A&M University in 1996, where he participated in the demonstration of the world’s first laser without population inversion. He joined Intel Corporation in 1998 and is presently a Principal Engineer in the Components Research group in Hillsboro, Oregon. He is responsible for simulation, benchmarking, and design of beyond-CMOS logic devices and neural network circuits, and for managing joint research programs with universities on nanotechnology and exploratory devices. From 1997 to 1998, he was a research engineer and lecturer at the Department of Electrical and Computer Engineering of University of California Santa Barbara. In 2006, he was appointed Adjunct Associate Professor of Electrical and Computer Engineering at Purdue University. He has 115 publications in refereed journals in quantum optics, lasers, nanoelectronics, spintronics and neural networks, and 85 issued patents in integrated optic, electronic, and spintronic devices.

About Amir Khosrowshahi

Amir Khosrowshahi has an A.B. in physics and math, AM in physics from Harvard and a Ph.D. in computational neuroscience from University of California, Berkeley. He is currently VP and CTO of AI platforms at Intel. His group develops machine learning processors as well as software, systems, and machine learning algorithms needed by Intel’s customers to run their machine learning workloads at scale. He is also a visiting researcher at the Helen Wills Neuroscience Institute at Berkeley investigating the intersection of machine learning and device physics. Amir came to Intel through the acquisition of his company Nervana in 2016. Nervana provided machine learning as a service, and its core technology was a novel distributed architecture for deep learning which was incorporated into new Intel processors. He has worked at a number of successful early-stage startups including Zappos (acquired by Amazon), Tellme Networks (Microsoft), and Xl2web (now Google Sheets). Amir started his career in finance and was a VP at Goldman Sachs in derivatives trading.

Re-Engineering Computing with Neuro-Inspired Learning: Algorithms, Hardware Architecture, and Devices, Kaushik Roy, Purdue University: Abstract:
Advances in machine learning, notably deep learning, have led to computers matching or surpassing human performance in several cognitive tasks including vision, speech and natural language processing. However, implementation of such neural algorithms in conventional "von-Neumann" architectures are several orders of magnitude more area and power expensive than the biological brain. Hence, we need fundamentally new approaches to sustain exponential growth in performance at high energy-efficiency beyond the end of the CMOS roadmap in the era of ‘data deluge’ and emergent data-centric applications. Exploring the new paradigm of computing necessitates a multi-disciplinary approach: exploration of new learning algorithms inspired from neuroscientific principles, developing network architectures best suited for such algorithms, new hardware techniques to achieve orders of improvement in energy consumption, and nanoscale devices that can closely mimic the neuronal and synaptic operations of the brain leading to a better match between the hardware substrate and the model of computation. In this short course presentation, I will focus on recent work on neuromorphic computing with spike-based learning and the design of underlying hardware that can lead to quantum improvements in energy efficiency with good accuracy.

About Kaushik Roy

Kaushik Roy received B.Tech. degree in from the Indian Institute of Technology, Kharagpur, India, and Ph.D. degree from the University of Illinois at Urbana-Champaign in 1990. He was with the Semiconductor Process and Design Center of Texas Instruments, Dallas, for three years. He joined the electrical and computer engineering faculty at Purdue University, West Lafayette, IN, in 1993, where he is currently the Edward G. Tiedemann Jr. Distinguished Professor. He also the director of the center for brain-inspired computing (C-BRIC) funded by SRC/DARPA. His research interests include neuromorphic and emerging computing models, neuro-mimetic devices, spintronics, device-circuit-algorithm co-design for nano-scale Silicon and non-Silicon technologies, and low-power electronics. Dr. Roy has published more than 800 papers in refereed journals and conferences, holds 28 patents, supervised 91 Ph.D. dissertations, and is co-author of two books on Low Power CMOS VLSI Design (John Wiley & McGraw Hill).

Quantum Computing with Superconducting Circuits, Markus Brink, IBM: Abstract:
Quantum computing has made tremendous progress in recent years, which led to wider interest in the field. Superconducting quantum circuits have emerged as a prime contender for implementing quantum processors, with the goal of realizing universal quantum computing. Quantum processors have scaled significantly in size, as measured by the number of quantum bits (qubits) connected on a chip, with devices incorporating more than 50 qubits available today. Likewise, the quality of qubits and quantum processors has also increased steadily, as measured, for example, by the quantum volume. Despite these advances, fault-tolerant quantum computing is still some time away, due to the significant hardware overhead and performance requirements for error-correction codes. But early quantum applications and demonstrations can already be implemented on near-term quantum systems.

About Markus Brink

Markus Brink received his Ph.D. in physics from Cornell University in 2007, where he investigated low-dimensional systems and nanostructures using cryogenic scanning force microscopy combined with electron transport measurements under Prof. Paul L. McEuen. He subsequently joined Prof. Michel H. Devoret's Qlab at Yale University, working as postdoctoral researcher on qubits and quantum state readout in superconducting circuits. Since 2010, he has been a Research Staff Member at the IBM Thomas J. Watson Research Center in Yorktown Heights, New York, where he currently leads the quantum processor development team.

Digital Annealing Technology for Solving Combinatorial Optimization Problems, Koichi Kanda, Fujitsu Laboratories Ltd.: Abstract:
Demand for continued computer performance growth even after the end of Moore’s Law has led to various domain‐specific hardware approaches. Fujitsu’s Digital Annealer Unit (DAU), whose concept was first published in 2016, is an ASIC for solving large‐scale combinatorial optimization problems, where the objective function to minimize is formulated as an Ising model. The DAU achieves full connectivity among 8k variables and performs Markov Chain Monte Carlo searches in a multidimensional binary space. In this presentation, the DA algorithm and architecture are explained with an emphasis on techniques to accelerate the process of finding the optimal solution inside the hardware, such as parallel tempering. Techniques for applying the digital annealing concept to variable spaces representing permutations and assignments will be also presented. Applications of DAU and effectiveness of those techniques will be shown along with benchmark results of various problems.

About Kouichi Kanda

Kouichi Kanda was born in Tokyo in 1975. He received the B.S., M.S., and Ph.D. degrees, all in electronic engineering, from The University of Tokyo Japan in 1998, 2000, and 2003, respectively. In 2003, he joined Fujitsu Laboratories Ltd, where he was engaged in research and development of high‐speed CMOS ICs for wireline and wireless communications. From 2012, he worked on IC and system design for IoT such as a medical BAN transceiver (in collaboration with imec Holst Centre) and a cloud‐assisted low‐power position sensing system. Since 2018, he is with the Digital Annealer project, doing research on hardware architectures as well as software algorithms for speeding up the process of solving combinatorial optimization problems.