

## Technical Highlights from the 2022 IEEE Symposium on VLSI Technology and Circuits

The 2022 IEEE Symposium on VLSI Technology and Circuits is a premier international conference that records the pace, progress, and evolution of micro/nano integrated electronics, scheduled for June 12-17, 2022. The joint Symposium will be held in person in Hawaii in a hybrid format: live sessions on-site at the Hilton Hawaiian Village to enable networking, and on-demand sessions giving access to selected talks and panels for those who cannot travel.

The Symposium coincides this year with the 75<sup>th</sup> anniversary of the invention of the solid-state transistor.

The Symposium's overall theme, **"Technology and Circuits for the Critical Infrastructure of the Future,"** integrates advanced technology developments, innovative circuit design, and the applications that they enable as part of our global society's transition to a new era of smart connected devices, infrastructure and systems that change the way humans interact with each other.

The following are some of the highlighted papers that address this theme:

## Joint Technology and Circuits Highlights

This year, the two previously separate symposia on technology and circuits have merged into one. Here are some highlighted papers that represent a combined step forward in technology and circuits:

#### **Beyond CMOS Machine Learning**

"Experimental Demonstration of Novel Scheme of HZO/Si FeFET Reservoir Computing with Parallel Data Processing for Speech Recognition" – University of Tokyo (Paper C25-1)

Researchers at the University of Tokyo present a novel implementation of reservoir computing using ferroelectric gate MOSFETs (FeFETs) in a parallel data processor for speech recognition. Reservoir computing is a machine learning technique that offers efficient online learning for edge AI applications because only a single layer of readout weights needs to be trained. They demonstrate that the polarization dynamics of FeFETs can perform compute-in-memory operations. Fundamental machine learning tasks such as short-term memory (STM) and parity-check (PC) tasks have been successfully performed using virtual nodes extracted from the time response of the drain current of a single FeFET. In the reported speech recognition experiment they achieve an accuracy higher than 95.9%.



**Figures 1 & 18.** University of Tokyo presents a novel implementation of reservoir computing using ferroelectric gate MOSFETs (FeFETs) in a parallel data processor for speech recognition (left) showing similar accuracy to software-based reservoir computing (right).
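The point that only one layer of readout weights is trained can be illustrated with a small software toy. The sketch below is not the FeFET reservoir from the paper: it assumes a hypothetical leaky tanh node whose response is sampled at several "virtual node" time points, and it fits a linear readout by ridge regression on a one-step short-term-memory task. The names and constants (`virtual_nodes`, `N_VIRTUAL`, `LEAK`, `RIDGE`) are illustrative assumptions.

```python
import math
import random

N_VIRTUAL = 8   # virtual nodes sampled per input step (assumed)
LEAK = 0.5      # fraction of state retained between samples (assumed)
RIDGE = 1e-3    # ridge regularisation strength (assumed)

def virtual_nodes(inputs):
    """Drive one nonlinear node and sample it N_VIRTUAL times per input."""
    state = 0.0
    features = []
    for u in inputs:
        row = []
        for k in range(N_VIRTUAL):
            gain = 0.5 + k / N_VIRTUAL   # fixed mask: one gain per virtual node
            state = LEAK * state + (1.0 - LEAK) * math.tanh(gain * u + state)
            row.append(state)
        features.append(row + [1.0])     # constant bias feature
    return features

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def train_readout(features, targets):
    """Ridge regression: the only trained parameters in the system."""
    n = len(features[0])
    gram = [[sum(f[i] * f[j] for f in features) + (RIDGE if i == j else 0.0)
             for j in range(n)] for i in range(n)]
    rhs = [sum(f[i] * t for f, t in zip(features, targets)) for i in range(n)]
    return solve(gram, rhs)

def mse(features, targets, weights):
    preds = [sum(w * x for w, x in zip(weights, f)) for f in features]
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)

random.seed(0)
u = [random.uniform(-1.0, 1.0) for _ in range(300)]
feats = virtual_nodes(u)[1:]   # reservoir features at step t
delayed = u[:-1]               # STM target: the input from step t-1
w = train_readout(feats, delayed)
trained_error = mse(feats, delayed, w)
untrained_error = mse(feats, delayed, [0.0] * len(w))
```

The reservoir itself (here the `virtual_nodes` function) is never trained; only the `N_VIRTUAL + 1` readout weights in `w` are fitted, which is what keeps online learning cheap.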

#### **Quantum Computing**

"Scalable 1.4µW Cryo-CMOS SP4T Multiplexer Operating at 10mK for High-Fidelity Superconducting Qubit Measurements" – KU Leuven (Paper JCS1-2)

Researchers at KU Leuven report on the electrical performance of an ultra-low power cryo-CMOS single-pole-4-throw (SP4T) RF multiplexer working at 10mK base temperature. They use the multiplexer to benchmark a superconducting qubit for the very first time and obtain qubit coherence times higher than 35µs along with an average single-qubit gate fidelity of 99.93%, which exceeds the threshold required for quantum error-correction based on surface code. This work demonstrates the operability of superconducting qubits with ultralow-power cryo-CMOS devices at the base temperature, paving the way for advanced cointegration schemes.



*Figures 2 & 4.* Quantum device consisting of a high coherence superconducting qubit coupled to a superconducting resonator (left-most panel) along with the internal architecture of a custom-designed low-power RF SP4T cryo-multiplexer (right panel).

#### **Compute-in-Memory**

"An 8-bit 20.7 TOPS/W Multi-Level Cell ReRAM-based Compute Engine" – University of Michigan (Paper JFS4-1)

Researchers at the University of Michigan, in collaboration with Applied Materials, report that analog compute-in-memory with multi-level cell resistive RAM (ReRAM) promises highly dense and efficient compute for machine learning and scientific computing. The authors present an SoC prototype comprised of four self-contained ReRAM-based compute-in-memory tiles and a RISC-V host. The measured raw and normalized peak efficiencies are 20.7 and 662 TOPS/W, respectively; the reported compute density is 8.4 TOPS/mm<sup>2</sup>, and the classification accuracy is 96.8% using the 128 MNIST dataset.



**Figure 9.** Analog compute in memory with Multi-Level Cell (MLC) ReRAM promises highly dense and efficient compute support for machine learning and scientific computing. Presented is an SoC prototype comprised of four self-contained ReRAM-based CIM tiles and a RISC-V host. The measured raw and normalized peak efficiencies are 20.7 and 662 TOPS/W, respectively. The compute density is 8.4 TOPS/mm<sup>2</sup>.

#### **Compute-in-Memory**

"A 40nm Analog-Input ADC-Free Compute-in-Memory RRAM Macro with Pulse-Width Modulation Between Sub-arrays" – Georgia Institute of Technology (Paper JFS4-2)

Compute-in-Memory (CIM) has emerged as an attractive alternative to traditional digital implementations for extensive multiply-and-accumulate (MAC) operations, which makes it suitable for deep neural network (DNN) applications. Georgia Institute of Technology presents an ADC-free CIM RRAM-based macro composed of 1T1R bit cells. Most CIM macros employ ADCs, which impose performance limitations and accuracy loss due to quantization and noise. This work proposes an ADC-free memory scheme using analog signal processing with direct digitization, achieving a 0.5X reduction in area overhead of sensing circuitry and a 6.9X throughput improvement. The proposed scheme also achieves an 11.6X improvement in energy efficiency and a 4.3X improvement in compute efficiency.



*Figures 3 & 8.* (Left) Structure and operation of the proposed ADC-free compute scheme using 1T1R array. (Right) Die photo of two-array RRAM macro with labels indicating the sub-blocks.
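As a rough illustration of how a resistive array can perform MAC operations with time-encoded inputs, the sketch below models ideal cells as conductances and encodes each input as a pulse width, so the charge collected on a column is the dot product of inputs and weights. This is a generic idealization, not the macro's actual circuit; `V_READ`, `T_UNIT`, and `pwm_mac` are assumed names and values.

```python
V_READ = 0.2    # read voltage in volts (assumed)
T_UNIT = 1e-9   # pulse width per input LSB in seconds (assumed)

def pwm_mac(inputs, conductances):
    """Return the charge (in coulombs) collected on each column.

    inputs: non-negative integers, one per row (pulse width in LSBs)
    conductances: matrix [row][column] of cell conductances in siemens
    """
    n_cols = len(conductances[0])
    charges = [0.0] * n_cols
    for x, row in zip(inputs, conductances):
        pulse = x * T_UNIT                     # activation encoded as time
        for j, g in enumerate(row):
            charges[j] += V_READ * g * pulse   # Q = I * t = V * G * t
    return charges

inputs = [3, 0, 2]
g = [[10e-6, 20e-6],
     [30e-6, 40e-6],
     [50e-6, 60e-6]]
q = pwm_mac(inputs, g)   # about 26 fC and 36 fC
```

Because the result is already an analog quantity proportional to the dot product, it can in principle be passed on as another pulse width rather than digitized, which is the general motivation for avoiding an ADC at every array output.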

## **Technology Highlights**

#### **Advanced CMOS Technology**

*"Intel 4 CMOS Technology Featuring Advanced FinFET Transistors Optimized for High Density and High-Performance Computing" – Intel (Paper T1-1)* 

Moore's Law continues apace: Intel introduces a new advanced CMOS FinFET technology, Intel 4, that extends Moore's law by offering 2X area scaling of the high-performance logic library and greater than 20% performance gain at iso-power over Intel 7. The scaled high-performance library offers 50nm gate pitch, 30nm fin pitch and 30nm minimum metal pitch.

This node delivers 8VT (4NVT + 4PVT) spanning a range of 190mV/180mV for N/PMOS, enabling designers to choose between power and speed requirements. EUV lithography is used extensively to simplify the process flow and improve yield. The interconnect stack features 16 metal layers with enhanced copper metallurgy at critical lower layers to deliver improved electromigration and lower line resistance.



**Figures 6 & 9.** (Left) Cross-section view of the Intel 4 interconnect stack. EUV patterning is used in lower metal layers to simplify the process flow and improve yield. (Right) Normalized electromigration lifetime vs. normalized line resistance for different metallurgies on Intel 4 vs. Intel 7 technologies.

#### **Advanced CMOS Technology**

"Scaled FinFETs Connected by Using Both Wafer Sides for Routing via Buried Power Rails" – imec (Paper T1-2)

In recent years, imec has developed its buried power rail (BPR) technology, which pushes the power rails under the transistors. This has the dual benefits of lower IR drop and increased routing density, as signal routes and power routes no longer conflict. Here imec reports on scaled FinFETs with a novel routing scheme enabling power connections via BPR from both wafer sides. On the frontside, after via patterning, contacting to p/n S/D-epi and BPR is performed in a single metallization step with an optimized preclean that preserves a good contact interface. After wafer flipping, bonding and extreme thinning, highly scaled 323nm deep nano-through-Si-vias (nTSV) land on BPR, with tight overlay control and unchanged BPR resistance. Moving the power delivery network to the backside reduces dynamic and static IR drop, as predicted from on-chip power heat maps generated for a low power 64-bit CPU at 2nm design rules. P/NMOS show similar or even superior ION-IOFF after backside processing, and extra anneal(s) are added for VT recovery, mobility and BTI improvement.



**Figure 2c.** TEM image illustrating FinFETs built with a novel routing scheme, wherein both sides of the wafer are used for device connection via buried power rails (BPR). FinFETs up to M1 level are built on the wafer's frontside (FS), after which the wafer is flipped over, bonded to a carrier wafer and thinned down. On the wafer's FS, M1 lines (FSM1) are connected through V0 vias to M0A lines which are then connected to BPR lines by VBPR vias.

#### **MRAM Memory Technology**

*"Reliable Sub-nanosecond MRAM with Double Spin-Torque Magnetic Tunnel Junctions" – IBM (Paper T1-4)* 

Spin-transfer-torque magnetoresistive random-access memory (STT-MRAM) technology has shown energy improvements over flash or SRAM and is now in mass production. However, both the reliability and the speed of the STT-MRAM bit cell remain factors to improve. In this paper, IBM demonstrates both improvements with sub-nanosecond switching in two-terminal STT-MRAM devices by using Double Spin-torque Magnetic Tunnel Junctions (DS-MTJs). Low write error rates are achieved with ≤250 ps write pulses and tight distributions over a temperature range of -40°C to 85°C. To establish reliability, the devices are cycled 1E10 times with no observed degradation. Comparing this two-terminal DS-MTJ STT-MRAM device to recently published three-terminal Spin-Orbit-Torque (SOT) MRAM devices shows a 10X reduction in switching current density and 3-10X reduction of in-class power consumption.



**Figure 6.** Write error rates for a single device, plotted using a normal quantile scale (using the absolute value of the standard deviation from the fifty-percent switching current density), measured at -40°C, 25°C, and 85°C, for pulse widths from 225ps to 10ns.
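For readers unfamiliar with a normal quantile scale, the following sketch shows how a write error rate maps to a number of standard deviations from the median. It is a generic conversion using Python's standard library, not the authors' plotting code; the error rates chosen are illustrative.

```python
from statistics import NormalDist

def wer_to_sigma(write_error_rate):
    """Map a write error rate to standard deviations away from the median."""
    return -NormalDist().inv_cdf(write_error_rate)

# Illustrative error rates (not measured values from the paper)
sigmas = {wer: wer_to_sigma(wer) for wer in (0.5, 1e-3, 1e-6)}
# 0.5 -> 0 sigma, 1e-3 -> about 3.09 sigma, 1e-6 -> about 4.75 sigma
```

On such a scale a Gaussian switching distribution appears as a straight line, which makes deviations in the low-error tail easy to spot.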

#### **DRAM Memory Technology**

"Vertical Channel-All-Around (CAA) IGZO FET less than 50nm CD with High Read Current of 32.8μA/μm (Vth +1V), Well-Performed Thermal Stability up to 120°C for Low Latency, High-Density 2T0C 3D DRAM Application" – Institute of Microelectronics of the Chinese Academy of Sciences & Huawei (Paper T2-3)

For the first time, the Institute of Microelectronics of the Chinese Academy of Sciences & Huawei report a vertical channel-all-around (CAA) IGZO FET for high-performance DRAM, scaled down to an active footprint of less than 50 × 50 nm<sup>2</sup>. With optimized IGZO thickness (~3nm) and high-K dielectric (HfOx), a high current density of 32.8µA/µm at Vth +1V with a subthreshold swing of 92 mV/dec is achieved in the IGZO CAA FET with a channel length of 55nm and critical dimension (CD) of 50nm. Good thermal stability and reliability are also demonstrated by temperature variation tests and positive-bias-temperature-stress (PBTS) tests from -40°C to 120°C. Their results show that the CAA IGZO FET is a promising candidate for high-density, high-performance 3D DRAM beyond the 1α node.



*Figure 5.* TEM cross-section of the IGZO-CAA (vertical Channel-All-Around) FET from the Institute of Microelectronics of the Chinese Academy of Sciences & Huawei, with a CD of about 50nm, an 8nm HfOx dielectric and a channel length of approximately 55nm.

#### **PCM Memory Technology**

"First Demonstration of Ge<sub>2</sub>Sb<sub>2</sub>Te<sub>5</sub>-Based Superlattice Phase Change Memory with Low Reset Current Density (~3 MA/cm<sup>2</sup>) and Low Resistance Drift (~0.002 at 105°C)" – Stanford University (Paper T4-1)

Phase Change Memory (PCM) offers programmable and non-volatile memory for a wide range of applications requiring high-density storage. Stanford presents advances in PCM memory structures in which they investigate superlattice PCM (SL-PCM) heterostructures that reduce the reset current density (J<sub>reset</sub>) and resistance drift coefficient (v). However, SLs have not been studied with the well-known phase change material Ge<sub>2</sub>Sb<sub>2</sub>Te<sub>5</sub> (GST), and the effect of SL interfaces and intermixing layers also remains unknown. Here, using SLs based on GST/Sb<sub>2</sub>Te<sub>3</sub> for the first time, they simultaneously achieve J<sub>reset</sub> ≈ 3-4 MA/cm<sup>2</sup> and 7 resistance states (v ≈ 0.002) in mushroom-cell PCM with bottom electrode diameter down to 110nm. The low J<sub>reset</sub> and v are retained even after 10<sup>6</sup> cycles and at high temperature (105°C), respectively. They also uncover that both J<sub>reset</sub> and v in SL-PCM decrease with increasing number of SL interfaces.



**Figure 3 + inset.** Cross-sectional scanning electron microscope image of a mushroom-cell GST/Sb<sub>2</sub>Te<sub>3</sub> superlattice phase-change memory device with the inset showing atomically sharp superlattice interfaces and van der Waals-like gaps.
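To give a feel for what a drift coefficient of about 0.002 means, the sketch below applies the commonly used power-law drift model R(t) = R0 · (t/t0)^v. The model choice, the one-second reference time, and the comparison value v = 0.1 are assumptions for illustration; only v ≈ 0.002 comes from the text above.

```python
def drift_factor(nu, t_seconds, t0_seconds=1.0):
    """Resistance increase R(t)/R0 under the power-law drift model."""
    return (t_seconds / t0_seconds) ** nu

# Six decades of time after programming (1 s to 1e6 s, roughly 11.6 days)
low_drift = drift_factor(0.002, 1e6)   # about 1.03, i.e. ~3% increase
reference = drift_factor(0.1, 1e6)     # about 3.98 for an assumed v = 0.1
```

A smaller drift factor keeps neighbouring resistance levels from overlapping over time, which is why low drift matters for multi-level storage.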

#### **Image Sensor Technology**

"A 0.6µm Small Pixel for High Resolution CMOS Image Sensor with Full Well Capacity of 10,000e- by Dual Vertical Transfer Gate Technology" – Samsung (Paper T8-4)

The CMOS pixel race continues: Samsung has developed a prototype 200Mp image sensor using 0.6µm pixels with a full well capacity (FWC) of 10,000e- using dual vertical transfer gate (D-VTG) technology. D-VTG increases the FWC by 60% compared to a single vertical transfer gate (S-VTG) and improves transfer capability by increasing the controllability of the TG voltage. They also optimize photo-electron transfer through the gap, depth, and taper slope of the VTGs.



**Figures 1 & 3a.** (Left) Samsung's full well capacity as a function of pixel size. A 10ke- full well is achieved with a 0.6µm-pitch pixel. (Right) Potential profile for single and dual vertical transfer gates.

#### **Wireline & Optical Systems**

*"Low-capacitance Ultra-thin InGaAs Membrane Photodetector on Si Slot Waveguide Towards Receiver-less System" – University of Tokyo (Paper T15-4)* 

The University of Tokyo proposes a Si/III-V hybrid waveguide photodetector consisting of an ultrathin InGaAs membrane and a Si slot waveguide, enabling low capacitance and high responsivity simultaneously and offering speed improvements in high-speed data center & backbone links. The strong optical confinement in the Si slot waveguide enhances optical absorption in the InGaAs membrane. As a result, they successfully demonstrate a high responsivity of 1A/W and a sufficiently small capacitance of 1.9fF to realize a receiver-less (TIA-less) system.



**Figure 3abc.** Schematics of (a) a conventional InGaAs waveguide photodetector with taper and (b) the proposed ultrathin InGaAs membrane waveguide photodetector. The smoother mode transition and small reflection of the ultrathin InGaAs membrane waveguide photodetector eliminate the need for an InP taper, enabling a simpler fabrication process. (c) Mode propagation from Si waveguide to photodetector. By using an ultrathin InGaAs membrane, a smooth mode transition can be achieved without a taper.

#### **Wireline & Optical Systems**

*"First Monolithic Integration of Group IV Waveguide Photodetectors and Modulators on 300mm Si Substrates for 2µm Wavelength Optoelectronic Integrated Circuit" – University of Singapore (Paper T15-2)* 

The 2µm optical spectrum window is one path to addressing the communication capacity crunch, as developments in single-mode fibers at 1310 and 1550nm are approaching theoretical limits. University of Singapore reports the first monolithic integration of group IV waveguide photodetectors (PDs) and modulators on 300mm Si substrates for 2µm wavelength applications, offering Si CMOS compatibility and a route to volume manufacturing. Their waveguide PDs and electro-absorption modulators (EAMs) employ a Ge<sub>0.92</sub>Sn<sub>0.08</sub>/Ge multiple-quantum-well (MQW) as the active layer. They make use of the Ge buffer layer as the Ge-on-Si waveguide and grating coupler so that the light can be coupled to the EAM and PD for direct modulation and detection, respectively. The extended coupling path in their waveguide PD enhances the optical responsivity by 35 times over the surface-illuminated mode with the same absorption layer, leading to the highest responsivity of 525mA/W among all GeSn-based 2µm PDs, with a high 3dB bandwidth of 6GHz. In addition, for the first time, they demonstrate the feasibility of a 2µm fully integrated transceiver with the successful operation of both PDs and EAMs on the same Si substrate.



**Figures 2 & 3b.** (Left) The 3D schematic image of the monolithic integrated waveguide PD and EAM for 2µm wavelength on Si substrate. The light is coupled to the waveguide through a coupler and transmitted towards the EAM for modulation, and then detected by the PD. (Right) The cross-sectional TEM image of the GeSn MQW layer stack on 300-mm Si substrate for 2µm integrated platform.
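The quoted responsivity can be translated into an external quantum efficiency with the standard relation η = R·h·c/(q·λ). The sketch below is a back-of-the-envelope conversion, assuming the wavelength is exactly 2.0µm; the function name is illustrative and the resulting efficiency is not a figure reported in the text.

```python
H = 6.62607015e-34    # Planck constant, J*s
C = 299792458.0       # speed of light, m/s
Q = 1.602176634e-19   # elementary charge, C

def external_quantum_efficiency(responsivity_a_per_w, wavelength_m):
    """Electrons collected per incident photon for a given responsivity."""
    return responsivity_a_per_w * H * C / (Q * wavelength_m)

# 525 mA/W at an assumed wavelength of exactly 2.0 um
eqe = external_quantum_efficiency(0.525, 2.0e-6)   # about 0.33
```

At longer wavelengths each photon carries less energy, so the same responsivity in A/W corresponds to a lower quantum efficiency than it would at 1310 or 1550nm.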

#### **Advanced Materials**

"Wafer-Scale Bi-Assisted Semi-Auto Dry Transfer and Fabrication of High-Performance Monolayer CVD WS<sub>2</sub> Transistor" – TSMC (Paper T1-5)

TSMC reports a transistor technology employing a novel wafer-scale semi-automated dry transfer process for monolayer (1L) CVD WS<sub>2</sub>, developed by utilizing the weakly coupled interface between a semimetal Bi and a two-dimensional semiconductor WS<sub>2</sub>. Monolayer 2D semiconductors have shown promising potential as the ultimately scaled channel materials for future transistor technologies because of the well-preserved carrier mobility at the atomic-scale channel thickness and the better electrostatic control at a shorter channel length (L<sub>CH</sub> < 10nm) than bulk semiconductors. This new monolayer transfer method is demonstrated at wafer scale. The n-FETs in this process achieve a high on-current of up to 250µA/µm at V<sub>DS</sub> = 1V with <0.73kΩ·µm contact resistance and a gate length of 135nm.



**Figure 6.** Photograph (a) of the 2" 1L CVD WS<sub>2</sub> transferred onto a 4" SiNx (100nm)/p++-Si wafer with the global back gated device structure. The SEM (b) and cross section TEM image (c) of the device structure with Sb/Au contact.

#### **Image Sensor Technology**

"A 2-Layer Transistor Pixel Stacked CMOS Image Sensor with Oxide-Based Full Trench Isolation for Large Full Well Capacity and High Quantum Efficiency" – Sony (Paper T1-3)

Sony demonstrates the development of a 2-Layer transistor pixel stacked CMOS image sensor (CIS) that possesses a large full well capacity (FWC) and high quantum efficiency (QE). Photodiodes (PDs) and pixel transistors are fabricated on different Si layers by a three-dimensional sequential integration process to increase the PD volumes, and new sublocal connections that connect multiple floating diffusions are introduced to improve the conversion gain and random noise. Silicon oxide is used as the embedded material for full trench isolations (FTIs) for the first time instead of conventional poly-Si to prevent light from being absorbed by the FTIs, and the QE at a wavelength of 530nm increases by 19%. They demonstrate a 1.0µm dual PD CIS with an FWC of 12,000e-, previously achieved only by larger pixel sizes.



*Figures 3ab & 9.* Device structures of 2-layer pixel (a) without and (b) with sub-local connections. Graph (c) shows relationships between the pixel size and FWC in this work compared with those in previous studies.

## **Circuits Highlights**

#### **Machine Learning**

"A 17-95.6 TOPS/W Deep Learning Inference Accelerator with Per-Vector Scaled 4bit Quantization for Transformers in 5nm" – NVIDIA (Paper C2-1)

Dedicated machine learning processors for natural language processing or machine vision are becoming workhorses both in edge devices and in the data center. Researchers at NVIDIA present their latest prototype deep learning accelerator in 5nm CMOS. In accelerators there is a fundamental link between high-accuracy calculations and high power consumption, while low-accuracy calculations can often result in incorrect classification and user dissatisfaction. The authors propose a new method to address this challenge: data-dependent per-vector scaling allows arithmetic to be performed at 4b with only 0.7% accuracy loss, resulting in 95.6 TOPS/W power efficiency.



*Figure 5.* Block diagram of NVIDIA's proposed deep learning inference accelerator with per-vector scaled quantization support.
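The general idea behind per-vector scaled quantization can be sketched in a few lines: instead of one scale factor for a whole tensor, each short vector of values gets its own scale, so small-magnitude vectors are not crushed by a large outlier elsewhere. The code below is a simplified illustration, not NVIDIA's implementation; the vector length, the symmetric code range of ±7, and the function names are assumptions.

```python
VECTOR_SIZE = 4   # elements sharing one scale factor (assumed)
QMAX = 7          # largest magnitude used for a signed 4-bit code (assumed)

def quantize_per_vector(values):
    """Quantize to 4-bit integer codes with one scale factor per vector."""
    codes, scales = [], []
    for start in range(0, len(values), VECTOR_SIZE):
        vec = values[start:start + VECTOR_SIZE]
        peak = max(abs(v) for v in vec)
        scale = peak / QMAX if peak > 0 else 1.0   # data-dependent scale
        scales.append(scale)
        codes.extend(round(v / scale) for v in vec)
    return codes, scales

def dequantize(codes, scales):
    """Reconstruct approximate values from codes and per-vector scales."""
    return [c * scales[i // VECTOR_SIZE] for i, c in enumerate(codes)]

weights = [0.01, -0.02, 0.03, 0.015,   # small-magnitude vector
           1.0, -2.5, 3.5, 0.5]        # large-magnitude vector
codes, scales = quantize_per_vector(weights)
restored = dequantize(codes, scales)
errors = [abs(w - r) for w, r in zip(weights, restored)]
```

With a single tensor-wide scale of 3.5/7 = 0.5, every value in the first vector would round to zero; with per-vector scales, each value is reconstructed to within half of its own vector's scale step.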

#### **DRAM Memory**

"A 16GB 1024GB/s HBM3 DRAM with On-Die Error Control Scheme for Enhanced RAS Features" – Samsung (Paper C15-1)

Samsung present their third generation of 10nm DRAM with improved performance targeting greater system reliability, availability, and serviceability (RAS) for usage in automotive, industrial, and data center applications. They achieve this goal in their "High Bandwidth Memory-3" (HBM3) DRAM by focusing on improved error correction utilizing a new scheme for on-die error code correction (ECC) which can correct for a 16bit word error and 2 single-bit errors simultaneously. Correcting each error locally on the same DRAM die, instead of accessing other dies in the DRAM stack, improves the latency and increases the pin data rate from their previous generation of 5Gb/s/pin to 8.0Gb/s/pin with a total memory bandwidth of 1024GB/s per memory cube. This was demonstrated in a 16GB DRAM module.



*Figure 6.* Samsung's HBM3 chip microphotographs of (a) core-die and (b) buffer-die, (c) measured tCK shmoo.
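The per-pin data rate and the per-cube bandwidth quoted above are linked by the width of the data interface. The short calculation below makes that link explicit; the interface width is inferred from the two quoted numbers rather than stated in the text, and the previous-generation bandwidth assumes the same width, which is also an assumption.

```python
def implied_data_pins(bandwidth_gbytes_per_s, pin_rate_gbits_per_s):
    """Number of data pins needed to reach a bandwidth at a given pin rate."""
    return bandwidth_gbytes_per_s * 8 / pin_rate_gbits_per_s

def bandwidth_gbytes_per_s(pins, pin_rate_gbits_per_s):
    """Aggregate bandwidth in GB/s for a given pin count and pin rate."""
    return pins * pin_rate_gbits_per_s / 8

pins = implied_data_pins(1024, 8.0)             # 1024 data pins
previous = bandwidth_gbytes_per_s(pins, 5.0)    # 640 GB/s at 5 Gb/s/pin
```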

#### **SRAM Memory**

"Energy-Efficient High Bandwidth 6T SRAM Design on Intel 4 CMOS Technology" – Intel (Paper C24-1)

Authors from Intel present energy-efficient SRAM in Intel's 4nm-class CMOS technology. The drive towards energy-efficient compute for high-throughput applications creates demand for higher capacity and bandwidth in on-die memories. Conventional 6T SRAM bitcells serve low-area requirements whilst 8T bitcells offer lower dynamic power, but neither satisfies both. This paper presents an optimized 6T SRAM design with a 0.03µm<sup>2</sup> bitcell area, power similar to 8T, and a 5.6x dynamic power reduction from the conventional 6T design.





| (b) Bitcell | Fin Ratio | Bitcell Area           |
|-------------|-----------|------------------------|
| 6T HDC      | 1:1:1     | 0.0240 µm<sup>2</sup>  |
| 6T HCC      | 1:2:2     | 0.0300 µm<sup>2</sup>  |
| 8T SRAM     | 1:1:1-3:3 | 0.0360 µm<sup>2</sup>  |

| (c)                                    | 8T-LSA<br>(8T-SRAM bitcell)     | 6T-CONV<br>(HCC bitcell)        | 6T-HBW<br>(HCC bitcell)         |
|----------------------------------------|---------------------------------|---------------------------------|---------------------------------|
| Array Efficiency,<br>Density           | 52%,<br>13.7 Mb/mm <sup>2</sup> | 75%,<br>23.8 Mb/mm <sup>2</sup> | 61%,<br>19.4 Mb/mm <sup>2</sup> |
| Overall Macro Area <sup>[1]</sup>      | 1.74 x                          | 1 x                             | 1.23 x                          |
| Read Energy / access <sup>[1,2]</sup>  | 1 x                             | 5.81 x                          | 1.03 x                          |
| Write Energy / access <sup>[1,2]</sup> | 1 x                             | 11.9 x                          | 1.47 x                          |
| Bitcell Leakage                        | 6 x                             | 1 x                             | 1 x                             |

<sup>[1]</sup>Output muxes are excluded from area and energy calculations

<sup>[2]</sup>Read and write energy measurements are based on 50% '0' and 50% '1' array data for both read and write

*Figures 1 & 5.* Die micrograph (a) of Intel 4 CMOS technology test chip. (b) Intel 4 CMOS technology memory bitcells. (c) Comparison of density and energy per access for 60kB macro implementation.

#### **SRAM Memory**

"Co-Optimization of SRAM Circuits with Sequential Access Patterns in a 7nm SoC Achieving 58% Memory Energy Reduction for AR Applications" – Meta (Paper C24-3)

AR applications demand ultra-low power consumption for sensors with edge intelligence. In this paper, the Meta Reality Labs team describes the low-power design of SRAM in a 7nm SoC embedded in a prototype electromyography (EMG) wristband for AR glasses gesture recognition tasks. Instead of random access, they propose sequential operations tailored to the sensing modality, so that the number of circuit operations is minimized for consecutive memory reads and writes. This design achieves 52% lower read and 58% lower write power than a conventional memory design baseline.



**Figure 1.** (a) Meta's analysis of AR SoC SRAM data access showing high sequentiality. (b) Chip photo and demo setup. Measured energy is 124µJ per 'pinch and release' hand gesture.

#### **System-in-Package Power Management**

"Fully Integrated Voltage Regulators with Package-Embedded Inductors for Heterogeneous 3D-TSV-Stacked System-in-Package with 22nm CMOS Active Silicon Interposer Featuring Self-Trimmed, Digitally Controlled ON-Time Discontinuous Conduction Mode (DCM) Operation" – Intel (Paper C22-1)

Intel reports advanced system-in-package (SiP) power management that supports integration of heterogeneous chiplets in a 3D-TSV-stacked SiP design on a 22nm active silicon interposer. The authors embed inductors in the package substrate, connected by 3D-TSVs directly to fully integrated voltage regulators arranged in tiles on the interposer die. The power efficiency is shown to be flat across a 10mA–300mA load range for a single tile, and up to 1A by selective ganging of regulators on neighboring tiles. This offers flexible & cost-effective integration of a variety of compute, memory & communication chiplets with varying power requirements.



*Figure 2.* Intel's 10-tile fully integrated DCM-voltage regulator test setup with various TSV-based inductors for power/efficiency/area tradeoff.

#### **Power Management of Digital Processors**

*"A 3nm GAAFET Analog Assisted Digital LDO with High Current Density for Dynamic Voltage Scaling Mobile Applications" – Samsung (Paper C21-4)* 

The compute demands of mobile SoCs with many processing cores in advanced CMOS nodes keep increasing, and with them the challenge of power management. Authors from Samsung present their solution: analog-assisted digital LDOs in 3nm gate-all-around FET (GAAFET) technology that provide high-current-density power delivery. The design has active supply noise cancellation and fast transient load detection for CPU cores. Their hybrid LDO design achieves accurate regulation over a <1mA to 1.4A load range and only 38mV supply droop for a 1A dynamic load step in 1ns.



*Figure 3.* Samsung's proposed hybrid LDO structure for mobile SoC application.

#### **Wireline Transceivers**

"A 72GS/s, 8-bit DAC-based Wireline Transmitter in 4nm FinFET CMOS for 200+Gb/s Serial Links" – IBM Research, USA (Paper C3-2)

IBM reports a transmitter for ultra-high-speed serial electrical links, addressing the ever-increasing network bandwidth needs of data centers. Their approach is based on an 8-bit DAC with 72GS/s operation and a source-series termination (SST) topology. Previous works based on the SST design have not surpassed 56GBaud, whereas this work reaches 72GBaud. Their 4nm FinFET CMOS IC demonstrates 216Gb/s PAM8 and 212Gb/s QAM64 OFDM operation with 288mW power consumption.



*Figure 8.* IBM's measured 72GS/s TX eye diagrams with FFE8 for (a) 144Gb/s PAM4 (b) 180Gb/s PAM6 and (c) 216Gb/s PAM8
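The data rates listed for the three eye diagrams follow from the symbol rate multiplied by the number of bits carried per symbol. The sketch below reproduces that arithmetic; the 2.5 bits per symbol for PAM6 is inferred from the quoted 180Gb/s figure (it corresponds to mapping 5 bits onto 2 symbols) rather than stated in the text.

```python
SYMBOL_RATE_GBAUD = 72   # one symbol per DAC sample at 72 GS/s

# Bits per symbol; the PAM6 value is inferred from the quoted data rate
BITS_PER_SYMBOL = {"PAM4": 2.0, "PAM6": 2.5, "PAM8": 3.0}

rates_gbps = {name: SYMBOL_RATE_GBAUD * bits
              for name, bits in BITS_PER_SYMBOL.items()}
# {'PAM4': 144.0, 'PAM6': 180.0, 'PAM8': 216.0}
```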

#### **5G Transceivers**

"An Ultra-Compact Bidirectional T/R Folded 25.8-39.2GHz Phased-Array Transceiver Front-End with Embedded TX Power Detection/Self-calibration Path Supporting 64/256/512QAM at 28/39GHz band for 5G in 65nm CMOS Technology" – Tsinghua University (Paper C11-4)

Researchers at Tsinghua University demonstrate an area-efficient bidirectional transmitter and receiver for 5G that covers the wide 28-39GHz frequency range needed for compatibility with the different bands adopted globally. The broadband transformer-based transmitter and receiver introduce fast switching, attenuation, and phase-shifting techniques to realize broadband beamforming. The work supports 64/256/512QAM from 28-39GHz with 19.2dB RX gain and >12.8dBm TX power. The area is reduced by >25% compared with their prior work.



**Figure 1.** (a) Tsinghua University's proposed ultra-compact bidirectional Tx/Rx folded 28/39GHz broadband phased-array transceiver front-end with embedded power detection/self-calibration path. (b) Proposed Tx/Rx folded architecture with Tx/Rx reused bidirectional transformer-based high-resolution attenuator. (c) Die micrograph.

#### **5G Transceivers**

"A 39GHz CMOS Bi-Directional Doherty Phased-Array Beamformer Using Shared-LUT DPD with Inter-Element Mismatch Compensation Technique for 5G Base-Station" – Tokyo Institute of Technology (Paper C11-2)

Tokyo Institute of Technology reports a 5G transceiver based on a phased-array beamformer employing a Doherty low noise power amplifier. They investigate digital & analog correction techniques to improve the uniformity of TX output power across individual antennas. Their proposed approach uses shared digital correction together with individual phase and gain correction for each antenna, improving the transmit error vector magnitude (EVM) by up to 9.1% and the transmit-to-receive EVM by up to 11.8%. They show a maximum of 3.5G symbols per second with 64QAM modulation, and the chip also supports 21-Gb/s single-carrier data streaming.



*Figure 1.* Tokyo Institute of Technology's 39-GHz bi-directional Doherty phased-array beamformer with inter-element mismatch compensation.

#### **Imaging & LIDAR**

"1200x84-pixels 30fps 64cc Solid-State LiDAR RX with a HV/LV Transistors Hybrid Active-Quenching-SPAD Array and Background Digital PT Compensation" – Toshiba (Paper C9-2)

The recent size and cost reduction of LIDARs has been driven by solid-state VLSI receivers and compact transmitters. Toshiba presents a CMOS-SPAD based LIDAR receiver embedded in a palm-sized 64cc volume system. The 1200x84 sensor embeds optimized active-quench SPAD pixels with deep trench isolation (DTI). Generating the high operating voltage for SPADs can add cost through many off-chip components: this work embeds an on-chip digital background low-voltage control loop that compensates for SPAD process variation and temperature drift, reducing the system bill of materials. The LIDAR system combines the CMOS receiver with a micro scanning mirror, 28ch ADC and FPGA, and it demonstrates 3D point cloud generation outdoors in 110kLux bright ambient light at 30FPS, at up to 90°C system temperature.



*Figure 4.* (a) Toshiba's palm-sized proof of concept LiDAR and its block diagram. (b) The 3D point cloud data at 25°C and (c) at 90°C with the proposed DSCC OFF, and (d) DSCC ON.

#### **Imaging & LIDAR**

"A Hybrid Indirect ToF Image Sensor for Long-Range 3D Depth Measurement Under High Ambient Light Conditions" – Toppan (Paper C5-2)

High-resolution indirect time-of-flight (ToF) image sensors for 3D depth cameras and LIDARs conventionally trade off distance range against precision. Toppan Inc., in collaboration with Brookman Technology and researchers at Shizuoka University, proposes a new timing scheme for indirect ToF which breaks this trade-off, achieving both long distance and high precision. The sensor technology is suitable for a wide range of applications, including outdoor use, so they also propose an interference suppression technique allowing multiple cameras to operate simultaneously in the same field of view. They demonstrate the techniques in a VGA sensor achieving 30m range imaging whilst maintaining <15cm precision with up to 100klux ambient light.



*Figure 5.* Toppan's Hybrid TOF: Outdoor depth maps in the range of 1–20 m at day (100k lux) and night (< 1 lux).
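To put the quoted range and precision into timing terms, the sketch below converts distance into round-trip light travel time using t = 2d/c. It is a generic time-of-flight relation, not a description of Toppan's hybrid timing scheme; the function name is illustrative.

```python
C = 299792458.0   # speed of light in vacuum, m/s

def round_trip_time_s(distance_m):
    """Time for light to travel to a target and back."""
    return 2.0 * distance_m / C

t_range = round_trip_time_s(30.0)       # about 200 ns for a 30 m target
t_precision = round_trip_time_s(0.15)   # about 1 ns for 15 cm of depth
```

Resolving 15cm at 30m therefore corresponds to distinguishing roughly a nanosecond within a 200ns window, which is why range and precision normally pull against each other.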

#### **Analog to Digital Converters**

"An 8-bit 56GS/s 64x Time-Interleaved ADC with Bootstrapped Sampler and Class-AB Buffer in 4nm CMOS" – IBM Research, Switzerland (Paper C19-1)

IBM Research in Switzerland reports a 56GS/s ADC at 8b resolution in advanced 4nm CMOS. Modern ADC-based receivers for high-speed serial links rely on time-interleaving to reach required speeds of 112Gb/s and above. This work interleaves 64 ADC channels with analog foreground calibration for inter-channel offset, gain, and skew correction. A novel Class-AB input buffer and bootstrapped track-and-hold sampler do not require a high supply voltage: they operate from a single low supply voltage of 0.8V, compatible with the 4nm technology node, while still maintaining a 0.8V peak-to-peak input swing. The design achieves performance in line with the state of the art, with 47fJ/conversion-step energy efficiency at >27GHz bandwidth.



**Figure 5.** IBM's die micrograph and layout details of their 56 GS/s 8-bit asynchronous SAR ADC fabricated in 4nm CMOS technology. The 16x4 interleaved ADC uses a novel bootstrapping technique and a class-AB follower in the 1st rank interleaver.
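Time interleaving spreads the aggregate sample rate over many slower converters. The arithmetic below shows what the quoted figures imply per channel; it is a simple calculation from the numbers in the text, and the variable names are illustrative.

```python
AGGREGATE_RATE_SPS = 56e9   # 56 GS/s aggregate sample rate
CHANNELS = 64               # 16x4 interleaved sub-ADCs
RESOLUTION_BITS = 8

per_channel_rate = AGGREGATE_RATE_SPS / CHANNELS     # 875 MS/s per sub-ADC
nyquist_bandwidth = AGGREGATE_RATE_SPS / 2           # 28 GHz
raw_output_bps = AGGREGATE_RATE_SPS * RESOLUTION_BITS   # 448 Gb/s of raw samples
```

Each sub-ADC only has to run at 875MS/s, but offset, gain, and timing-skew differences between channels then show up as errors in the combined output, which is why inter-channel calibration is part of the design.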