# TIP SHEET – VLSI TECHNOLOGY SYMPOSIUM

# Technical Highlights of the 2017 Symposium on VLSI Technology

#### Technology platform papers:

- T6-1 Highly Manufacturable 7nm FinFET Technology Featuring EUV Lithography for Low Power and High Performance Application (Samsung Electronics)
- T6-2 10nm High Performance Mobile SoC Design and Technology Co-Developed for Performance, Power, and Area Scaling (Qualcomm)
- T6-3 First Demonstration of Flash RRAM on Pure CMOS Logic 14nm FinFET Platform Featuring Excellent Immunity to Sneak-path and MLC Capability (National Chiao Tung University, National Taiwan Normal University, UMC)

#### Papers to advance technology using novel materials:

- T6-4 First Demonstration of 3D SRAM through 3D Monolithic Integration of InGaAs n-FinFETs on FDSOI Si CMOS with Inter-layer Contacts (IBM Research GmbH Zürich, CEA-Leti)
- T9-1 High Performance and Record Subthreshold Swing Demonstration in Scaled RMG SiGe FinFETs with High-Ge-Content Channels Formed by 3D Condensation and a Novel Gate Stack Process (IBM)
- T12-1 Nano-scaled Ge FinFETs with Low Temperature Ferroelectric HfZrOx on Specific Interfacial Layers Exhibiting 65% S.S. Reduction and Improved ION (National Nano Device Laboratories, National Cheng Kung University, National Chiao Tung University, National Sun Yat-Sen University, Industrial Technology Research Institute, National Applied Research Laboratories)

#### Papers to advance technology for non-conventional systems:

- JFS3-3 Performance Boost of Crystalline In-Ga-Zn-O Material and Transistor with Extremely Low Leakage for IoT Normally-Off CPU Application (United Microelectronics Corporation [UMC] and Semiconductor Energy Laboratory)
- T2-3 A Low-Power Cu Atom Switch Programmable Logic Fabricated in a 40nm-node CMOS Technology (NEC)
- T13-1 Towards Quantum Computing in Si MOS Technology: Single-shot Readout of Spin States in a FDSOI Split-Gate Device with Built-in Charge Detector (Institut Neel, CEA Leti, CEA INAC-PHELIQS)

#### Papers to advance heterogeneous integration:

- T5-1 Wafer Level Integration of an Advanced Logic-Memory System through 2nd Generation CoWoS Technology (TSMC)
- T8-1 Towards a Fully Integrated, Wirelessly Powered, and Ordinarily Equipped On-lens System for Successive Dry Eye Syndrome Diagnosis (National Chiao Tung University)

#### T6-1 Highly Manufacturable 7nm FinFET Technology Featuring EUV Lithography for Low Power and High Performance Application (Ha et al., Samsung Electronics)

**Samsung Electronics** will present a 7nm CMOS technology utilizing EUV lithography as well as 4th generation dual Fin and 2nd generation multi-eWF gate stack for low power and high performance applications, demonstrating a speed improvement of 20% or a 35% reduction in power compared to 10nm technology. EUV lithography is fully adopted for MOL contacts and minimum-pitched metal/via interconnects, which enables a >25% reduction in mask steps, higher pattern fidelity and smaller CD variation. Low voltage functionality of an HD SRAM test chip verifies the technology with EUV lithography, thanks to $A_{VT}$ of 1.29 for PD (PG) and 1.34 for PU.



**LEFT:** EUV lithography is fully adopted for Middle-Of-Line (MOL) contacts and minimum pitched metal/via interconnects. >25% mask steps can be reduced compared with advanced DPT using ArF immersion lithography (Fig. 1).

**RIGHT:** EUV lithography provides ~70% higher fidelity, giving better corner rounding profile and CD variation (Fig. 4).
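The $A_{VT}$ figures quoted above are Pelgrom mismatch coefficients, which relate threshold-voltage mismatch to gate area through $\sigma(\Delta V_T) = A_{VT}/\sqrt{WL}$. The following is only a rough sketch: it assumes the coefficients are in mV·µm (the conventional unit, which the text does not state), and the device dimensions and the function name are hypothetical values chosen for illustration.

```python
import math

def sigma_delta_vt_mv(a_vt_mv_um: float, width_um: float, length_um: float) -> float:
    """Pelgrom's relation: sigma(dVT) = A_VT / sqrt(W * L)."""
    return a_vt_mv_um / math.sqrt(width_um * length_um)

# A_VT values quoted in the text; the mV*um unit is an assumption.
A_VT_PD_PG = 1.29
A_VT_PU = 1.34

# Hypothetical effective gate dimensions, NOT taken from the paper.
W_EFF_UM = 0.1
L_GATE_UM = 0.02

sigma_pd = sigma_delta_vt_mv(A_VT_PD_PG, W_EFF_UM, L_GATE_UM)
sigma_pu = sigma_delta_vt_mv(A_VT_PU, W_EFF_UM, L_GATE_UM)
print(f"sigma(dVT) PD/PG: {sigma_pd:.1f} mV, PU: {sigma_pu:.1f} mV")
```

A smaller coefficient gives a smaller spread at the same gate area, which is what supports low-voltage SRAM operation.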

#### T6-2 10nm High Performance Mobile SoC Design and Technology Co-Developed for Performance, Power, and Area Scaling (Sam Yang et al., Qualcomm)

Mobile SoCs have become a powerful platform for high performance computing, AI, machine learning, and AR/VR experiences. **Qualcomm** successfully ramped the industry's first 10nm low power high performance mobile SoC in production. To overcome scaling challenges such as increased wiring resistance, variation, and strong layout stress effects at 10nm, design and technology co-development was carefully applied from technology definition to the product ramp stage. The developed 10nm SoC chip is 16% faster, 37% smaller, and 30% lower power than its 14nm predecessor.



Total CPU power of the octa-core 10nm SoC is lower than that of the quad-core 14nm SoC while also running at higher speed, thanks to power efficient design and process co-development (Fig. 1).

#### T6-3 First Demonstration of Flash RRAM on Pure CMOS Logic 14nm FinFET Platform Featuring Excellent Immunity to Sneak-path and MLC Capability (E.R. Hsieh et al., National Chiao Tung University, National Taiwan Normal University, UMC)

**National Chiao Tung University, National Taiwan Normal University, and UMC** will demonstrate flash RRAM technology utilizing a high-k/metal gate stack combined with a 14nm FinFET platform. This RRAM is of a bipolar type with ion vacancy-based operation. New active fin isolation (AFI) technology will be proposed and demonstrated to suppress the sneak-path issue. Thanks to this technology, the S/N margin is greatly improved by 3 orders of magnitude. Compared to a conventional AND-type memory cell, 30% reduction of standby power, and 99% reduction of active power are obtained.



A unit cell consisting of 2 identical FinFETs in series, providing the functionality of embedded Flash. (a) Unit cell (b) Cross-section (Fig. 2)

#### T6-4 First Demonstration of 3D SRAM through 3D Monolithic Integration of InGaAs n-FinFETs on FDSOI Si CMOS with Inter-layer Contacts (V. Deshpande et al., IBM Research GmbH Zürich, CEA-Leti)

**IBM and CEA-Leti** will present a 3D monolithic integration of InGaAs n-FinFET on FDSOI CMOS featuring short-channel replacement metal gate (RMG) InGaAs on the top layer and gate-first Si CMOS on the bottom layer with TiN/W inter-layer contacts. State-of-the-art device integration is achieved with the top layer InGaAs utilizing raised source drain (RSD) and the bottom layer CMOS having Si RSD for nFETs and SiGe RSD for pFETs. The top layer InGaAs n-FinFETs are scaled down to Lg = 25nm and both the Si nFETs and pFETs in the bottom layer are scaled down to Lg = 15nm. A densely integrated 3D 6T-SRAM circuit with InGaAs nFET stacked on top of Si pFET shows considerable area reduction with respect to a 2D layout.



Cross-section TEM images show: (a) InGaAs nFET on SOI pFET, (b) 22nm Lg InGaAs nFET, (c) 13nm Lg Si pFET, (d) 25nm InGaAs fin and (e) top-view of 2D and 3D inverter (Fig. 3)

#### T9-1 High Performance and Record Subthreshold Swing Demonstration in Scaled RMG SiGe FinFETs with High-Ge-Content Channels Formed by 3D Condensation and a Novel Gate Stack Process (P. Hashemi et al., IBM)

**IBM** will demonstrate scaled high-Ge-content (HGC) strained SiGe pMOS FinFETs with very high short channel performance using a replacement high-k/metal gate (RMG) flow. The fabrication processes include 3D Ge condensation for fin formation, a two-step Ge-free interfacial layer (IL) without a Si cap for advanced gate stacks and the formation of an ultra-thin spacer and improved S/D. Excellent reliability and near ideal swing with record SS as low as 62mV/dec are presented for the new gate stack with Ge-free IL. Also, an improved I/I free process with the ultra-thin spacers realizes considerable Ron and Rext reduction. As a result, record high SiGe pMOS performance with Ion of -0.45 mA/µm and Lg down to 25nm is demonstrated, highlighting the suitability of RMG HGC SiGe FinFETs for further HP applications at scaled VDD of -0.5V.



**LEFT:** XTEM images of RMG HGC SiGe pFETs (Fig. 3)

**RIGHT:** Transfer characteristics of a RMG SiGe FinFET with Lg~25nm (Fig. 13)
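For context on the "near ideal swing" quoted above, the thermionic limit of the subthreshold swing in a conventional MOSFET is $\ln(10)\,kT/q$. A minimal sketch of that calculation, assuming room temperature (300 K, which the text does not state); the function name is hypothetical.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23       # exact SI value
ELEMENTARY_CHARGE_C = 1.602176634e-19  # exact SI value

def thermionic_ss_limit_mv_per_dec(temperature_k: float = 300.0) -> float:
    """Return ln(10) * kT / q in mV/decade."""
    thermal_voltage_v = BOLTZMANN_J_PER_K * temperature_k / ELEMENTARY_CHARGE_C
    return math.log(10) * thermal_voltage_v * 1e3

limit = thermionic_ss_limit_mv_per_dec()  # assumed T = 300 K
reported_ss = 62.0                        # mV/dec, from the text above
print(f"limit = {limit:.1f} mV/dec, reported / limit = {reported_ss / limit:.2f}")
```

Under this assumption the reported 62mV/dec sits only a few percent above the limit, which is what "near ideal" refers to.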

#### T12-1 Nano-scaled Ge FinFETs with Low Temperature Ferroelectric HfZrOx on Specific Interfacial Layers Exhibiting 65% S.S. Reduction and Improved ION (C.-J. Su et al., National Nano Device Laboratories, National Cheng Kung University, National Chiao Tung University, National Sun Yat-Sen University, Industrial Technology Research Institute, National Applied Research Laboratories)

A steep slope transistor utilizing an HfO2-based ferroelectric gate insulator has been attracting attention as an ultralow power transistor. **National Nano Device Laboratories, National Cheng Kung University, National Chiao Tung University, National Sun Yat-Sen University, and Industrial Technology Research Institute** have demonstrated Ge n- and p-FinFETs with different interfacial layer ferroelectric HfZrOx (IL-FE-HZO) gate stacks by systematically investigating annealing conditions. Microwave annealing not only shows enhanced FE characteristics but also suppresses the gate leakage and Ge interdiffusion compared with conventional rapid thermal annealing. High ION/IOFF (>10<sup>7</sup>) and low subthreshold slope (S.S. ~ 58mV/dec) are demonstrated by a Ge nFinFET with a gate length of 60nm and an FE-HZO/GeOx gate stack.



**LEFT:** Cross sectional TEM image of the fabricated Ge FinFET with ferroelectric HfZrOx gate stack (Fig. 4).

**RIGHT:** Measured Id-Vg characteristics of the fabricated Ge N-FinFET of fin width 20nm and gate length 60nm (Fig. 15).

#### JFS3-3 Performance Boost of Crystalline In-Ga-Zn-O Material and Transistor with Extremely Low Leakage for IoT Normally-Off CPU Application (Shao Hui Wu et al., United Microelectronics Corporation (UMC) and Semiconductor Energy Laboratory)

**UMC and Semiconductor Energy Laboratory** improve the $I_{ON}$ of the IGZO channel FET from 4.7µA to 9µA by improving the mobility of the IGZO channel. The improved IGZO and a double finger structure are applied to a normally-off CPU. The double finger structure increases the channel width W while suppressing S factor degradation. Thanks to this combination, the normally-off CPU can operate at a frequency of 100MHz. The IGZO channel FET is also applied to an FPGA, which can operate at a frequency of 360MHz.



**LEFT:** Cross-section of hybrid 65nm SiFET and 60nm OSFET (Fig. 1)

**RIGHT:** Comparison of Hall mobility between new and conventional IGZO. The mobility of new IGZO is almost two times that of the conventional IGZO (Fig. 2).

#### T2-3 A Low-Power Cu Atom Switch Programmable Logic Fabricated in a 40nm-node CMOS Technology (X. Bai et al., NEC)

**NEC** will demonstrate a nonvolatile programmable logic (NPL) based on a CAS (complementary atom switch) integrated into a 40nm-node CMOS showing 2x logic density, 3.8x operation speed, and 3x power efficiency as compared to the commercial low power PL with only CMOS. This presentation also reports superior scalability and improved programming characteristics for the CAS thanks to reduced programming voltage allowing replacement of high-voltage programming transistors with core transistors.

Figure panel labels: (a) bright-field TEM; (b) Polymer-Solid Electrolyte (PSE), Cu bridge in the ON state, programming transistor.

| Parameter | This work | Commercial |
|-----------|-----------|------------|
| Switch | Atom switch | Pass Tr. |
| Process node | 40 nm | 40 nm |
| Number of LUTs | 6400 | 1280 |
| Logic density [mm<sup>-2</sup>] (= No. of 4-input LUTs per area) | 2532 | 1320 |
| Max. speed at 0.8V | 27 MHz | 7.1 MHz |
| VDDmin at 15MHz | 0.675 V | 0.94 V |
| Dynamic power at VDDmin | 13 µW/MHz | 39.5 µW/MHz |
| Active power at VDDmin | 386 µW | 630 µW |

**LEFT**: Nonvolatile complementary atom switch (CAS) in 40nm-node process (a) TEM images, (b) schematic image of the ON/OFF state (Fig. 2).

**RIGHT:** Performance comparison (Application: ALU). All the logic density, speed and power efficiency of the atom switch NPL using core transistors are greatly improved compared with the commercial PL (Table I).
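The headline ratios (2x logic density, 3.8x speed, 3x power efficiency) can be checked against the table values. The short sketch below does that arithmetic; it assumes the power-efficiency claim refers to the dynamic power row, which the text does not state explicitly, and the dictionary keys are illustrative names.

```python
# Values copied from the performance comparison table (application: ALU).
this_work = {
    "logic_density_per_mm2": 2532,
    "max_speed_mhz": 27.0,
    "dynamic_power_uw_per_mhz": 13.0,
}
commercial = {
    "logic_density_per_mm2": 1320,
    "max_speed_mhz": 7.1,
    "dynamic_power_uw_per_mhz": 39.5,
}

density_gain = this_work["logic_density_per_mm2"] / commercial["logic_density_per_mm2"]
speed_gain = this_work["max_speed_mhz"] / commercial["max_speed_mhz"]
# Lower power per MHz is better, so the ratio is inverted (assumed mapping).
power_eff_gain = commercial["dynamic_power_uw_per_mhz"] / this_work["dynamic_power_uw_per_mhz"]

print(f"density x{density_gain:.2f}, speed x{speed_gain:.1f}, power efficiency x{power_eff_gain:.1f}")
```

The density ratio comes out slightly under 2, so the "2x" in the summary is a rounded figure.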

#### T13-1 Towards Quantum Computing in Si MOS Technology: Single-shot Readout of Spin States in a FDSOI Split-Gate Device with Built-in Charge Detector (M. Urdampilleta et al., Institut Neel, CEA LETI, CEA INAC-PHELIQS)

**Institut Neel, CEA LETI, and CEA INAC-PHELIQS** will demonstrate real-time monitoring of a single spin in a quantum dot using foundry-compatible Si MOS technology and split-gate design with a built-in charge detector. Since single-shot readout is an indispensable step in the pursuit of Si-based fault-tolerant quantum computing, this work contributes by proposing the fabrication of Si spin qubits in a MOS technology platform as a viable and promising option.



A split-gate device and built-in non-invasive detector. Quantum dots containing spin information are formed in the mesa corners, controlled by the wrapping gates. Due to the electrostatic landscape in the channel, a (green) SET forms between gates, and is capacitively coupled to both QDs (Fig. 2).

#### T5-1 Wafer Level Integration of an Advanced Logic-Memory System through 2nd Generation CoWoS Technology (TSMC)

**TSMC** developed a CoWoS-2 WLSiP technology that integrates a VLSI SoC with up to six 8-high HBM2 with suppressed warpage, resulting in high package yield. An ultra-large Si interposer up to 1200 mm<sup>2</sup> made by a two-mask stitching process was used to form the basis of the CoWoS-2. CoWoS-2 has been positioned as a flexible 3D IC platform for logic-memory heterogeneous integration between logic SoC and HBM for various high performance computing applications.



SEM cross-sections of CoWoS-2 components including µ-Bump, Si interposer, TSV, C4 Cu bump, substrate, BGA, advanced node SoC, and HBM2 (Fig. 6).

CoWoS-2: 2nd Generation Chip-on-Wafer-on-Substrate

HBM2: 2nd Generation High Bandwidth Memory
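One way to see why a stitched interposer is needed: a single exposure field on a standard lithography scanner is at most 26 mm × 33 mm (858 mm²), so a 1200 mm² interposer cannot be patterned in one field. A minimal sketch of that area check follows; the function name is hypothetical, and the count is only an area-based lower bound, not the actual mask layout used in the paper.

```python
import math

# Standard full-field exposure size of a lithography scanner, in mm.
RETICLE_FIELD_MM = (26.0, 33.0)

def min_exposure_fields(interposer_area_mm2: float) -> int:
    """Area-based lower bound on the number of stitched exposure fields."""
    field_area_mm2 = RETICLE_FIELD_MM[0] * RETICLE_FIELD_MM[1]  # 858 mm^2
    return math.ceil(interposer_area_mm2 / field_area_mm2)

fields_needed = min_exposure_fields(1200.0)  # interposer area from the text
print(f"A 1200 mm^2 interposer needs at least {fields_needed} exposure fields")
```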

#### T8-1 Towards a Fully Integrated, Wirelessly Powered, and Ordinarily Equipped On-lens System for Successive Dry Eye Syndrome Diagnosis (National Chiao Tung University)

**National Chiao Tung University** will present a smart contact lens (SCL) sensor system for successive evaluation of tear evaporation. This system is composed of a tear sensor and an antenna, embedded in a biocompatible hydrogel-based contact lens, and a tunable sensitivity sensor-readout circuitry. The on-lens system can be addressed using commercial radio-frequency identification (RFID) reader devices for sensor control and data communication. Subjects can wear the SCL for continuous tear-content monitoring.



On-lens sensor system (a) SCL-readout system, (b) SCL integrated with a sensor, an antenna, and a sensor chip (Fig. 1).



Photos of the SCL system embedded in a soft contact lens using a wrinkle-free cast-molding technique (Fig. 13).

# TIP SHEET – VLSI CIRCUITS SYMPOSIUM

# Technical Highlights of the 2017 Symposium on VLSI Circuits

## Processors

Processors in the IoT age face increasingly complex arithmetic requirements for security and artificial intelligence. Modern cryptographic standards require even low-end IoT processors to perform complex cryptographic operations using a small amount of energy. Artificial intelligence applications demand efficient and compact implementations of inference and recognition procedures. The following three papers present techniques to meet such requirements through the use of algorithms and power efficient design.

#### Recryptor: A Reconfigurable In-Memory Cryptographic Cortex-M0 Processor for IoT

The University of Michigan presents a reconfigurable cryptographic processor capable of performing various cipher algorithms faster and with less power than state-of-the-art software and hardware implementations. A programmable in-memory calculation block, embedded into a commercial ARM Cortex-M0 processor, accelerates wide bit-width arithmetic operations common in cryptography algorithms and standards. Implemented in 40nm CMOS, the processor demonstrates 6.8 times faster operation with 12.8 times less energy than previous software and hardware-accelerated implementations.

(Paper C20-1—"Recryptor: A Reconfigurable In-Memory Cryptographic Cortex-M0 Processor for IoT," et al., University of Michigan)



Fig.2. Proposed Crypto-SRAM Bank (CSB).

#### Fig. 2

This figure shows an in-memory calculation block called the crypto-SRAM bank (CSB). With this block, wide bit-width data are read from SRAM, processed according to cipher algorithms, and written back to SRAM.

#### BRein Memory: A 13-Layer 4.2 K Neuron/0.8 M Synapse Binary/Ternary Reconfigurable in-Memory Deep Neural Network Accelerator in 65nm CMOS

Deep neural networks (DNNs) have attracted attention, but the enormous amount of computation and memory access they require degrades area, power, and energy efficiency. Recent DNN accelerators are custom-designed to achieve high performance and energy efficiency, but as a result they sacrifice versatility. Hokkaido University, Tokyo Institute of Technology, and Keio University propose an accelerator for DNNs in 65nm CMOS. In this accelerator, processing elements called PIM (processing-in-memory) modules are arranged in a reconfigurable array, similar to an FPGA, which allows it to emulate a wide variety of DNNs. Moreover, each PIM module is designed to map binary/ternary DNNs, which drastically reduce memory and computation requirements with only slightly degraded accuracy. The proposed chip achieves 1-2 and 2-4 orders of magnitude better performance and energy efficiency, respectively, compared with CPU, GPU, or FPGA implementations, and achieves 1.4TOPS, outperforming recent CNN accelerators.

(Paper C2-1—"BRein Memory: A 13-Layer 4.2 K Neuron/0.8 M Synapse Binary/Ternary Reconfigurable in-Memory Deep Neural Network Accelerator in 65nm CMOS," et al., Hokkaido University, Tokyo Institute of Technology, and Keio University)



#### Fig. 3

Output-/input-parallel neural engines for ternarization and biasing
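To illustrate why binary/ternary networks cut memory and computation, the sketch below ternarizes a few weights and evaluates one neuron using only additions and subtractions. It is a generic illustration, not the BRein Memory datapath: the symmetric threshold rule, the function names, and the sample values are all assumptions.

```python
def ternarize(weights, threshold):
    """Map each real-valued weight to -1, 0, or +1 with a symmetric threshold."""
    return [0 if abs(w) <= threshold else (1 if w > 0 else -1) for w in weights]

def ternary_neuron(inputs, ternary_weights, bias=0):
    """Accumulate +/-1 inputs under ternary weights, then binarize the sum.

    No multiplier is needed: each term is an add, a subtract, or a skip.
    """
    acc = bias
    for x, w in zip(inputs, ternary_weights):
        if w == 1:
            acc += x
        elif w == -1:
            acc -= x
    return 1 if acc >= 0 else -1

weights = [0.8, -0.05, -0.6, 0.1]            # hypothetical trained weights
tw = ternarize(weights, threshold=0.3)       # each weight now fits in 2 bits
out = ternary_neuron([1, -1, -1, 1], tw)     # binary (+/-1) activations
print(tw, out)
```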

#### A 12.4pJ/cycle sub-threshold, 16pJ/cycle near-threshold ARM Cortex-M0+ MCU with autonomous SRPG/DVFS and temperature tracking clocks

ARM will demonstrate a near-threshold operable ARM Cortex-M0+ MCU for IoT applications. The Cortex-M0+ is usually used in battery powered devices, such as wireless sensor nodes, and therefore low-power operation is crucial. The MCU reduces the active energy to 12.44pJ/cycle and standby power consumption to 139.4nW, which are roughly half those of prior work demonstrated by ARM. This performance is achieved by both state-retention power gating (SRPG) and dynamic voltage and frequency scaling (DVFS). To use these techniques efficiently in sub-threshold voltage operation, the clock frequency is automatically adjusted with the operating temperature, which has a large impact on the maximum operating frequency in sub-threshold operation. The clock frequency is adjusted by a tuned clock ring oscillator (TCRO). The operation and power consumption are confirmed by running EEMBC's ULPBench, which is used to measure low-power IoT workloads.

(Paper C26-2—"A 12.4pJ/cycle sub-threshold, 16pJ/cycle near-threshold ARM Cortex-M0+ MCU with autonomous SRPG/DVFS and temperature tracking clocks," et al., ARM Ltd.)



### Fig. 1

Block diagram of MCU chip and TCRO for clock adjustment

## Memory

#### A 0.3V VDDmin 4+2T SRAM for Searching and In-Memory Computing Using 55nm DDC Technology

University of Michigan (Ann Arbor) and Fujitsu Semiconductor America, Inc. will present a 0.3V VDDmin 4+2T SRAM for searching and in-memory computing. The strong body effect in their deeply depleted channel (DDC) technology allows the memory cell to use the N-well as the write wordline. This makes it possible to eliminate the two conventional access transistors between the memory cell and bitline. It also allows the use of two transistors dedicated to differential read, resulting in reliable multi-word activation for in-memory Boolean logic functions as well as a low VDDmin (=0.3V) array operation. The SRAM can be configured as a BCAM or TCAM, enabling search operations.

(Paper C12-2—"A 0.3V VDDmin 4+2T SRAM for Searching and In-Memory Computing Using 55nm DDC Technology," et al., University of Michigan and Fujitsu Laboratories, Ltd.)



Fig. 1

4+2T SRAM memory cell with N-well as the write wordline and decoupled read/write path
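The two functions described above, in-memory Boolean logic through multi-word activation and BCAM/TCAM search, can be modeled behaviorally. The sketch below is a software model only and says nothing about the circuit: it assumes that activating several rows yields their bitwise AND on one read line and their NOR on the complementary line (a common in-memory computing scheme, not confirmed by the text), and all names are hypothetical.

```python
def in_memory_and(rows, width):
    """Bitwise AND of all simultaneously activated rows."""
    result = (1 << width) - 1
    for row in rows:
        result &= row
    return result

def in_memory_nor(rows, width):
    """Bitwise NOR of all simultaneously activated rows."""
    combined = 0
    for row in rows:
        combined |= row
    return ~combined & ((1 << width) - 1)

def tcam_search(entries, key):
    """Return indices of entries matching the key.

    Each entry is (value, care_mask); bits with care_mask = 0 are "don't care".
    A BCAM is the special case where every care_mask is all ones.
    """
    return [i for i, (value, care) in enumerate(entries) if (key ^ value) & care == 0]

and_bits = in_memory_and([0b1100, 0b1010], width=4)
nor_bits = in_memory_nor([0b1100, 0b1010], width=4)
matches = tcam_search([(0b1010, 0b1111), (0b1000, 0b1100), (0b0110, 0b1111)], key=0b1010)
print(f"{and_bits:04b} {nor_bits:04b} {matches}")
```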

## Biometric and Sensors

#### A 1.06 uW Smart ECG Processor in 65nm CMOS for Real-Time Biometric Authentication and Personal Cardiac Monitoring

An ECG (electrocardiogram) measures electrical signals associated with the activities of heart muscle cells and contains rich information about heart disorders such as arrhythmia. Daily monitoring is required for patients suffering from a heart disease, but doing so is difficult because the weak signal requires special equipment and resting conditions.

A wearable ECG device meets this demand. In addition to medical applications, it is promising as a monitoring device for healthcare and sports science. One critical requirement for wearable ECG devices is low power consumption; the data size of the raw ECG signal is so large that low-power signal processing or data compression is required to reduce power consumption in wireless data transmission. Another concern is the security of transmitting personal information such as an ECG signal, a common issue for wearable devices with wireless data transmission.

Arizona State University and Samsung developed a low-power smart ECG processor designed to perform ECG-based biometric authentication, arrhythmia detection, and abnormal ECG pulse shape detection (i.e., anomaly detection), as shown in Fig. 1. Since the ECG signal is unique to each person, it can be used for biometric authentication. This work is the first ASIC for ECG authentication, and it achieves a very low error rate when the same neural network learning algorithm is applied to 645 subjects.

A 1.06 µW ECG processor realized in a 65nm low-power CMOS process has been measured to perform ECG-based biometric authentication, arrhythmia detection, and anomaly detection with a supply voltage of 0.55V and 2 kHz clock frequency. A data-driven sparsity enhancement method by Lasso regression is used to compress the neural network weights while maintaining a superior error rate.

(Paper C9-1—"A 1.06 uW Smart ECG Processor in 65nm CMOS for Real-Time Biometric Authentication and Personal Cardiac Monitoring," et al., Arizona State University, Samsung Research Center-Beijing, and Samsung Advanced Institute of Technology)



#### Fig. 1

The group of Arizona State University, Samsung Research Center, and Samsung Advanced Institute of Technology has developed a low-power smart ECG processor that can perform ECG-based biometric authentication, arrhythmia detection, and anomaly detection, and can achieve a very low error rate based on a neural network learning algorithm. FIR: finite impulse response.
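The Lasso-based weight compression mentioned above relies on an L1 penalty, which drives small weights exactly to zero. The sketch below shows the soft-thresholding step that L1-penalized solvers apply; it is a generic illustration under assumed values, not the authors' training procedure, and the weights, penalty, and function name are hypothetical.

```python
def soft_threshold(weight: float, penalty: float) -> float:
    """Proximal operator of penalty * |w|: shrink toward zero, zero out small values."""
    if weight > penalty:
        return weight - penalty
    if weight < -penalty:
        return weight + penalty
    return 0.0

dense_weights = [0.9, -0.04, 0.12, -0.5, 0.02]   # hypothetical values
sparse_weights = [soft_threshold(w, 0.1) for w in dense_weights]
nonzero_count = sum(1 for w in sparse_weights if w != 0.0)
print(sparse_weights, nonzero_count)
```

Weights that become exactly zero need neither storage nor a multiply-accumulate, which is how sparsity translates into lower memory and power.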

#### A Fully Integrated Closed-Loop Neuromodulation SoC with Wireless Power and Bidirectional Data Telemetry for Real-Time Human Epileptic Seizure Control

National Chiao Tung University demonstrated a fully integrated and wirelessly powered (implantable) epileptic seizure detection and suppression system on chip (SoC) for treating human epilepsy, a common neurological disorder that affects about 1% of the world population today. This neuromodulation SoC detects the seizure onset through 16-channel recorded ECoG signal from the brain and generates stimulus pulses to suppress the epileptic seizure. It achieves detection accuracy of 97.76%, which is the highest ever in recently-reported neuromodulation SoCs.

(Paper C4-1—"A Fully Integrated Closed-Loop Neuromodulation SoC with Wireless Power and Bidirectional Data Telemetry for Real-Time Human Epileptic Seizure Control," et al., National Chiao Tung University, National Cheng Kung University, Chung Shan Medical University Hospital, Kaohsiung Chang Gung Memorial Hospital, and Chang Gung University College of Medicine)



### Fig. 1

National Chiao Tung University integrated all components required for closed-loop neuromodulation, including the analog signal acquisition front-end, bio-signal processor, adaptively-controlled neuron stimulator, and wireless-power delivery unit, in a 5x5 mm<sup>2</sup> SoC in 0.18µm CMOS.

#### A 4.1Mpix 280fps Stacked CMOS Image Sensor with Array-Parallel ADC Architecture for Region Control

Stacked CMOS image sensors (CISs) continue to enhance functionality and user experience in mobile devices. Stacking enables the parallelization of signal processing and the integration of advanced signal processing. Moreover, hetero-process technology integration and the utilization of global shutter pixels have been introduced. However, front-illuminated global shutter and column parallel ADC architectures have difficulty achieving high sensitivity and flexible region of interest (ROI) readout, respectively. Sony presents an ROI-controllable image sensor to reduce data bandwidth and power consumption for surveillance and factory automation applications. Flexible ROI control readout without image distortion is realized utilizing array-parallel ADC architecture, saving ADC power adaptively. In addition, low dark random noise of 4.2e-rms is achieved using active reset and frame CDS operation with floating diffusion (FD)-based back-illuminated global shutter.

(Paper C19-1—"A 4.1Mpix 280fps Stacked CMOS Image Sensor with Array-Parallel ADC Architecture for Region Control," Tomohiro Takahashi et al., Sony Semiconductor Solutions Corporation, Sony LSI Design Inc., and Sony Electronics Inc.)



Fig. 1 and Fig. 4

The above figure shows an array-parallel ADC architecture to enable an intelligent sensor system of ROI.

#### A 10.1" 56-Channel, 183 uW/electrode, 0.73 mm2/sensor High SNR 3D Hover Sensor Based on Enhanced Signal Refining and Fine Error Calibrating Techniques

Performance of today's touch panels is limited by the low sensitivity to a touched object. This is due to the sensing of mutual capacitance of the panel electrodes, which requires a large capacitance between the object and the electrode, i.e., an object touching the panel. As a result, the sensitivity is low and the panel is only capable of 2D position sensing. To address this issue, a group of researchers from Korea Advanced Institute of Science and Technology (KAIST) and Samsung Electronics Co., Ltd. proposes a new 3D hover sensing circuit targeting touch panels of future mobile phones. The proposed circuit is based on a self-capacitance sensing scheme (SCSS) that senses capacitance variation of the touching object itself and can detect 3D hovering, resulting in higher sensitivity and finer touch-position resolution. A high signal-to-noise ratio (SNR) is achieved by an electrode grouping and profile tuning method. Notably, while conventional SCSS circuits suffer from high power consumption due to the need for panel offset cancellation circuits, the group succeeded in automatically cancelling the offset by separating the driving circuits and the sensing circuits, saving considerable power and die area. As a result, a very high SNR (39dB) and low power consumption (183uW/electrode) are achieved.

(Paper C24-1—"A 10.1" 56-Channel, 183 uW/electrode, 0.73 mm2/sensor High SNR 3D Hover Sensor Based on Enhanced Signal Refining and Fine Error Calibrating Techniques," et al., Korea Advanced Institute of Science and Technology [KAIST] and Samsung Electronics Co., Ltd.)



### Fig. 3

The above figure shows a schematic comparison of conventional "2D Touch" (left) and "3D Hover" (right) sensors. The conventional sensor responds only when the capacitance variation between the object (finger) and the touched panel ($\Delta C_{n+2}$) becomes large, so the object must touch the panel. The proposed sensor senses the capacitance variations of the object with respect to each panel even when the object is located at a distance from the panel, enabling 3D hover sensing. Although each capacitance variation signal is relatively weak and difficult to use for threshold determination, as shown in the signal profile (inset), a high SNR was achieved by grouping the signals of nearby panels.
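The benefit of grouping weak signals can be estimated with a simple idealized model. The sketch below assumes that every grouped electrode carries the same signal amplitude and independent noise of equal power, so summing N electrodes improves the SNR by $10\log_{10} N$ dB. These assumptions, the function name, and the sample numbers are illustrative and are not taken from the paper.

```python
import math

def grouped_snr_db(single_snr_db: float, group_size: int) -> float:
    """SNR after summing group_size electrodes under the idealized model.

    The signal adds in amplitude (power grows as N^2) while uncorrelated
    noise adds in power (grows as N), so the SNR improves by a factor of N.
    """
    return single_snr_db + 10.0 * math.log10(group_size)

single_electrode_snr_db = 20.0  # hypothetical per-electrode SNR
improved_snr_db = grouped_snr_db(single_electrode_snr_db, group_size=4)
print(f"{improved_snr_db:.2f} dB")
```

In a real panel the electrodes farther from the object see a weaker signal, so the actual gain would be smaller than this upper estimate.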

## Power Conversion

#### A Digitally Controlled Fully Integrated Voltage Regulator with 3D-TSV-Based On-Die Solenoid Inductor with Backside Planar Magnetic Core in 14nm Tri-Gate CMOS

Intel will present a fully integrated digitally controlled buck voltage regulator (VR) in 14nm tri-gate CMOS that has achieved state-of-the-art power conversion efficiency. The target application is TSV-based 3D stacked heterogeneous multi-die packages, where stringent thermal constraints necessitate local VRs in each of the dies to operate at high power conversion efficiency in the light load regime and TSV-friendly area-efficient inductor integration is desired. Intel built an on-die solenoid inductor that uses 4.5 TSV-based vertical turns around the die along with a high-permeability planar magnetic core on the backside. The inductance density is improved to 111nH/mm2, which is >2X better than conventional inductors with non-planar magnetic cores, and >8X better than planar spiral inductors. The regulator generates a 0.4V-1.1V output from a 1.2V input and high power conversion efficiency (77%) for light load condition (1.5mA) is achieved through the use of hysteretic and pulse frequency modulation control.

(Paper JFS2-1—"A Digitally Controlled Fully Integrated Voltage Regulator with 3D-TSV-Based On-Die Solenoid Inductor with Backside Planar Magnetic Core in 14nm Tri-Gate CMOS," H. K. Krishnamurthy et al., Intel Corporation)



### Fig. 1 and Fig. 2

The voltage regulator chip micrograph and different views of TSV-based on-die inductor in 14nm tri-gate CMOS technology

## Analog

#### A Capacitively-Degenerated 100dB Linear 20-150MS/s Dynamic Amplifier

Broadcom Corporation and Delft University of Technology present a new dynamic residue amplifier for pipelined ADCs with an input of 100mVpp,diff and 4x gain, achieving -100dB THD, the lowest ever reported in dynamic amplifiers. Residue amplification in pipelined ADCs often relies on closed-loop amplifiers, which require large bandwidth. In contrast, dynamic amplifiers are open-loop and more power-efficient but are more nonlinear. This design proposes a dynamic amplifier that employs a capacitively-degenerated linearization technique, which uses only slow digital nonlinearity detection and adjusts an analog control voltage to ensure excellent linearity performance with negligible power overhead.

(Paper C11-1—"A Capacitively-Degenerated 100dB Linear 20-150MS/s Dynamic Amplifier," et al., Broadcom Corporation and Delft University of Technology)



### Fig. 1

In this work, the non-linearity associated with sampling the amplifier's output at times other than topt is detected off-chip. Subsequently, the bias current IB is adjusted to tune topt over PVT, thus providing a simple non-linearity correction knob.

## Converter

#### A 16nm 69dB SNDR 300MSps ADC with Capacitive Reference Stabilization

Imec will demonstrate a 16nm 69dB SNDR 300MSps pipelined SAR ADC with a unique scheme to cancel the reference voltage ripple caused by internal DAC switching. Conventional switching of the capacitive DAC at each SAR conversion step draws a signal-dependent charge from the reference and causes large harmonic distortion, which degrades a key ADC performance metric. As a result, SAR ADCs require decoupling capacitors and/or reference buffering at the expense of significant power consumption and area. With the proposed stabilization scheme, selecting the appropriate value for the auxiliary capacitor (Caux) per DAC code makes the total charge drawn from the reference constant. Any signal-dependent ripple on the reference voltage can thus be eliminated with only a small power and area cost.

(Paper C8-1—"A 16nm 69dB SNDR 300MSps ADC with Capacitive Reference Stabilization," et al., Imec)
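To put the 69dB SNDR figure in context, the sketch below converts SNDR to an effective number of bits using the standard relation for a full-scale sine-wave input, ENOB = (SNDR - 1.76) / 6.02. The function name is an illustrative choice.

```python
def enob_from_sndr(sndr_db: float) -> float:
    """Effective number of bits for a full-scale sine input: (SNDR - 1.76) / 6.02."""
    return (sndr_db - 1.76) / 6.02


enob = enob_from_sndr(69.0)
print(f"69 dB SNDR is roughly {enob:.1f} effective bits")  # 11.2
```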



#### Fig. 2

This figure shows the implementation of a reference stabilization scheme with a timing diagram. A look-up table (LUT) maps each DAC code to the correct setting for Caux. A calibration engine updates the values of the LUT accordingly by monitoring the output of a comparator (Ref. cmp).
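The idea of a per-code look-up table can be sketched as follows. This is only a toy model under stated assumptions: the per-code charge values are made up, the reference voltage is assumed, and the auxiliary capacitor is assumed to draw a charge of Caux × Vref from the reference. It is not the calibration procedure used in the paper.

```python
def build_caux_lut(dac_charge_c, v_ref):
    """Return per-code Caux values (farads) that equalize the total reference charge.

    dac_charge_c: hypothetical charge (coulombs) the main DAC draws for each code.
    Assumes the auxiliary capacitor draws Caux * v_ref from the reference.
    """
    q_target = max(dac_charge_c)  # the worst-case code needs no auxiliary charge
    return [(q_target - q) / v_ref for q in dac_charge_c]


# Made-up charge profile for a 3-bit DAC (femtocoulombs converted to coulombs).
dac_charge = [q * 1e-15 for q in (0, 35, 60, 75, 80, 75, 60, 35)]
v_ref = 0.9  # volts, assumed
lut = build_caux_lut(dac_charge, v_ref)

# With the LUT applied, every code draws the same total charge.
totals = [q + c * v_ref for q, c in zip(dac_charge, lut)]
print([round(c * 1e15, 1) for c in lut])  # Caux per code, in fF
```

Because the total charge no longer depends on the DAC code, the reference sees a constant load rather than a signal-dependent one.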

## Wireless Communication and Clock Generation

#### A 100mW 3.0 Gb/s Spectrum Efficient 60GHz Bi-Phase OOK CMOS Transceiver

Tokyo Institute of Technology and Samsung Electronics present a high-data-rate, spectrum-efficient 60GHz wireless transceiver for indoor, short-range IoT applications. Compared with conventional on-off-keying (OOK) or bi-phase-shift-keying (BPSK), the newly proposed bi-phase on-off-keying (BPOOK) improves spectrum efficiency and makes it possible to double the data rate within the same spectrum bandwidth. A data rate of 3.0 Gb/s is achieved while complying with the spectral mask of the IEEE 802.11ad (WiGig) standard, even in a poor channel environment and supported by a single carrier BPSK mode (1.76 Gb/s) in 802.11ad. The proposed modulation scheme enables the use of a simple incoherent demodulator in the receiver for low-power operation. The total power consumption of the transceiver can be reduced to 100mW, a 60% reduction compared to a conventional transceiver.

(Paper C23-1—"A 100mW 3.0 Gb/s Spectrum Efficient 60GHz Bi-Phase OOK CMOS Transceiver," et al., Tokyo Institute of Technology and Samsung Electronics Co., Ltd.)
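The reason an incoherent demodulator suffices can be sketched with a baseband toy model. The rule used here for choosing the carrier phase (flip it on every successive "one") is an assumption made purely for illustration; the paper's actual phase-selection rule is not given in this summary. The point is that the receiver looks only at the carrier magnitude, so the phase never has to be recovered.

```python
def bpook_modulate(bits):
    """Map bits to baseband symbols in {-1, 0, +1}.

    A 0 bit turns the carrier off. A 1 bit turns it on, and the carrier phase
    flips between 0 and 180 degrees on successive 1 bits (assumed rule).
    """
    symbols, phase = [], 1
    for bit in bits:
        if bit:
            symbols.append(phase)
            phase = -phase
        else:
            symbols.append(0)
    return symbols


def envelope_detect(symbols, threshold=0.5):
    """Incoherent detection: only the carrier magnitude is examined."""
    return [1 if abs(s) > threshold else 0 for s in symbols]


bits = [1, 0, 1, 1, 0, 0, 1]
tx = bpook_modulate(bits)
rx = envelope_detect(tx)
print(tx)          # [1, 0, -1, 1, 0, 0, -1]
print(rx == bits)  # True
```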



### Fig. 5 (a) and (b)

These figures show the measured transmitter spectrum of conventional on-off-keying (a) and the proposed bi-phase on-off-keying (b). The spectrum bandwidth of the proposed BPOOK is about half of the conventional OOK with the same data rate, and 3.0 Gb/s is realized within a spectrum mask of IEEE 802.11ad/WiGig.

#### A 0.5V 1.6mW 2.4GHz Fractional-N All-Digital PLL for Bluetooth LE with PVT-Insensitive TDC Using a Switched-Capacitor Doubler in 28nm CMOS

In IoT applications powered by energy harvesters or small batteries, an SoC operating with a single low supply voltage is essential. Frequency generation that supports the full requirements of Bluetooth Low Energy (BLE) while operating at sub-1V has been a long-desired goal. A group of researchers from TSMC and University College Dublin will demonstrate an ultra-low-voltage PLL operating at a single supply voltage of 0.5V for Bluetooth Low Energy. 0.5-V analog blocks and an internal voltage doubler for digital blocks achieve stable performance against process, voltage, and temperature variations. A prototype implemented in 28nm CMOS achieves 1.6-mW operation with 0.82ps RMS jitter, corresponding to a PLL FoM of -239.2dB.

(Paper C14-1—"A 0.5V 1.6mW 2.4GHz Fractional-N All-Digital PLL for Bluetooth LE with PVT-Insensitive TDC Using a Switched-Capacitor Doubler in 28nm CMOS," et al., Taiwan Semiconductor Manufacturing Company [TSMC] and University College Dublin)



#### Fig. 3

The above figure shows an AD-PLL for a BLE transceiver, working with a single 0.5V supply.

## Wireline Communication

#### A 32 Gb/s, 4.7 pJ/bit Optical Link with -11.7dBm Sensitivity in 14nm FinFET CMOS

The rapid growth of cloud computing demands higher bandwidth as well as low-cost communication in data centers over distances as large as 50 m. For distances beyond 10 m, achieving a data rate >25 Gb/s is very difficult with low-cost copper interconnect due to its media loss and inter-symbol interference (data-pattern-dependent degradation). On the other hand, an optical link with low-loss optical fiber has difficulty increasing its data rate due to high-frequency noise. IBM overcomes the issue by intentionally limiting the bandwidth of the front-end receive circuits to suppress high-frequency noise and introducing a decision feedback equalizer (DFE) for non-linear data recovery. The transmitter and receiver chips are implemented in a 14nm FinFET CMOS process and successfully achieve 32 Gb/s optical data transmission with a practical -11.7-dBm OMA sensitivity as well as a 4.7-pJ/bit top-level efficiency.

(Paper C25-1—"A 32 Gb/s, 4.7 pJ/bit Optical Link with -11.7dBm Sensitivity in 14nm FinFET CMOS," et al., IBM Corporation)
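An energy-per-bit figure translates directly into link power. The sketch below applies that arithmetic to the headline figures in the paper title (32 Gb/s, 4.7 pJ/bit); the function name is an illustrative choice.

```python
def link_power_mw(data_rate_gbps: float, energy_pj_per_bit: float) -> float:
    """Power in mW: (Gb/s * 1e9 bit/s) * (pJ/bit * 1e-12 J) = rate * energy mW."""
    return data_rate_gbps * energy_pj_per_bit


power_32g = link_power_mw(32, 4.7)
print(f"32 Gb/s at 4.7 pJ/bit is about {power_32g:.1f} mW")  # 150.4 mW
```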



#### Fig. 8

The figure shows the BER vs OMA and vs sampling time, and TX eyes. The 32 Gb/s sensitivity at BER = 1e-12 is -11.7dBm with FFE and DFE.

#### A 60 Gb/s 1.9 pJ/bit NRZ Optical-Receiver with Low Latency Digital CDR in 14nm CMOS FinFET

IBM and EPFL also present an optical link receiver that runs at roughly twice the speed of the optical link in the previous paper. The 64 Gb/s data receiving operation itself was reported at ISSCC 2017. Here, they add a clock and data recovery (CDR) function to the reported receiver architecture, making it a more complete optical link receiver. The proposed CDR utilizes a 128-step octagonal phase rotator to obtain high linearity and achieves 0.16UIpp jitter tolerance at an 80MHz frequency corner. The chip is fabricated in 14nm FinFET CMOS and simultaneously achieves 63 Gb/s, 7 m optical data reception with 1.9 pJ/bit top-level efficiency and a practical -5dBm OMA sensitivity.

(Paper C25-2—"A 60 Gb/s 1.9 pJ/bit NRZ Optical-Receiver with Low Latency Digital CDR in 14nm CMOS FinFET," et al., IBM Corporation and École polytechnique fédérale de Lausanne [EPFL])
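One way to picture why an octagonal control shape helps linearity is a purely geometric model: treat each phase-rotator code as a point on a polygon inscribed in the unit circle, with the point's coordinates standing in for the I/Q weights, and compare the resulting phase with an ideal uniform step of 360°/128. This is an illustrative assumption about what "octagonal" means here, not a description of the authors' circuit.

```python
import math


def polygon_phase_steps(n_sides: int, n_steps: int):
    """Phases (degrees) of n_steps points spaced evenly along a regular polygon.

    The polygon's vertices sit on the unit circle; each point's coordinates
    stand in for the I/Q weights of one phase-rotator code.
    n_steps is assumed to be a multiple of n_sides.
    """
    per_side = n_steps // n_sides
    phases = []
    for k in range(n_steps):
        side, t = divmod(k, per_side)
        t /= per_side  # fractional position along the current side
        a0 = 2 * math.pi * side / n_sides
        a1 = 2 * math.pi * (side + 1) / n_sides
        x = (1 - t) * math.cos(a0) + t * math.cos(a1)
        y = (1 - t) * math.sin(a0) + t * math.sin(a1)
        phases.append(math.degrees(math.atan2(y, x)) % 360)
    return phases


def max_phase_error_deg(n_sides: int, n_steps: int = 128) -> float:
    """Largest deviation from the ideal uniform phase step of 360 / n_steps."""
    ideal = [360 * k / n_steps for k in range(n_steps)]
    actual = polygon_phase_steps(n_sides, n_steps)
    return max(abs(a - i) for a, i in zip(actual, ideal))


print(f"octagon: {max_phase_error_deg(8):.2f} deg worst-case error")
print(f"square:  {max_phase_error_deg(4):.2f} deg worst-case error")
```

In this model the octagon keeps the worst-case phase error under one degree, while a four-sided shape with the same 128 steps deviates by several degrees.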



#### Fig. 3

The above figure shows phase rotator circuits with 128-step octagonal control.

#### Here are definitions of some important technical terms:

- **ADC, or Analog-to-Digital Converter** A device that converts a continuous physical quantity (usually voltage) to a digital number.
- **Back-End/BEOL** and **Front-End/FEOL** -- In integrated circuit manufacturing, transistors and other active devices are built first (at the <u>front end of</u> the manufacturing line or FEOL), while the interconnect, or the wiring, is built afterward, at the "<u>back end</u>" of the manufacturing line (BEOL).
- **Bi-Phase On-Off-Keying (BPOOK)** A modulation scheme of data communication. Carrier amplitude is modulated between zero and one depending on the baseband data. Furthermore, carrier phase is also changed between 0 and 180° when the baseband data is "one". Compared with OOK and BPSK, the spectrum efficiency is improved and the data rate can be doubled within the same spectrum bandwidth. As with OOK, an envelope detector can be used for demodulation, which suits low-power operation.
- **Bi-Phase Shift Keying (BPSK)** A modulation scheme of data communication. Carrier phase is modulated between 0 and 180° depending on baseband data. Compared with OOK, receiver sensitivity can be improved by using a coherent detector, because the distance between signal points is large and the required signal-to-noise ratio can be relaxed.
- **Buck Converter** -- A DC-to-DC power converter which steps down voltage (while stepping up current) from its input (supply) to its output (load). It is a class of switched-mode power supply (SMPS).
- **BLE** Bluetooth Low Energy. Bluetooth is a wireless standard, and BLE is a Low-Energy (LE) mode in Bluetooth for smartphones, IoT devices, etc.
- **CDS** (Correlated Double Sampling) Correlated double sampling is a method to cancel the fixed pattern and reset noise in the pixel. During the pixel readout cycle, two samples are taken and subtracted. One sample is taken when the pixel is still in the reset state, and the other is taken when the charge has been transferred to the readout node.
- **CMOS/MOS/MOSFET/FET**-- Most transistors today are FETs, or field-effect transistors. Most FETs are built with CMOS manufacturing technology (<u>complementary metal oxide semiconductor</u>). Generically they are called MOSFETs, or sometimes MOS transistors.
- **Compound/III-V Semiconductors** -- Most semiconductors are silicon-based, but researchers continue to investigate other semiconducting materials with higher electron mobilities because they can be used to make faster devices. The tradeoff is that the materials are harder to work with than silicon. Compound semiconductors are made of two or more elements (e.g. GaAs, InP, GaN, etc.) which are generally found in groups III and V of the periodic table of the elements.
- **DAC or Digital-to-Analog Converter** A device that converts digital data into an analog signal (current, voltage, or electric charge).
- **DNN (Deep Neural Network)** A neural network that has more than one layer of hidden units between its inputs and its outputs. Well-known models include the Convolutional Neural Network (CNN) and the Recurrent Neural Network (RNN). The idea of realizing higher-level functions with a neural network that has multiple hidden layers is not new, but training with the traditional back propagation method converged slowly and the performance was insufficient. In recent years, the effectiveness of DNNs was rediscovered thanks to the proposal of an effective training algorithm for multilayered neural networks and the significant performance improvement of computers. In addition, DNNs received a great deal of attention at the Image Recognition Contest (ImageNet Large Scale Visual Recognition Challenge) held in 2012 as a result of the overwhelming performance of research teams using them. For these reasons, research on applying DNNs in various fields, including image recognition and speech recognition, is currently active. Machine learning that uses DNNs is called deep learning.
- **DRAM** Dynamic random access memory stores information as charge on a capacitor that must be periodically refreshed. Dedicated DRAM chips form the bulk of the main memory for typical computers, tablets, and smartphones.

- **ECoG** Electrocorticography (ECoG) is a type of electrophysiological monitoring that uses electrodes placed directly on the exposed surface of the brain to record electrical activity from the cerebral cortex.
- **EOT or equivalent oxide thickness** A distance used to compare the performance of high-k dielectrics with that of an SiO<sub>2</sub> film. An SiO<sub>2</sub> film with a thickness equal to the EOT has the same gate capacitance as the high-k material that is used. Higher-k dielectrics can reduce EOT, which enhances MOSFET performance.
- **ESD** Electrostatic discharge. A sudden release of static electricity between two objects caused by contact. If an ESD event hits an integrated circuit, it may cause the device to fail or reduce its lifetime.
- **FD-SOI** -- Fully depleted silicon on insulator is a process technology option that can offer speed and power advantages over conventional bulk silicon transistors.
- **FinFET** -- A transistor whose 3-D shape resembles a fin, usually with multiple gates surrounding it for better on/off switching control.
- Front-End/FEOL -- see Back-End/BEOL and Front-End/FEOL
- **HEMT** High Electron Mobility Transistor, also known as heterostructure FET (HFET) or modulation-doped FET (MODFET). A HEMT is based on a heterojunction which consists of two semiconductors with different band gaps (see also Compound/III-V Semiconductors). By choosing proper materials, the band discontinuity forms high-mobility two-dimensional electron gas at the hetero interface.
- **Hysteretic control** is a control method for DC-DC converters where a comparator monitors the output voltage and controls the power switch. This method is useful in applications like CPUs and FPGAs where rapid response against load current variation is required.
- HKMG, or High-k Dielectrics/Metal Gates -- A dielectric is an electrical insulator. "k" is the relative permittivity and is a measure of how well a material will prevent current flow between the gate electrode and the channel region of a field-effect transistor, while capacitively coupling the two to control on/off switching. In future CMOS integrated circuits (chips) the gate dielectric will need to provide capacitive coupling equivalent to that of a silicon-dioxide layer that is just a few atoms thick, to allow the length of the channel region to be scaled down to 10 nm and below. Metal gate materials are more compatible with high-k gate dielectrics than are traditional doped polycrystalline silicon material. Much progress has been made in recent years to integrate metal gates into the CMOS process flow for the manufacture of high-performance chips.
- **IEEE 802.11ad** A standard for ultra-high-speed wireless communication which uses millimeter waves (60GHz band).
- III-V -- see Compound/III-V Semiconductors
- Integrated Circuit -- An electrical circuit comprising many interconnected elements (e.g. transistors, diodes, capacitors, resistors, inductors) built on a semiconducting substrate.
- Interconnect -- The metal lines, or wiring, connecting transistors and other circuit elements. See Back-End/BEOL.
- **Interposer** An electrical interface between chips or between socket and chips. The purpose of an interposer is to connect chips and sockets with different I/O terminals.
- **Linear Voltage Regulator** Maintains a steady voltage by changing its output resistance according to the load current. It requires a higher input voltage than output voltage and normally results in lower efficiency than a switching regulator.
- Low-k Dielectrics/Interconnect -- Interconnect refers to the metal wires that connect elements together in an integrated circuit (chip). The close proximity of adjacent wires can result in capacitance that can limit chip performance. A low-k dielectric electrically insulates the copper lines while minimizing their mutual capacitance; however, these materials are generally more fragile and thus pose challenges for manufacturing.
- **Magnetic core** is a piece of magnetic material with a high magnetic permeability used to confine and guide magnetic fields used in devices such as inductors and transformers.

- **MCU** Microcontroller unit. Microcontrollers typically contain a processor core, memory, and input/output peripherals and are designed for embedded applications.
- MEMS -- A micro-electro-mechanical system, containing micrometer-scale moving parts.
- **Neural Network** A mathematical model that aims to mimic the characteristics of brain function by computer simulation. It is composed of an input layer, a hidden layer, an output layer, and the wiring that connects the units. Each wire has a parameter called a connecting weight. Each unit takes the data propagating from a number of units in the preceding layer, multiplied by the connecting weights, and outputs the result of applying a predetermined function (the activation function). Applying a training dataset of input-output pairs and finding a suitable set of connecting weights that gives a target function is called supervised learning. In supervised learning, an algorithm called back propagation is generally used. By applying the set of connecting weights obtained by supervised learning, it is possible to obtain a function that gives the desired input-output relation.
- **N-FET/P-FET or NMOS/PMOS** -- MOSFETs come in two varieties (n-channel or p-channel) which operate in a complementary fashion.
- **Non-volatile memory (NVM)** A type of computer memory that retains its stored information even when the power is off.
- **On-Off-Keying (OOK)** A modulation scheme of data communication. Carrier amplitude is directly modulated between one and zero depending on baseband data. A simple envelope detector can be used for demodulation, which suits low-power transceivers.
- **PAM4** 4-level pulse amplitude modulation. In communication, the data is represented as one of four discrete levels. This means that each symbol can encode two bits of data instead of the conventional 1 bit/symbol. For the same symbol rate and bandwidth, this doubles the data throughput.
- **Phase-Change Memory/PCM** -- Phase-change materials have crystalline and non-crystalline states which are used to represent the digits "0" or "1" in a non-volatile memory. Electrical current is used to toggle between the two states: heat from the current causes the material to change its state.
- **Pulse Frequency Modulation (PFM) control** is a control method where the pulse frequency is changed, unlike pulse width modulation (PWM) control, where the frequency is constant and only the pulse width is changed. In DC-DC converters, this control method can achieve better power conversion efficiency in light load conditions than PWM control.
- **ReRAM or RRAM** Resistive random-access memory. A non-volatile random access memory that stores the binary digit by changing the resistivity of material between electrodes.
- **ROI (Region of Interest)** A ROI is the region which defines the borders of an object under consideration. When capturing the image, individual points of interest can be observed and evaluated.
- **SAR ADC** A successive approximation ADC is a type of analog-to-digital converter that converts a continuous analog waveform into a discrete digital representation via a binary search through all possible quantization levels before finally converging upon a digital output for each conversion.
- Scaling/Density/Integration -- Scaling is making transistors and other circuit elements smaller so that more of them will fit on a chip. A denser chip contains more transistors in a given area. Integration is combining circuit elements on a chip to add more functions to achieve lower cost per function.
- Semiconductor -- A material that can be made to conduct or to block the passage of electrical current, giving the ability to store and process information.
- **SNDR** Signal-to-noise and distortion ratio is a standard metric for analog-to-digital converters and digital-to-analog converters. SNDR indicates in dB the ratio between the power of the converted main signal and the sum of the powers of the noise and the generated harmonic spurs.
- **SoC** -- A system-on-a-chip. An integrated circuit which integrates all necessary components of a computer or other electronic system on a single chip.
- **SOI** -- A silicon-on-insulator substrate, used to reduce parasitic capacitance and thereby improve integrated circuit performance.
- Strained silicon & SiGe stressors -- Silicon is said to be "strained" when its atoms are pulled farther apart or closer together than normal. Doing so alters the ease with which electrons flow through the silicon, enabling transistors built with it to operate faster and/or at lower voltage. The external stressors which impart strain are materials with slightly different atomic spacing than silicon. For example, a common way to compressively strain the channel region of a p-channel silicon field-effect transistor is to embed silicon-germanium (**SiGe**), which has larger atomic spacing than does Si, in its source and drain regions.

- **SRAM** -- A type of computer memory (static random access memory) that uses six or more transistors to store each bit of information. It can be written to and read from very quickly.
- **STT-MRAM** Spin-transfer torque magnetic random access memory is an emerging type of non-volatile memory that operates according to the "spin" state of electrons, not their electric charge. STT-MRAMs can be made extremely small.
- **TDC, or Time-to-Digital Converter** A device for recognizing events and providing a digital representation of the time they occurred.
- **Ternary content-addressable memory (TCAM)** Content-addressable memory is a specialized memory capable of searching its entire contents for a word. "Ternary" refers to the capability of storing and querying an "X" (don't care) state, in addition to 0 and 1.
- **TSV** Through silicon vias. TSVs provide a connection from the top to the bottom of a silicon die, allowing vertical interconnections for 3-D stacking of dies.
- **UWB** Ultra-wideband radio is wireless communication that operates in the 3.1-10.6 GHz band using a minimum of 500MHz of bandwidth, typically with very low average radiated power density.
- **Global shutter** A method of capturing an entire scene at a single instant in time, rather than by scanning across the scene as a rolling shutter does.
- **Effective Number of Bits (ENOB)** A measure of the dynamic performance of ADCs, including noise and distortion effects, normalized to the performance of an otherwise ideal ADC with finite resolution.
- **Transistor** -- A tiny electrical switch that serves as the building block for integrated circuits. It has no moving parts and is made with a semiconductor material, usually silicon. Transistors can be ganged together by the billions on chips and programmed to receive, process and store information, and to output information and/or control signals.
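
The binary search described in the SAR ADC entry above can be sketched in a few lines. This is a minimal model that assumes an ideal DAC and comparator; the function name and the example voltages are illustrative, not taken from any paper in this tip sheet.

```python
def sar_convert(v_in: float, v_ref: float, n_bits: int) -> int:
    """Successive approximation: one comparator decision per bit, MSB first."""
    code = 0
    for bit in reversed(range(n_bits)):
        trial = code | (1 << bit)
        # Ideal DAC output for the trial code.
        v_dac = v_ref * trial / (1 << n_bits)
        if v_in >= v_dac:
            code = trial  # the input is at or above the trial level: keep the bit
    return code


code = sar_convert(v_in=0.6, v_ref=1.0, n_bits=4)
print(code)  # 9, since 9/16 = 0.5625 <= 0.6 < 10/16 = 0.625
```

Each conversion takes exactly `n_bits` comparisons, which is why the DAC switches once per bit decision.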