Regular Sessions

Session 1: Design Techniques for Industrial Applications

Session Chair: Jinwook Oh, Rebellions
Session Co-chair: Po-Hung Lin, National Yang Ming Chiao Tung University
Date: Nov. 07, 2022 (Monday)
Time: 10:50 – 12:30 (UTC+8)
Room: Auditorium 國際會議廳, 10F

ID Time Title / Authors / Affiliation
1.1 (7033) (Highlight) 10:50 – 11:15
Energy Efficient BNN Accelerator using CiM and a Time-Interleaved Hadamard Digital GRNG in 22nm CMOS
Richard Dorrance, Deepak Dasalukunte, Hechen Wang, Renzhi Liu, Brent Carlton
Intel Corporation, USA

Abstract:
In this paper, we propose a Bayesian Neural Network (BNN) accelerator leveraging a C-2C SRAM-based analog Compute-in-Memory (CiM) macro for the MAC operations and a variable-precision (with programmable statistical quality), time-interleaved Hadamard Gaussian Random Number Generator (GRNG) for probabilistic weight generation. The proposed BNN prototype achieves a 25% speedup over the state-of-the-art with a 35× improvement in energy efficiency.
1.2 (7176) (Highlight) 11:15 – 11:40
Sub-GHz RF Energy Harvester including a Small Loop Antenna
Darshan Shetty1, Christoph Steffan1, Wolfgang Bösch2, Jasmin Grosinger2
1Infineon Technologies AG, Austria
2Graz University of Technology, Austria

Abstract:
This work presents a sub-GHz RF energy harvester comprising an RF-DC converter implemented in a 130 nm CMOS technology, a conjugate matched loop antenna, and an output load. The RF-DC converter uses a novel threshold voltage compensation technique, implemented using an inbuilt nanowatt current reference circuit. The threshold compensation design ensures robust system performance across temperature and process corner variations. Measurements of the RF energy harvester including the antenna reveal an excellent 1 V sensitivity of -33 dBm for an output load of 1 GΩ and a peak PCE of 53%.
1.3 (7022) (Highlight) 11:40 – 12:05
An Attachable Fractional Divider Transforming an Integer-N PLL Into a Fractional-N PLL with SSC Capability
Atsushi Motozawa, Yasuyuki Hiraku, Yoshitaka Hirai, Naoaki Hiyama, Yusuke Imanaka, Fukashi Morishita
Renesas Electronics Corporation, Japan

Abstract:
In the automotive industry, systems handle weak satellite signals, so the output frequency of PLLs is carefully designed to avoid EMI. Recently, GNSS has become more common and the frequency bands available for clocks are getting narrower. As a result, Int-N PLLs need to be replaced with Frac-N PLLs to obtain smaller frequency steps. In this paper, an attachable fractional divider (FDIV) is proposed to transform an Int-N PLL into a Frac-N PLL with SSC capability with minimal design effort. A Frac-N PLL with the proposed FDIV achieves a worst-case fractional spur of -69.3dBc and an EMI reduction of 18.7dB in SSC operation.
1.4 (7024) (Highlight) 12:05 – 12:30
A Learning-Based Algorithm for Early Floorplan With Flexible Blocks
1JEN-WEI LEE, 1YI-YING LIAO, 1TE-WEI CHEN, 1YU-HSIU LIN, 1CHIA-WEI CHEN, 1CHUN-KU TING, 1SHENG-TAI TSENG, 1RONALD KUO-HUA HO, 1HSIN-CHUAN KUO, 1CHUN-CHIEH WANG, 1MING-FANG TSAI, 1CHUN-CHIH YANG, 1TAI-LAI TUNG, and 2DA-SHAN SHIU
1MediaTek, Taiwan
2MediaTek Research, Taiwan

Abstract:
This paper presents a learning-based algorithm that uses a graph neural network (GNN) and a deconvolution network to predict the locations and aspect ratios of design blocks with flexible rectangular shapes. After several hours of training on 4 GPUs, the proposed method, which targets minimizing the wirelength cost, generates early-stage floorplan placements that are superior to manual placements requiring several days of effort from physical design experts.

Session 2: Switching Mode Power Converters

Session Chair: Makoto Takamiya, University of Tokyo
Session Co-chair: Wanyuan Qu, Zhejiang University
Date: Nov. 07, 2022 (Monday)
Time: 14:00 – 15:40 (UTC+8)
Room: Auditorium 國際會議廳, 10F

ID Time Title / Authors / Affiliation
2.1 (7081) 14:00 – 14:25
A Single-inductor Triple-output Buck DC-DC Converter with Electromagnetic Gated Low Dropouts for Higher Resistance to Electromagnetic and Power Side-Channel Attacks with 3B Minimum Traces to Disclosure Improvement in Internet of Things Applications
Ya-Ting Hsu1, Yu-Jheng Ouyang1, Ke-Horng Chen1, Kuo-Lin Zheng2, Ying-Hsi Lin3, Shian-Ru Lin3, and Tsung-Yen Tsai3
1National Yang Ming Chiao Tung University, Taiwan
2Chip-GaN Power Semiconductor Corporation, Taiwan
3Realtek Semiconductor Corp, Taiwan

Abstract:
The proposed single-inductor triple-output buck converter with electromagnetic gated low dropouts has the advantage of hiding the leaked electromagnetic signature. The proposed intelligent true random number generator reduces the peak EMI noise from 88.4dBμV to 54.9dBμV at the fundamental frequency, leaving no obvious tones in the fast Fourier transform. This corresponds to a reduction of 33.5dB, which improves the minimum traces to disclosure to about 3B.
2.2 (7046) 14:25 – 14:50
An One-Cycle Load Transient Response and 0.81 mV/A Load-Regulation Time-Domain Cascaded-VCO-Controlled Buck Converter for Powering Gaming SoC
Chieh-Ju Tsai1, I-Fang Lo2, Tsung-Hsien Lin1, Ching-Jan Chen1
1National Taiwan University, Taiwan
2Richtek Technology Corporation, Taiwan

Abstract:
A time-domain cascaded-VCO-controlled buck converter with a low-cost output LC filter for gaming SoC applications is proposed. By separating the modulation and frequency stabilization functions, the KVCO mismatch issue of conventional time-based PWM controllers no longer exists. A steady-state FSW error of less than ±0.81% is measured. The proposed controller achieves 0.81mV/A load regulation, 1-cycle load transient settling (1μs), and at least a 2X FoM improvement over prior art.
2.3 (7038) 14:50 – 15:15
A 90.6% Peak-Efficiency 1.5A Dual Inductor Ladder Buck-Converter Achieving 0.93W/mm² Active Peak Power Density for Li-ion Battery Operated PMICs
Arindam Mishra, Wei Zhu, and Valentijn De Smedt
ESAT, ADVISE, KU Leuven, Belgium

Abstract:
A dual-inductor-ladder (DIL) DC-DC converter is presented to provide 0.3-1V output down-conversion directly from a 2.5-5V Li-ion battery for low-voltage System-on-Chips (SoCs). Inherent inductor current and capacitor voltage balancing, complete capacitive soft-charging, and reduced inductor current enable the converter to achieve very high active and passive power density and efficiency, even with compact-volume inductors. The DIL is fabricated in a 65nm CMOS technology, obtaining 90.6% peak efficiency and 0.93W/mm² active peak power density, and supports a maximum load current of 1.5A while occupying just over 1mm² of die area.
2.4 (7108) 15:15 – 15:40
A 96.62%-Peak-Efficiency and Seamless-Mode-Transition Buck-Boost DC-DC Converter with Auto-Shift-Ramp
Chi-Wei Chen, Bao-Xian Peng, and Hsin-Shu Chen
National Taiwan University, Taiwan

Abstract:
This paper proposes an Auto-Shift-Ramp (ASR) technique, which can significantly alleviate the undershoot or overshoot voltage caused by mode transitions in multi-mode DC-DC converters. The proposed ASR shifts the starting time of the ramp voltage and enables the DC-DC converter to change the duty cycle instantly after the mode changes, without limiting the maximum duty cycle or changing the modulator gain. According to the measurement results, the mode-transition overshoot voltage is less than 16mV, or 0.48%, with a settling time of less than 18.98μs. The converter achieves 96.62% peak efficiency at 50mA load current in buck mode. Compared to prior works, the proposed DC-DC converter with ASR achieves a much lower mode-transition undershoot or overshoot voltage, even with a smaller output capacitance.

Session 3: Novel Neural Network and Crypto Processors

Session Chair: Kun-Chih Chen, National Sun Yat-Sen University
Session Co-chair: Leibo Liu, Tsinghua University
Date: Nov. 07, 2022 (Monday)
Time: 14:00 – 15:40 (UTC+8)
Room: Song Bo 松柏廳, 10F

ID Time Title / Authors / Affiliation
3.1 (7065) (Highlight) 14:00 – 14:25
SNPU: Always-on 63.2µW Face Recognition Spike Domain Convolutional Neural Network Processor with Spike Train Decomposition and Shift-and-Accumulation Unit
Sangyeob Kim, Sangjin Kim, Soyeon Um, Soyeon Kim, Juhyoung Lee and Hoi-Jun Yoo
Korea Advanced Institute of Science and Technology, Korea

Abstract:
The proposed SNPU has 3 key features. First, Spike Train Decomposition reduces the accumulations (ACCs) by 71.8%. Second, Time Shrinking Multi-Level Encoding replaces multiple ACCs with a single Shift-and-Accumulation (SAC), and the SAC unit adopts bit scalability to enable different always-on applications. Third, Neuron Link supports various time windows, optimizing energy consumption by minimizing the time window layer by layer, and increases the PE utilization by 14.06% for face recognition (FR). For the LFW dataset, the proposed processing reduces the energy consumption by 43.9% thanks to neuron-level event-driven operation. If there is no face in the input, the energy can be reduced further by 87.6%.
3.2 (7042) (Highlight) 14:25 – 14:50
A 28nm 57.6TOPS/W Attention-based NN Processor with Correlative Computing-in-Memory Ring and Dataflow-reshaped Digital-assisted Computing-in-Memory Array
Ruiqi Guo1, Zhiheng Yue1, Hao Li1, Te Hu1, Yabing Wang1, Hao Sun1, Jeng-Long Hsu2, Yaojun Zhang3, Bonan Yan4, Leibo Liu1, Ru Huang4, Shaojun Wei1, Shouyi Yin1
1Tsinghua University, China
2NeoNexus Pte. Ltd., Singapore
3Pimchip Technology Co., Ltd., China
4Peking University, China

Abstract:
This paper presents a 28nm 7.10mm² CIM-based transformer processor, achieving 23.81-to-57.6 TOPS/W system energy efficiency. The chip has three key design features: 1) A correlative CIM ring that avoids loading dynamically generated matrices. 2) A softmax-based speculation unit to eliminate redundant attention computing. 3) A dataflow-reshaped digital-assisted CIM array to achieve fully pipelined computation of the final attention result. The chip can work at 0.56-to-0.9V and 151-to-202MHz, and consumes an average power of 57.97mW at 202MHz and 0.9V.
3.3 (7170) 14:50 – 15:15
A 65nm 8-bit All-Digital Stochastic-Compute-In-Memory Deep Learning Processor
Jiyue Yang, Tianmu Li, Wojciech Romaszkan, Puneet Gupta, and Sudhakar Pamarti
University of California, Los Angeles, USA

Abstract:
This work presents the first ADC/DAC-free compute-in-memory accelerator based on Stochastic Computing (SC). A Stochastic-Compute-in-Memory Accelerator (SCIMA) is presented that (1) embeds SC MAC logic inside an SRAM that only requires 1-bit decisions and no DACs/ADCs, (2) reduces SC number generation costs significantly, and (3) employs a computation skipping technique for SC's average pooling function that reduces the total latency and energy by 4x. The measured 65nm chip achieves 7.96 TOPS/W energy efficiency for the whole system and 20 TOPS/W for the macro. The solution provides 6x better CIM macro density and 2.5x better peak system energy efficiency at 8-bit precision, with network classification accuracy comparable to fixed-point implementations.
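As general background on stochastic computing (not a model of the SCIMA circuit itself): a value in [0, 1] can be encoded as the fraction of 1s in a random bitstream, and a bitwise AND of two independent streams then approximates the product of the two values. A minimal Python sketch, in which the function names, stream length and seed are illustrative assumptions:

```python
import random


def to_stream(p, n, rng):
    """Encode probability p as n random bits, each 1 with probability p."""
    return [1 if rng.random() < p else 0 for _ in range(n)]


def sc_multiply(a, b, n=4096, seed=0):
    """Approximate a * b (both in [0, 1]) with unipolar stochastic computing.

    Each bit pair is combined with a single AND, so the multiplier is one
    gate per bit; accuracy improves as the stream length n grows.
    """
    rng = random.Random(seed)  # hypothetical fixed seed, for repeatability
    stream_a = to_stream(a, n, rng)
    stream_b = to_stream(b, n, rng)
    ones = sum(x & y for x, y in zip(stream_a, stream_b))
    return ones / n


if __name__ == "__main__":
    print(sc_multiply(0.5, 0.25, n=20000))  # close to 0.125, not exact
```

The result is only an estimate: its error shrinks roughly with the square root of the stream length, which is the usual latency versus precision trade-off of SC.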
3.4 (7188) 15:15 – 15:40
High-speed and energy-efficient crypto-processor for post-quantum cryptography CRYSTALS-Kyber
Taishin Shimada, Makoto Ikeda
The University of Tokyo, Japan

Abstract:
This paper presents the design and measurement results of an ASIC for high-speed, low-power key exchange using CRYSTALS-Kyber, a type of post-quantum cryptography (PQC). The design focuses on the large number of number-theoretic transforms (NTTs) in CRYSTALS-Kyber and employs a pipelined architecture to perform the processing. As a result, our chip performs up to 8.5 times faster than a CPU and consumes 24.1 times less energy than a CPU.

Session 4: RF Transceiver Techniques

Session Chair: Chien-Nan Kuo, National Yang Ming Chiao Tung University
Session Co-chair: Baoyong Chi, Tsinghua University
Date: Nov. 07, 2022 (Monday)
Time: 14:00 – 15:40 (UTC+8)
Room: Chang Chin 長青廳, 10F

ID Time Title / Authors / Affiliation
4.1 (7023) (Highlight) 14:00 – 14:25
A 110-120-GHz, 12.2% Efficiency, 16.2-dBm Output Power Multiplying Outphasing Transmitter in 22-nm FDSOI
Jeff Shih-Chieh Chien, James F. Buckwalter
University of California, Santa Barbara, USA

Abstract:
A multiplying outphasing transmitter based on a reflection-type phase shifter and a multiplier chain is fabricated in the GlobalFoundries 22nm FDSOI CMOS process. The measured transmitter achieves 9.2-12.2% DC-to-RF efficiency with 15.1-16.2dBm output power at 110-120 GHz.
4.2 (7027) 14:25 – 14:50
A D-Band Packaged CMOS Integrated Transmitter for MU-MIMO Applications
Meng Wei1, Nima Baniasadi1, Ethan Chou1, Hesham Beshary1, Sashank Krishnamurthy2, Elad Alon1, Ali Niknejad1
1University of California, Berkeley, USA
2Intel, USA

Abstract:
This paper presents a D-band packaged CMOS integrated transmitter (TX) for Multi-User Multiple-Input Multiple-Output (MU-MIMO) applications. The TX chip, fabricated in a 28nm bulk CMOS process, is packaged on an organic interposer that includes a patch antenna array. The circuit integrates the complete transmitter chain, including the baseband I/Q amplifiers, up-conversion mixers, power amplifier, and the LO distribution and generation. The designed TX achieves 9-10.6dBm EIRP at Psat, and it can support 24 Gbps 16-QAM and 24 Gbps 64-QAM at 5.3pJ/bit efficiency, tested with over-the-air measurements.
4.3 (7050) (Highlight) 14:50 – 15:15
A Dual-Band 2×2 802.11ax Transceiver Supporting 160MHz CBW and 1024-QAM
Chao Lu1, Shr-Lung (Calvin) Chen2, Jun Liu3, Jian Bao3, Yi Zhao3, Chin-Ming Chien2, Yufei Wang1, Jianqiu Chen3, Zexin Liao3, Bing Ding3, Bihui Zhu3, Jinhua Chen3, Pengfei Yue3, Ran Wang3, and Chun Wang3
1ASR Microelectronics Inc., USA
2ASR Microelectronics Inc., USA
3ASR Microelectronics Ltd., China

Abstract:
A 2×2 802.11ax transceiver design is presented to support dual-band simultaneous operation (DBS) and 1024-QAM modulation. The proposed architecture features linearity enhancement for uplink OFDMA and wideband transmission. Best-in-class receiving sensitivity and the lowest transmission EVM floor are demonstrated in measurements. With 20MHz (HE20) receiving, sensitivity levels of -96.5dBm and -66dBm are measured for MCS0 and MCS11, respectively. The output power reaches 18dBm with -35dB EVM for 80MHz 1024-QAM (HE80 MCS11) transmission in the 5GHz band. Narrowband OFDMA signals can be transmitted at full power capacity, and 160MHz channel bandwidth (CBW) can also be supported without digital predistortion (DPD). The fully integrated transceiver occupies 10.5mm² of silicon area in 22nm CMOS.
4.4 (7141) 15:15 – 15:40
A 32.2-38.2 GHz Broadband 4-Channel TRx Beamformer with Embedded 3-Winding Transformer Based PA/LNA FE and High Resolution Phase/Amplitude Control
Yongjie Li1, Zongming Duan1, Xiao Li1, Chuanming Zhu1, Na Ding1, Yuefei Dai1, Liguo Sun2, Hao Gao3
1East China Research Institute of Electronic Engineering, China
2University of Science and Technology of China, China
3Eindhoven University of Technology, the Netherlands

Abstract:
This paper presents a 32.2-38.2 GHz broadband 4-channel Ka-band transceiver beamformer. In this transceiver (TRx) beamformer front-end (FE), a compact 3-winding transformer achieves Tx power combining and Rx noise matching simultaneously in TDD mode. Furthermore, this 4-channel RF beamformer integrates a high-precision 6-bit 360° phase shifter and a 6-bit gain control with 0.5-dB steps in each channel to improve beam scanning accuracy. With the programmable 6-bit phase and 6-bit gain control, the measured gain tuning range at 38 GHz is 31.5 dB, with a 0.5-dB gain step and a 5.6° phase step. With the TRx architecture, at 38 GHz, the measured Psat of the Tx is 20.0 dBm, and the NF of the Rx is 5.55 dB.
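The quoted step size and tuning range are consistent with ideal 6-bit control, as a short calculation shows. The Python sketch below assumes uniformly spaced codes; the function names are illustrative and not taken from the paper:

```python
def lsb_phase_deg(bits, full_scale_deg=360.0):
    """Ideal phase step of an N-bit phase shifter covering full_scale_deg."""
    return full_scale_deg / (2 ** bits)


def gain_range_db(bits, step_db):
    """Ideal span between the lowest and highest code of an N-bit gain control."""
    return (2 ** bits - 1) * step_db


if __name__ == "__main__":
    print(lsb_phase_deg(6))       # 5.625
    print(gain_range_db(6, 0.5))  # 31.5
```

An ideal 6-bit shifter over 360° has a 5.625° step, which rounds to the 5.6° in the abstract, and 63 steps of 0.5 dB give the 31.5-dB range.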

Session 5: Biomedical Sensing Chips and Systems

Session Chair: Philex Ming-Yan Fan, National Cheng Kung University
Session Co-chair: Bo Zhao, Zhejiang University
Date: Nov. 07, 2022 (Monday)
Time: 14:00 – 15:40 (UTC+8)
Room: V110 十全軒, VF

ID Time Title / Authors / Affiliation
5.1 (7198) (Highlight) 14:00 – 14:25
A Synchronous-Sampling Impedance-Readout IC with Baseline-Cancellation-Based Two-Step Conversion for Fast Neural Electrical Impedance Tomography
Ji-Hoon Suh1, Haidam Choi1, Yoontae Jung1, Sein Oh1, Hyungjoo Cho1, Nahmil Koo2, Seong Joong Kim2, Chisung Bae2, Sohmyung Ha3, and Minkyu Je1
1KAIST, Korea
2Samsung Advanced Institute of Technology, Korea
3New York University Abu Dhabi, United Arab Emirates

Abstract:
It was recently shown that electrical impedance tomography (EIT) with a greatly enhanced frame rate can provide neural activity monitoring and functional localization of the active peripheral nerve at the same time. For this 'fast neural EIT', we propose an EIT system employing successive-approximation-based (SA-based) baseline tracking and synchronous sampling (SS) of the ADC. By utilizing SA, the baseline can be tracked much faster than with conventional incremental tracking. By using SS, only a single cycle of CG is required, enabling fast demodulation and thus allowing the use of a low CG frequency. Thanks to these, even with a CG frequency of 18kHz, which is low enough to secure SNR for neural EIT, our work achieves a maximum of 500 fps, which is about 4x higher than the state-of-the-art.
5.2 (7120) 14:25 – 14:50
A 1984-Pixels, 1.26nW/Pixel Retinal Prosthesis Chip with Time-Domain In-Pixel Image Processing
Dong-Hwi Choi and Dong-Woo Jee
Ajou University, Korea

Abstract:
This paper presents a 1984-pixel retinal prosthesis (RP) chip with in-pixel image processing. The proposed time-domain image processing circuits perform edge extraction by comparing the pulse widths generated by light-to-stimulus duration converters (LSDCs) of neighboring sensors. A pixel sequencing technique for shared electrode operation is also proposed to increase the pixel count for a given chip area. The RP chip is implemented in a 0.18 μm CMOS process and consumes 1.26 nW/pixel, which is 44.7× better than the previous state-of-the-art.
5.3 (7074) 14:50 – 15:15
A 64-channel back-gate adapted ultra-low-voltage spike-aware neural recording front-end with on-chip lossless/near-lossless compression engine and 3.3V stimulator in 22nm FDSOI
Franz Marcus Schüffny, Seyed Mohammad Ali Zeinolabedin, Richard George, Liyuan Guo, Annika Weiße, Johannes Uhlig, Julian Meyer, Andreas Dixius, Stefan Hänzsche, Marc Berthel, Stefan Scholze, Sebastian Höppner, Christian Mayr
TU Dresden, Germany

Abstract:
In neural implants and biohybrid research systems, the integration of electrode recording and stimulation front-ends with pre-processing circuitry promises a drastic increase in real-time capabilities. In our proposed neural recording system, constant sampling with a bandwidth of 9.8kHz yields 6.73µV input-referred noise (IRN) at a power-per-channel of 0.34µW for the continuous-time ΔΣ modulator, and 0.52µW for the digital filters and spike detectors. We introduce dynamic current/bandwidth selection at the ΔΣ modulator and digital filter to reduce the recording bandwidth in the absence of spikes. This is controlled by a two-level spike detection and adjusted by adaptive threshold estimation (ATE). Dynamic bandwidth selection reduces power by 53.7%, increasing the available channel count at low heat dissipation. Adaptive back-gate voltage tuning (ABGVT) compensates for PVT variation in subthreshold circuits. This allows 1.8V input/output (IO) devices to operate robustly at a 0.4V supply voltage. The proposed 64-channel neural recording system also includes a 16-channel adaptive compression engine (ACE) and an 8-channel on-chip current stimulator at 3.3V.
5.4 (7123) 15:15 – 15:40
A Heart-related Physiological Signal Monitoring SoC for Wearable ECG Analysis Systems
Peng-Wei Huang1, Shuenn-Yuh Lee1, Chieh Tsou1, Yi-Wen Hung1, Po-Han Su1, Ju-Yi Chen2
1National Cheng Kung University, Taiwan
2National Cheng Kung University Hospital, Taiwan

Abstract:
The proposed configurable electrocardiogram (ECG) analysis system-on-chip (CEASoC) allows ECG monitoring and QRS complex detection and classification, thereby reducing the manpower requirements of the analysis. ECG analyses conducted by a person are effort- and time-consuming. Thus, an automatic ECG analysis device with a CEASoC and a BLE module is necessary. This device can improve the healthcare environment through the convenience of instant detection, which in turn relieves the burden of long-term care. Moreover, to account for individual differences, the important analysis parameters in the CEASoC can be updated using external devices and software to enhance the flexibility of the proposed system.

Session 6: High-Speed and Time-Interleaved ADCs

Session Chair: Hsin-Shu Chen, National Taiwan University
Session Co-chair: Yong Lim, Samsung Electronics
Date: Nov. 08, 2022 (Tuesday)
Time: 10:50 – 12:30 (UTC+8)
Room: Auditorium 國際會議廳, 10F

ID Time Title / Authors / Affiliation
6.1 (7053) (Highlight) 10:50 – 11:15
A Single-Channel 14b 500 MS/s Pipelined-SAR ADC with Reference Ripple Mitigation Techniques and Adaptive-Biased Floating Inverter Amplifier
1,2Wenning Jiang, 1Yan Zhu, 1Chi-hang Chan, and 1,3Rui Martins
1University of Macau, China
2Fudan University, Shanghai, China
3Universidade de Lisboa, Portugal

Abstract:
This paper presents a 14b 500MS/s single-channel pipelined-SAR ADC. An on-chip reference buffer is co-designed with reference ripple neutralization (RRN) and cancellation (RRC) in the first stage to facilitate a fast conversion at low power. An adaptive-biased floating inverter amplifier (AB-FIA) is introduced to enhance the gain, linearity and speed. Consuming 6.34mW (including the reference buffer), the ADC achieves an SNDR of 64.2dB and an SFDR of 80.55dB at Nyquist input. It achieves a 170.2dB Schreier FoM and a 9.6 fJ/conversion-step Walden FoM at Nyquist input.
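The two figures of merit quoted above follow from the standard Walden and Schreier definitions. A small Python sketch, assuming the signal bandwidth is the Nyquist bandwidth (half of the 500MS/s sampling rate); the function names are illustrative:

```python
import math


def enob(sndr_db):
    """Effective number of bits from SNDR in dB."""
    return (sndr_db - 1.76) / 6.02


def walden_fom_fj(power_w, fs_hz, sndr_db):
    """Walden FoM in fJ/conversion-step: P / (2^ENOB * fs)."""
    return power_w / (2 ** enob(sndr_db) * fs_hz) * 1e15


def schreier_fom_db(power_w, bw_hz, sndr_db):
    """Schreier FoM in dB: SNDR + 10*log10(BW / P)."""
    return sndr_db + 10 * math.log10(bw_hz / power_w)


if __name__ == "__main__":
    # Values from the abstract: 6.34mW, 500MS/s, 64.2dB SNDR at Nyquist input.
    print(round(walden_fom_fj(6.34e-3, 500e6, 64.2), 1))
    print(round(schreier_fom_db(6.34e-3, 250e6, 64.2), 1))
```

With the abstract's numbers, these expressions give about 9.6 fJ/conversion-step and 170.2dB, matching the reported values to within rounding.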
6.2 (7190) 11:15 – 11:40
A 3.07mW 30MHz-BW 73.5dB-SNDR Time-Interleaved Noise-Shaping SAR ADC with 2nd-order Error-Feedforward and Redundancy-Bit Reduction
Shulin Zhao1, Mingqiang Guo1, Sai-Weng Sin1,2, Liang Qi3, Dengke Xu4, Guoxing Wang3, Rui P. Martins1,5
1University of Macau, China
2Zhuhai UM Science & Technology Research Institute, China
3Shanghai Jiao Tong University, China
4Amicro Semiconductor Co., Ltd, China
5University of Lisboa, Portugal

Abstract:
This work presents a calibration-free 2-channel time-interleaved noise-shaping SAR (TI-NS-SAR) ADC with 1) one-time midway error feedback (error-FB) and a shared dynamic amplifier to reduce the redundancy bits; and 2) 2nd-order error feedforward to enhance the NS effect for higher resolution. Fabricated in 28nm CMOS, the prototype achieves 73.5dB SNDR and 30MHz BW with a sampling frequency of 330MHz. It consumes 3.07mW, resulting in a Schreier FoM (FoMs) of 173.4dB.
6.3 (7095) 11:40 – 12:05
A 12b 8GS/s Time-Interleaved 2b/cycle Pipelined-SAR ADC with Layout-Customized Bootstrap and Super-Source-Follower Based Open-Loop Residue Amplifier
Qiang Yu1,2, Jie Pu1, Jian Luo1, Zhengbo Huang1, Junhong Wu1, Xing Zhu1, Feixiang Xiang1, Lei Chen1, Jianwen Li1, Qiang Li2, Jinda Yang1, and Yuanjun Cen1
1Chengdu Sino Microelectronics Technology, China
2University of Electronic Science and Technology of China, China

Abstract:
This work describes a 12b 8GS/s time-interleaved ADC which utilizes a 2b/cycle pipelined-SAR ADC in each channel to enhance the speed while maintaining low power. To sample the input signal within 125ps, a layout-customized bootstrap is proposed to accelerate the start-up time. A high-linearity super-source-follower (SSF) based open-loop residue amplifier (RA) with large input swing and strong output power is exploited. With Nyquist input, this 8GS/s ADC achieves an SNDR of 53.8dB and an SFDR of 67dB with a power dissipation of 1W.
6.4 (7225) 12:05 – 12:17
A 6-bit 5.12-GS/s Flash ADC with Track-and-Hold Embedded Dynamic Preamplifier in 28nm CMOS
Daesik Moon1,2, Sangwoo Lee3, Taewoong Kim1, Woo-Young Choi1, and Youngcheol Chae1
1Yonsei University, Korea
2Samsung Electronics, Korea
3Robert Bosch LLC., USA

Abstract:
A 5.12GS/s flash ADC with a track-and-hold embedded dynamic preamplifier is presented. A ×4 interpolated pipelined amplifier is followed by a StrongARM latch. Above 32.96dB is maintained over different input frequencies and sampling frequencies. Foreground calibration is realized.
6.5 (7213) 12:17 – 12:30
A 7-Bit 4-GS/s Quad-Channel Time-Interleaved SAR ADC With 2-Then-1-Bit/Cycle Conversion
Jihyun Baek, Jonghyun Kim, Gyuchan Cho, Jintae Kim, and Hyungil Chae
Konkuk University, Korea

Abstract:
A 7-bit 4-GS/s quad-channel TI-SAR ADC including the front-end sampler and the buffer is presented. The channel ADC speed is maximized by 2-then-1-bit/cycle coarse-fine conversion without calibration. Also, a buffer topology for unity gain is introduced. The prototype is implemented in a 28-nm CMOS process, and it shows an SNDR of 38 dB at a 4 GS/s sampling rate. The power consumption is 11.4 mW, and the Walden FoM is 43.8 fJ/conv.-step, showing good energy efficiency.

Session 7: Emerging Computing Applications on FPGA

Session Chair: Chuen-Yau Chen, National University of Kaohsiung
Session Co-chair: Youngjoo Lee, POSTECH
Date: Nov. 08, 2022 (Tuesday)
Time: 10:50 – 12:30 (UTC+8)
Room: Song Bo 松柏廳, 10F

ID Time Title / Authors / Affiliation
7.1 (7254) (Highlight) 10:50 – 11:15
A 75.6M Base-pairs/s FPGA Accelerator for FM-index Based Paired-end Short-Read Mapping
Chung-Hsuan Yang1, Yi-Chung Wu1, Yen-Lung Chen1, Chao-Hsi Lee2, Jui-Hung Hung2,3, Chia-Hsiang Yang1,2
1National Taiwan University, Taiwan
2GeneASIC Technologies Corp., Taiwan
3National Yang Ming Chiao Tung University, Taiwan

Abstract:
This work presents an FPGA accelerator for FM-index based paired-end short-read mapping in NGS data analysis, realized on an AMD-Xilinx Alveo U250 FPGA board. With the proposed design techniques, the overall latency is reduced by 92.6%. Compared to state-of-the-art FPGA-based designs, this work delivers 1.7-18.6x higher throughput with a memory-efficient implementation and achieves the highest accuracy, 99.3%. An on-site FPGA demonstration will be given.
7.2 (7152) 11:15 – 11:40
A 217.8 MSOPs/W FPGA-based Online Learning SNN Processor Using Unified Event-Driven Structure and Topology Aware Data Reuse Strategies
Chaoming Fang1,2, Fengshi Tian2, Chuanqing Wang2, Jie Yang2, Mohamad Sawan2
1Zhejiang University, China
2CenBRAIN Neurotech, Westlake University, China

Abstract:
We present in this paper a reconfigurable algorithmic neuromorphic engine (RAINE) with three innovative features: 1) A Pipelined-Event-Driven (PED) architecture to increase SNN execution efficiency by leveraging input sparsity. 2) A Topology-Adaptive-Stationary (TAS) data reuse strategy to reduce memory access by adopting Voltage-Reuse (VR), Event-Reuse (ES), and Synapse-Reuse (SR) dataflow for different topologies and 3) A Unified-Dynamic-Learning-Engine (UDLE) to carry out computation for both Leaky-Integrate-Fire (LIF) and trace-based Spike-Timing-Dependent-Plasticity (STDP) online learning. RAINE shows competitive energy efficiency of 217.8 MSOPS/W at a clock frequency of 75MHz, without causing additional hardware resource overhead due to the compact and unified circuit design.
7.3 (7187) 11:40 – 12:05
A Flexible Instruction-based Post-quantum Cryptographic Processor with Modulus Reconfigurable Arithmetic Unit for Module LWR&E
Aobo Li, Dongsheng Liu, Xiang Li, Tianze Huang, Shuo Yang, Jiahao Lu, Ang Hu
Huazhong University of Science and Technology, China

Abstract:
In this work, we propose a reconfigurable arithmetic unit with a variable modulus domain and combine it with a custom instruction-set architecture to design a flexible crypto processor for MLWR and MLWE. Verified on an FPGA platform, the work achieves flexible implementation of variable parameters and instruction programming under a trade-off between resource efficiency and performance.
7.4 (7100) 12:05 – 12:30
Method of Halved Interaction Elements with Regularity Arrangement that achieves Independent Double Systems for Scalable Fully Coupled Annealing Processing
Shinjiro Kitahara, Akari Endo, Taichi Megumi, and Takayuki Kawahara
Tokyo University of Science, Katsushika, Japan

Abstract:
In recent years, annealing processors have been developed as solutions to large-scale combinatorial optimization problems. In this paper, we propose a new method that has a high affinity with a scalable fully coupled annealing processor and halves the interaction elements, whose number is the square of the number of spins, by exploiting the regularity of their arrangement. In addition, we succeeded in implementing two independent 384-spin fully coupled Ising machines with 16 chips. The usefulness of the reduction method is shown.

Session 8: High Performance Receiver and Detection Techniques

Session Chair: Kuang-Wei Cheng, National Cheng Kung University
Session Co-chair: Dixian Zhao, Southeast University
Date: Nov. 08, 2022 (Tuesday)
Time: 10:50 – 12:30 (UTC+8)
Room: Chang Chin 長青廳, 10F

ID Time Title / Authors / Affiliation
8.1 (7214) (Highlight) 10:50 – 11:15
A 37-39GHz Phase and Amplitude Detection Circuit with 0.060 degree and 0.043dB RMS Errors for the Calibration of 5GNR Phased-Array Beamforming
Yudai Yamazaki, Jun Sakamaki, Jian Pang, Joshua Alvin, Zheng Li, Atsushi Shirane, Kenichi Okada
Tokyo Institute of Technology, Japan

Abstract:
Phased-array beamforming is achieved by high-resolution phase and amplitude control in each TRX element. However, the on-chip mismatches between elements caused by PVT variations degrade the phased-array performance. In this work, a high-accuracy phase and amplitude detection circuit for phased-array mismatch calibration in the 39GHz band is introduced. A phase-to-digital converter (PDC) and analog-to-digital converter (ADC) detection technique is applied to obtain much lower detection errors than conventional approaches. The proposed detection circuit achieves phase and amplitude detection in 37-39GHz with 0.046 degree and 0.043dB RMS errors, respectively. The circuit is fabricated in a 65nm CMOS process, and the core area is 1.34mm².
8.2 (7074) 11:15 – 11:40
A 0.55mm² 16.9mW Fully Integrated 0-to-200MHz System BW Wireless Direct Sampling Receiver in 14nm FinFET
Ilhoon Jang, Barosaim Sung, Jaehoon Lee, Soonwoo Choi, Byoungjoong Kang, Suseob Ahn, Kyungmin Lee, Taejin Jang, Kwangmin Lim, Anna Yu, Yong Lim, Seunghyun Oh, and Jongwoo Lee
Samsung Electronics, Korea

Abstract:
This paper presents a fully integrated wireless direct sampling receiver that covers a system bandwidth from DC to 200MHz, implemented with a single-channel SAR ADC in 14nm FinFET. To demonstrate the proposed architecture, frequency modulation (FM), one of the applicable standard frequency bands, is adopted as a prototype. The measured demodulated SNR is 73.9dB with -47dBm input power at 108MHz, and the sensitivity level is -106dBm. The proposed direct sampling receiver maintains a demodulated SNR of over 30dB even in the presence of interference such as a strong adjacent channel and an in-band spur. Furthermore, the FM channel scan time is drastically reduced, since the proposed receiver samples all channels simultaneously without adjusting analog building blocks.
8.3 (7124) 11:40 – 12:05
An n79 Sub-1-dB Noise Figure Highly Linear Variable-Gain LNA Employing Adaptive Imbalanced Bleeding for 5G NR
Jinglong Xu1, Keun-Mok Kim1, Hafiz Usman Mahmood1, Jusung Kim2, Sang-Gug Lee1
1KAIST, Korea
2Hanbat National University, Korea

Abstract:
This work presents a 5G n79 sub-1-dB NF highly linear variable-gain LNA. Three key techniques are introduced: (i) imbalanced current bleeding for a wide gain range, (ii) drain-side DC current switching for low-power operation, and (iii) bleeding with an adaptive biasing scheme for linearity improvement. The proposed LNA shows a peak gain of 20.5 dB with a 0.74 dB minimum NF and a wide gain range of 13.4 dB, while reducing the power to 4.2 mW in the lowest gain mode. As a result, the proposed LNA achieves the best FoM1 among reported LNAs working at 4-6 GHz.
8.4 (7232) 12:05 – 12:30
A 24GHz CMOS UWB Radar IC with IQ Correlation Receiver for Short Range Human Detection
Dongwuk Park1,2, Byeongjae Seo1, Kiryun Byeon1, Gu Jung2, and Yunseong Eo1,2
1Kwangwoon University, Korea
2Silicon R&D, Corp., Korea

Abstract:
A fully integrated 24 GHz UWB radar IC is presented. An IQ correlation receiver is employed for detection fidelity and range extension. The transmitter is a VCO-based impulse generator. The carrier frequency and bandwidth of the UWB signal are tunable in the ranges of 22.9 - 25.5 GHz and 0.18 - 3 GHz, respectively. The equivalent sample resolution is 195 ps. The radar module using the IC provides a maximum detection range of up to 12.5 m for a moving human with 120.27 mW power consumption.
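For orientation, the 195 ps equivalent sample resolution quoted in this abstract can be related to a range step through the usual round-trip relation ΔR = c·Δt/2. The sketch below is a minimal illustration that assumes free-space propagation; the constant and function names are illustrative and do not come from the paper.

```python
# Round-trip range step for a given equivalent-time sampling interval.
# Assumption: free-space propagation; names are illustrative, not from the paper.
C_M_PER_S = 299_792_458.0  # speed of light in vacuum


def range_step_m(sample_interval_s: float) -> float:
    """Range increment per sample; the factor 2 accounts for the round trip."""
    return C_M_PER_S * sample_interval_s / 2.0


step = range_step_m(195e-12)
print(f"{step * 100:.2f} cm per equivalent sample")
```

Under this assumption each equivalent sample corresponds to roughly 2.9 cm of range.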

Session 9: Energy-Efficient Digital Circuit Techniques

Session Chair: Amit Agarwal, Intel
Session Co-chair: Chia-Hsiang Yang, National Taiwan University
Date: Nov. 08, 2022 (Tuesday)
Time: 10:50 – 12:30 (UTC+8)
Room: V110 十全軒, VF

ID Time Title / Authors / Affiliation
9.1
(7154)
(Highlight)
10:50
     |
11:15
DSC-TRCP: Dynamically Self-calibrating Tunable Replica Critical Paths Timing Monitoring for Variation Resilient Circuits with Low Cost & Large Power/Frequency Gain
Zhengguo Shen, Weiwei Shan*, Yuxuan Du, Ziyu Li, Chengjun Wu, Jun Yang
Southeast University, China

Abstract:
In-situ timing monitoring based adaptive voltage scaling (AVS) eliminates the excess timing margin for digital circuits but suffers from miss detection risk. Indirect monitoring methods face difficulties in the calibration of the replica circuit and its discrepancy with the actual circuit which limits its gain. We propose a dynamically self-calibrating tunable replica critical paths (DSC-TRCP) based timing monitoring method, which integrates the advantages of both in-situ and indirect monitoring methods while conquering their disadvantages. Implemented in a 28nm CMOS technology, it achieves up to 58% power gain or 232% frequency gain with only 0.65% area cost.
9.2
(7208)
11:15
     |
11:40
C3MLS: A 0.12-nW Leakage and 18.11-fJ/Transition Level Shifter With Cross-Coupled and Current Mirror Hybrid Structure for Ultra-Wide Range Level Conversions
Cong Huang and Hailong Jiao*
Peking University, China

Abstract:
In this paper, a CCLS (cross-coupled level shifter)/CMLS (current mirror level shifter) hybrid level shifter, C3MLS, is proposed for ultra-wide range level conversions from extremely low voltage deep in the subthreshold region to nominal supply voltage. By maintaining the merits of CCLS and CMLS and using each to offset the drawbacks of the other, the proposed C3MLS achieves limited-current-contention and nearly static-current-free conversions. Measurement results in 55-nm technology demonstrate that the proposed level shifter exhibits the lowest energy-delay product among the state of the art and an average static power consumption of 0.12 nW @ VDDL = 0.3 V.
9.3
(7117)
11:40
     |
12:05
A 0.0043-mm2 Capacitorless External-Clock-Free Fully-Synthesizable Digital LDO Using Load-Direct Droop Detector and Time-Based Load-State Decision
Jonghyun Oh1, Yoonho Song2, Young-Ha Hwang3, Jun-Eun Park4, Mingoo Seok1, and Deog-Kyoon Jeong2
1Columbia University, USA
2Seoul National University, Korea
3Soongsil University, Korea
4Chungnam National University, Korea

Abstract:
The proposed fully-synthesizable DLDO determines the load state using a single CMP, a single voltage reference, and a tunable delay line without an external clock, achieving a 99.6% current efficiency at a 0.6-V supply voltage. In addition, a 5-ns settling time from a 98-mV voltage droop is achieved using a coarse controller and a load-direct droop detector. The DLDO offers a 0.0043-mm2 chip area and 13.01-A/mm2 current density thanks to the fully-synthesized capacitorless design. The DLDO exhibits the best FoM2, a figure of merit that accounts for settling time, compared with prior art.
9.4
(7168)
12:05
     |
12:17
A 10-Gbps, 0.121-pJ/bit, All-Digital True Random-Number Generator using Middle Square Method
Jonghyun Kim and Hyungil Chae
Konkuk University, Korea

Abstract:
A robust and all-digital true random number generator (TRNG) with high throughput and good power efficiency is presented. A modified middle square method for post-processing converts a 1-bit comparator output to an 8-bit random stream to achieve 10Gbps throughput. The proposed TRNG achieves the highest throughput as well as the best power efficiency of 0.121pJ/bit among all NIST test-suite adaptable TRNGs.
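As background for the post-processing step, the sketch below shows the classic (unmodified) middle square method on an 8-bit state: square the state and keep the middle 8 bits of the 16-bit product. The abstract does not describe how the paper modifies the method or how the 1-bit comparator output is mixed in, so this is only the textbook baseline; the function name and seed are illustrative assumptions.

```python
def middle_square_step(state: int, bits: int = 8) -> int:
    """One step of the classic middle-square method on a `bits`-wide state."""
    squared = state * state                  # product is up to 2*bits wide
    shifted = squared >> (bits // 2)         # drop the lowest bits/2 bits
    return shifted & ((1 << bits) - 1)       # keep the middle `bits` bits


# Illustrative seed (an assumption, not a value from the paper).
state = 0xB5
sequence = []
for _ in range(4):
    state = middle_square_step(state)
    sequence.append(state)
print(sequence)
```

With this seed the unmodified recurrence collapses to zero within four steps, which illustrates why the textbook method alone is not a usable generator and needs a modification or a fresh entropy source.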
9.5
(7157)
12:17
     |
12:30
A Variation-Tolerant Differential Contention-Free Pulsed Latch with Wide Voltage Scalability
Gicheol Shin, Minhyeok Jeong, Donguk Seo, Shin Han, Yoonmyung Lee
Sungkyunkwan University, Korea

Abstract:
A differential contention-free pulsed latch (DCPL) is proposed, targeting wide voltage range scalability (1V to 0.4V). To operate in the near-threshold-voltage (NTV) region, a differential latch structure is combined with a dynamic XOR while staying static and contention-free, using a special header/bridge structure. Also, to decrease the number of transistors and the power consumption, the pulse generator is absorbed into the D-latch using blockages controlled by a delayed clock and a dual bridge structure. The proposed DCPL operates as reliably as a TGFF in the NTV region and shows a 50% improvement in sequencing time compared to the TGFF, while maintaining a hold time similar to that of prior-art pulsed latches.

Session 10: Analog Techniques

Session Chair: Tetsuya Hirose, Osaka University
Session Co-chair: Mustafijur Rahman, Indian Institute of Technology Delhi
Date: Nov. 08, 2022 (Tuesday)
Time: 14:00 – 15:40 (UTC+8)
Room: Auditorium 國際會議廳, 10F

ID Time Title / Authors / Affiliation
10.1
(7132)
(Highlight)
14:00
     |
14:25
A Process-Scalable Ultra-Low-Voltage 180kHz Sleep Timer with a Time-Domain Amplifier and a Switch-less Resistance Multiplier
Chongsoo Jung1, Hoyong Seong1, Injun Choi1, Sohmyung Ha2, and Minkyu Je1
1KAIST, Korea
2New York University Abu Dhabi, United Arab Emirates

Abstract:
This paper presents a process-scalable on-chip sleep timer. Our sleep timer overcomes the limitations of conventional on-chip sleep timers by combining an ultra-low-voltage (ULV) frequency-locked-loop (FLL) architecture, a time-domain amplifier (TDA), and a gate-leakage-leveraging technique. The proposed design, fabricated in 65nm CMOS, produces a 180kHz frequency and achieves a 2.73ppm/°C temperature dependency with calibration based on a lookup table (LUT) while consuming 61nW at a 0.4V supply.
10.2
(7068)
14:25
     |
14:50
A sub-0.5V Crystal Oscillator-Timer (XO-Timer) Combining 16MHz Reference and 32kHz Sleep Timer with a Single Crystal for Energy-Harvesting Radios in 28nm CMOS
Liwen Lin1, Ka-Meng Lei1, Pui-In Mak1, Rui P. Martins1,2
1University of Macau, China
2Universidade de Lisboa, Portugal

Abstract:
This paper reports an ultra-low-voltage (ULV) single-crystal oscillator-timer (XO-Timer) for sub-0.5 V BLE radios. Specifically, we propose a cascaded charge-pump (CP) as the micropower manager (μPM) to customize the voltage and current budgets for each XO-Timer sub-function. This μPM shows a higher power efficiency than the non-cascaded design and features a single voltage-regulation loop to uphold the performance of the XO-Timer against VT-variations. The XO-Timer's core amplifier introduces a ULV reconfigurable-gm topology to balance the power budget and performance under the high-performance mode (HPM) and low-power mode (LPM). Fabricated in 28-nm CMOS, the XO-Timer in HPM generates a 16-MHz clock with a power of 24.3 μW, and a phase noise of −133.8 dBc/Hz at 1-kHz offset. In the LPM, a 32.258-kHz clock is delivered while consuming 11.4 μW. The sleep-timer FoM2 is 14.8 μW and the Allan deviation is 35.1 ppb, achieving the lowest supply voltage (0.25 V) not only for a dual-mode XO-Timer but also for a MHz-range XO.
10.3
(7077)
14:50
     |
15:15
A 0.63-mm2/Ch 1.3-mΩ/√Hz-Sensitivity 1-MHz Bandwidth Active Electrode Electrical Impedance Tomography System
Ting Zhou, Hui Li, Jiajie Huang, Chao Wang, Qianyu Guo, Junyan Liu, Zhiwen Gu, Yang Zhao, Jian Zhao, Mingyi Chen, Yan Liu, Guoxing Wang, Yong Lian, Yongfu Li*
Shanghai Jiao Tong University, China

Abstract:
An AE-EIT 2D system is presented using direct IF down-conversion and a digitally switched SRDP I/Q demodulation technique, together with low-power circuit techniques, to improve the impedance resolution to 1.3mΩ/√Hz at 100kHz and reduce the variation of the readout circuit to 0.44mVpp (4.44×) while achieving the smallest area per channel of 0.63mm2 (1.38×-6.6×).
10.4
(7189)
15:15
     |
15:40
A 1.7-6.4 GHz fourth-order RF filter with 1-40% fractional bandwidth in 22-nm FDSOI
Iman Ghotbi, Baktash Behmanesh, and Markus Törmänen
Lund University, Sweden

Abstract:
This paper presents a fourth-order Q-enhanced RF filter featuring gm-boosting, noise-canceling, capacitive cross-coupling, and forward body-biasing techniques to realize 1.7 to 6.4 GHz operating range and up to 40% adjustable fractional bandwidth. The filter operates based on subtracting out-of-phase signals in the passband and in-phase signals in the stopband. Two Q-enhanced LC resonators are utilized for outphasing. Fabricated in 22 nm FDSOI, the chip achieves 4.6 dB NF, -14 dBm IB-IIP3, and 26 dBm IB-IIP2 at 4 GHz while drawing 22-45 mA from a 1 V supply. Fourth-order steep roll-off results in 17 dBm OOB-IIP3 at 2×BW frequency offset.

Session 11: Computing & Processing in Memory

Session Chair: Shyh-Shyuan Sheu, Industrial Technology Research Institute
Session Co-chair: Juang-Ying Chueh, Etron Technology Inc
Date: Nov. 08, 2022 (Tuesday)
Time: 14:00 – 15:40 (UTC+8)
Room: Song Bo 松柏廳, 10F

ID Time Title / Authors / Affiliation
11.1
(7061)
(Highlight)
14:00
     |
14:25
A 28nm Hybrid 2T1R RRAM Computing-in-Memory Macro for Energy-efficient AI Edge Inference
Wang Ye1,3, Chunmeng Dou1,3, Linfang Wang1,3, Zhidao Zhou1,3, Junjie An1, Weizeng Li1,3, Hanghang Gao1,3, Xiaoxin Xu1,3, Jinshan Yue1, Jianguo Yang1,3, Jing Liu1,3, Dashan Shang1,3, Jinghui Tian2, Qi Liu1,2, Ming Liu1,2
1Institute of Microelectronics of the Chinese Academy of Sciences, China
2Fudan University, China
3University of Chinese Academy of Sciences, China

Abstract:
This work presents the first 28nm hybrid 2T1R (H2T1R) RRAM computing-in-memory macro for AI edge inference. It features (1) the H2T1R cell array that can achieve >13X enhanced resistance-ratio, >80% reduced summation current, >67% smaller word-line voltage, and precise multi-bit weight encoding, and (2) reference-subtracting current sense amplifier (RS-CSA) that can reduce the number of the stand-by reference signals and extend the linear dynamic range of the current mirror. It performs highly accurate multi-bit analogue computation over 32 input channels with a peak energy efficiency up to 154.04 TOPS/W.
11.2
(7167)
(Highlight)
14:25
     |
14:50
A Local Transpose 9T SRAM Compute-In-Memory Macro with Programmable Single-Slope SAR ADC
Xin Zhang*1, Yongjun Jo*1, Jiahao Liu2, Jun Zhou2, Yuanjin Zheng1, and Tony Tae-Hyoung Kim1 (*Equally contributed authors)
1Nanyang Technological University, Singapore
2University of Electronic Science and Technology of China, China

Abstract:
This work proposes a two-directional transpose SRAM compute-in-memory (CIM) macro for inference and training in convolutional neural networks (CNN). A novel 9T SRAM bit-cell is proposed for local two-way computing without additional shared transpose processing units. The proposed transposable CIM achieves higher processing throughput because every bit-cell can operate at the same time in one CIM computing cycle. This work also proposes a programmable single-slope (SS) successive approximation (SAR) ADC for energy efficiency improvement by utilizing the probability density function of MAC values. The proposed ADC also supports the ReLU-based zero-skip function through the SS operation. The test chip was fabricated in 180nm CMOS technology and achieved an energy efficiency of 6.61TOPS/W with the ADC zero-skip and SS operations.
11.3
(7203)
14:50
     |
15:15
Spike-CIM: A 290TOPS/W Spike-Encoding Sparsity-Adaptive Computing-in-Memory Macro with Differential Charge-Domain Integrate-and-Fire
Jiahao Song1, Xiyuan Tang1, Haoyang Luo1, Kuan Xu2, Yuan Wang1, Zhigang Ji2, Runsheng Wang1, and Ru Huang1
1Peking University, China
2Shanghai Jiao Tong University, China

Abstract:
This paper proposes a spike-encoding sparsity-adaptive computing-in-memory (CIM) macro (Spike-CIM) that offers excellent energy efficiency and robustness. A differential integrate-and-fire architecture, implemented by charge-domain cells, is proposed to achieve sparsity-adaptive power saving. The fabricated 65nm 32Kb Spike-CIM realizes a normalized energy efficiency of 1218 TOPS/W/Bit.
11.4
(7245)
15:15
     |
15:27
A Hybrid Temperature Compensation method combined with Digital and Analog Temperature Compensation Techniques for 3D-NAND Flash Memories
Dojeon Lee, Junhong Park, Philkyu Kang, Sungmin Jo, Seheon Baek, Chi-Weon Yoon, Dongku Kang
Samsung Electronics, Korea

Abstract:
Voltage compensation methods that track temperature changes can typically be divided into digital and analog methods. This paper proposes a hybrid temperature compensation method that combines the advantages of the digital and analog methods to secure temperature linearity and reduce the time overhead for temperature sensing.
11.5
(7160)
15:27
     |
15:40
A Variation-Tolerant Processing-In-Memory Architecture Using Discharging Current Calibration
Daiki Kitagata, Shinji Tanaka, Naoya Fujita and Naoaki Irie
Renesas Electronics Corporation, Japan

Abstract:
This paper presents a variation-tolerant ternary neural arithmetic memory (VT-TNAM) for energy-efficient processing-in-memory (PIM) accelerators. The VT-TNAM macro installs the newly proposed discharging current calibration (DCC) architecture using adjustable-current ternary bit cells (ACTBCs) to effectively mitigate local process variation. Furthermore, hierarchical MAC-operation skipping (HMS) architecture using the proposed small current detector (SCD) is also developed to compensate for energy efficiency degradation caused by MAC accuracy improvement. Successful reduction of process variation is verified using a fabricated test-element-group (TEG) in 22nm process and 20.0 – 59.2 TOPS/W is achieved by introducing the HMS architecture.

Session 12: Advanced Wireline Transceiver Techniques

Session Chair: Wei-Zen Chen, National Yang Ming Chiao Tung University
Session Co-chair: Jung-Hoon Chun, Sungkyunkwan University
Date: Nov. 08, 2022 (Tuesday)
Time: 14:00 – 15:40 (UTC+8)
Room: Chang Chin 長青廳, 10F

ID Time Title / Authors / Affiliation
12.1
(7051)
(Highlight)
14:00
     |
14:25
A 103 fJ/b/dB, 10-26 Gbps Receiver with a Dual Feedback Nested Loop CDR for Wide Bandwidth Jitter Tolerance Enhancement
Yao-Chia Liu1, Wei-Zen Chen1, Yuan-Sheng Lee2, Yu-Hsiang Chen2, Shawn Min2, Ying-Hsi Lin2
1National Yang Ming Chiao Tung University, Taiwan
2Realtek Semiconductor Corp., Taiwan

Abstract:
A nested-CDR-based receiver with a PI controller is presented. The direct modulation path bypasses the loop-latency-limited PI path and modulates the VCO for faster response and enhanced stability. The measured jitter tolerance curve shows a 0.15UI enhancement at 60MHz. With the DFE simplified by an edge-based algorithm, the receiver is able to tolerate 32dB channel loss. Compared with CDR-only prior art, this work achieves a twofold improvement in power efficiency over the traditional PI architecture and a fourfold improvement over the DCO architecture.
12.2
(7026)
(Highlight)
14:25
     |
14:50
A 42Gb/s PAM-8 Transmitter with Feed-Forward Tomlinson-Harashima Precoding in 28nm CMOS
Byungjun Kang, Woosong Jung, Hyojun Kim, Sanghee Lee, and Deog-Kyoon Jeong
Seoul National University, Korea

Abstract:
A 42Gb/s PAM-8 transmitter (TX) with feed-forward Tomlinson-Harashima precoding (FF-THP) is presented. The FF-THP architecture produces a uniform output distribution with higher average signal power compared with the FFE. The fabricated chip compensates for the 7.7dB channel loss with the PAM-8 signaling. As a result, it achieves the power efficiency of 1.58pJ/b, occupying 0.0703mm2.
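To make the precoding idea concrete, the sketch below implements textbook feedback Tomlinson-Harashima precoding for PAM-8 with a modulo-16 fold. This is not the feed-forward (FF-THP) architecture of the paper, whose details the abstract does not give; the post-cursor taps, symbol values, and function names are illustrative assumptions.

```python
def wrap(v: float, m: int) -> float:
    """Fold v into [-m, m) using a modulo-2m operation."""
    return (v + m) % (2 * m) - m


def thp_precode(symbols, post_cursors, m=8):
    """Textbook feedback THP: subtract predicted ISI, then fold into [-m, m)."""
    out = []
    for a in symbols:
        isi = sum(h * out[-k]
                  for k, h in enumerate(post_cursors, start=1) if k <= len(out))
        out.append(wrap(a - isi, m))
    return out


def channel(tx, post_cursors):
    """Noise-free channel with a unit main cursor plus the given post-cursors."""
    rx = []
    for n, x in enumerate(tx):
        isi = sum(h * tx[n - k]
                  for k, h in enumerate(post_cursors, start=1) if n - k >= 0)
        rx.append(x + isi)
    return rx


# PAM-8 levels are the odd integers -7..7; the taps are hypothetical.
symbols = [7, -5, 3, -7, 1, 5, -1, -3]
taps = [0.5, 0.25]
tx = thp_precode(symbols, taps)
recovered = [round(wrap(y, 8)) for y in channel(tx, taps)]
print(recovered == symbols)
```

The transmitted values stay inside [-8, 8) regardless of the channel taps, and the receiver recovers the data with the same modulo fold, which is the property that distinguishes THP from a conventional FFE.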
12.3
(7228)
14:50
     |
15:15
A 11.4-Gbps/lane MIPI 32-bit C-PHY and D-PHY combo transmitter with 3-tap FFE
Junhan Bae1, Myeongkyu Song1, Bongkyu Kim1, Junkyu Lee1, Woosung Park1,2, and Jung-Hoon Chun1,3
1Sungkyunkwan University, Korea
2Samsung Electronics, Korea
3SolidVue, Korea

Abstract:
This paper describes a MIPI C/D-PHY combo transmitter (TX) fabricated in a 110nm CMOS image sensor (CIS) process. The same hardware can be shared to support both C-PHY and D-PHY with little extra circuitry. The adopted 32-bit architecture, which enables double data rate (DDR) operation in C/D-PHY, can maximize the data rate, allowing it to exceed the limits of legacy sub-micron process technologies. In addition, the proposed TX utilizes 3-tap feed-forward equalization (FFE) in both the C-PHY and D-PHY modes, effectively eliminating the inter-symbol interference (ISI) induced by band-limited channels. Measured results show that the compliance test in C-PHY mode is comfortably passed at data rates up to 11.4 Gbps (5 Gsps) per lane. The eye diagrams in D-PHY mode are fully open at data rates up to 6 Gbps per lane.
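The relation between the 5 Gsps symbol rate and the 11.4 Gbps bit rate quoted in this abstract follows from the C-PHY mapping of 16 data bits onto 7 symbols. The short check below assumes that standard 16-to-7 mapping; the names are illustrative.

```python
from fractions import Fraction

# MIPI C-PHY maps 16 data bits onto 7 symbols (about 2.29 bits per symbol).
BITS_PER_SYMBOL = Fraction(16, 7)


def cphy_bit_rate_gbps(symbol_rate_gsps) -> float:
    """Per-lane bit rate in Gbps for a given symbol rate in Gsps."""
    return float(BITS_PER_SYMBOL * Fraction(symbol_rate_gsps))


rate = cphy_bit_rate_gbps(5)
print(f"{rate:.2f} Gbps per lane")
```

This gives about 11.43 Gbps per lane, consistent with the 11.4 Gbps figure in the abstract.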
12.4
(7226)
15:15
     |
15:40
A 5.0-to-12.5-Gb/s, 1.7-pJ/b, 0.66-µs Lock-time Referenceless Sub-sampling CDR with Beat Detection FLL in 28nm CMOS
Woosung Park1,2, Jahoon Jin2, Minsu Park1, Sangdon Jung1,2, and Jung-Hoon Chun1,3
1Sungkyunkwan University, South Korea
2Samsung Electronics, South Korea
3SolidVue, South Korea

Abstract:
The capture range of the SSPD is wider than that of the PD, relieving the burden of reducing the residual frequency. In practice, the SSPD-based CDR (SSCDR) in [1] corrects frequency errors without an FLL, saving significant power. The SSCDR also achieves short lock-time with a wide bandwidth; therefore, it is suitable for the burst-mode operation which requires a sub-ns relocking time. To take advantage of these desirable characteristics of the SSCDR, this work benchmarks [1] and extends the frequency coverage by employing the beat detection FLL. The proposed FLL shows faster-locking behavior than prior arts through a beat correction process using the down-conversion function of the SSPD. As a result, the proposed FLL relieves a trade-off between lock time and frequency coverage. We also propose a bandwidth-control technique and an energy-efficient dual-mode SSPD.

Session 13: Communication and Powering Techniques for Biomedical Applications

Session Chair: Youngcheol Chae, Yonsei University
Session Co-chair: Inhee Lee, University of Pittsburgh
Date: Nov. 08, 2022 (Tuesday)
Time: 14:00 – 15:40 (UTC+8)
Room: V110 十全軒, VF

ID Time Title / Authors / Affiliation
13.1
(7025)
(Highlight)
14:00
     |
14:25
A 20-MHz 2.3-mW Receiver and a 25-V Transmitter for Ultrasound Capsule Endoscopy
Kyeongwon Jeong1, Jaesuk Choi1, Gichan Yun1, Injun Choi1, Jeehoon Son2, Jae Youn Hwang2, Sohmyung Ha3, and Minkyu Je1
1KAIST, Korea
2DGIST, Korea
3New York University Abu Dhabi, United Arab Emirates

Abstract:
We propose, for the first time, an ultrasound capsule endoscopy (USCE) ASIC. An on-chip transmitter (TX) is designed to generate a high-voltage pulse applied to the transducer. In addition, a highly power-efficient ultrasound (US) receiver (RX) IC for USCE is presented. We propose an RX structure with synchronized analog envelope detection (ED) to reduce the required ADC speed. A ping-pong noise-shaping SAR (NS-SAR) ADC with a passive gain is employed for high power efficiency and resolution.
13.2
(7159)
14:25
     |
14:50
An Intra-Body-Power-Transfer System with a PLL-based Continuous Maximum Resonant Power Tracking Loop at TX and 1.8V DC Output Voltage at RX
Hyungjoo Cho1, Ji-Hoon Suh1, Gichan Yun1, Sohmyung Ha2, and Minkyu Je1
1KAIST, Korea
2New York University Abu Dhabi, United Arab Emirates

Abstract:
We present an intra-body-power-transfer (IBPT) system that delivers power greater than 100μW even across 150cm on-body distance. The proposed IBPT TX employs a PLL-based maximum-resonant-power-tracking (MRPT) loop running in the background to maximize the power delivered to the load (PDL) without any need for RX-to-TX back telemetry or tuning phase, enabling continuous power delivery. The PDL and power transfer efficiency (PTE) are further improved by inducing parallel resonance at RX. Fabricated in a 180nm BCD process, the IBPT system achieves 136μW PDL at 1.8V DC output with 8.83% end-to-end power efficiency.
13.3
(7163)
14:50
     |
15:15
A 2m-Range 711uW Body Channel Communication Transceiver Featuring Dynamically-Sampling Bias-Free Interface Front End
Guanjie Gu1, Changgui Yang1, Zhuhao Li1, Xiangdong Feng1, Ziyi Chang1, Ting-Hsun Wang1, Yunshan Zhang1, Yuxuan Luo1, Hong Zhang1, Ping Wang1, Sijun Du2, Yong Chen3, and Bo Zhao1*
1Zhejiang University, China
2Delft University of Technology, Netherlands
3University of Macau, China
* Corresponding Author: Bo Zhao

Abstract:
State-of-the-art BCC transceivers have realized low power consumption, but their communication range is still limited to less than 1m. One of the issues limiting the communication range of BCC is the loss at the interface between the human body and the transceiver. The DC bias in previous closed-loop and gate-input techniques reduced the input impedance and voltage gain of the interface front end (IFE), leading to a high interface loss. In this work, we propose a dynamically-sampling bias-free IFE to realize a 90kΩ input impedance and 94dB RF-IF conversion gain, resulting in a receiving sensitivity of -104dBm. As a result, the communication range is extended to 2m with 711uW total power consumption.
13.4
(7169)
15:15
     |
15:40
A Low-power Sleep Apnea Monitoring IC with a Duty-Recovered Body Channel Communication Receiver
Pangi Park, Donghyeok Cho, SeongHwan Cho
KAIST, Korea

Abstract:
This paper presents an in-home level-4 sleep apnea monitoring IC that can measure three basic parameters: airflow, HR, and SpO2. A duty-recovered BCC receiver is proposed to allow both the transmitter and receiver sides to be duty-cycled, and the power efficiency of the readouts is improved by regulating the voltage of the interface node between the sensing units and readouts. With the proposed techniques, the receiver power is reduced by 98.8%, and the overall system power is 93.8% smaller than in the previous work.

Session 14: LDO Voltage Regulators

Session Chair: Hyun-Sik Kim, KAIST
Session Co-chair: Hyungil Chae, Konkuk University
Date: Nov. 09, 2022 (Wednesday)
Time: 09:00 – 10:40 (UTC+8)
Room: Auditorium 國際會議廳, 10F

ID Time Title / Authors / Affiliation
14.1
(7085)
09:00
     |
09:25
A Digital LDO in 22nm CMOS with a 4b Self-triggered Binary Search Windowed Flash ADC Featuring Automatic Analog Layout Generator Framework
Xiaosen Liu1,2, Soner Yaldiz2, Parijat Mukherjee2, Steven Burns2, Harish Krishnamurthy2, Krishnan Ravichandran2, Zakir Ahmed2, Nachiket Desai2, Nicolas Butzen2, James Tschanz2, Vivek De2
1Tsinghua University, School of Integrated Circuits, China
2Intel Corporation, U.S.A

Abstract:
An analog-layout-generator-based DLDO with a self-triggered binary search windowed flash ADC is proposed in 22nm CMOS to maximize the productivity of implementing analog circuit blocks in scaled CMOS processes, reducing the physical design time and effort by up to 60× compared with the conventional manual approach. A self-triggered binary search mechanism with a delay-based architecture is proposed to reduce the exponentially growing kickback noise and energy consumption of a traditional flash ADC down to the level of a SAR ADC while maintaining its high-speed feature. The DLDO features a 3.55ps FoM and fully automatic generation.
14.2
(7086)
09:25
     |
09:50
A Fast-Transient and Wide-Range Output Capacitor-Less NMOS LDO Regulator with Adaptive-Gain Nested Miller Compensation and Pre-Emphasis Inverse Biasing
Hyunjun Park, Woojoong Jung, Minsu Kim, and Hyung-Min Lee
Korea University, Korea

Abstract:
The proposed capless LDO ensures stability over a wide load range and achieves higher bandwidth for fast transient response at larger ILOAD by adopting adaptive-gain nested Miller compensation. Pre-emphasis inverse biasing also improves the slew rate at the gate of the NMOS pass transistor by sourcing adaptive bias current into a super source follower. The 180nm CMOS LDO achieves a high unity-gain bandwidth of 17.5MHz while providing a wide ILOAD range from 0.1mA to 300mA with phase margin above 60°. The LDO ensures small undershoot (48mV) and overshoot (59mV), achieving the best FoM of 1.72ps.
14.3
(7144)
09:50
     |
10:15
A Capacitor-less Digital LDO using Ripple-Frequency-Adaptive Time-domain Digital Pre-distortion Technique
Angxiao Yan1, Wei Deng1,2, Haikun Jia1, Shiwei Zhang1, Rui Wu3, Zhihua Wang1,2, and Baoyong Chi1
1Tsinghua University, China
2Research Institute of Tsinghua University in Shenzhen, China
3National Key Lab of Microwave Imaging Technology, AIR, CAS, China

Abstract:
A Digital low-dropout regulator (D-LDO) with time-domain digital pre-distortion (DPD) scheme is introduced in this paper. It features adaptive suppression of supply voltage ripple without introducing analog-assisting loop or large capacitor. The proposed all-digital ripple cancellation technique is effective against arbitrary ripple waveforms and any ripple frequency from kHz to a quarter of the clock frequency. The measurement results indicate a -24.5 dB rejection ratio and an improvement of 9.5 dB over the conventional D-LDO. This work demonstrates the possibility and feasibility of digital-domain ripple cancellation for the first time.
14.4
(7072)
10:15
     |
10:40
A Self-Clocked TDC-Based Unified Clock and Voltage Regulator with Replica Frequency-Locked Loop and Hysteresis Switching in 65nm CMOS
Xuliang Wang, Wing-Hung Ki, and Philip K. T. Mok
The Hong Kong University of Science and Technology, China

Abstract:
A self-clocked digital low-dropout regulator (DLDO) employing a tunable replica oscillator (TRO) and a beat-frequency (BF) quantizer is proposed to supply and clock the microprocessors. The standard D-flip-flop is utilized as both the time-to-digital converter (TDC) and the sampling clock or BF clock generator. Fast transient response and static low power consumption are achieved simultaneously by the adaptive sampling capability of the BF quantizer. With the help of the proposed hysteresis switching logic (HSL) and replica frequency-locked loop (FLL), the built-in offset of the BF quantizer is eliminated. The TRO powered by the output of DLDO mimics half of the critical path delay of microprocessors and guarantees error-free operation even during voltage undershoot caused by load transients. In the load transient test of 50mA/μs with a 100-pF load capacitor, the proposed HSL improves the voltage undershoot and the steady-state offset by 25% and 84%, respectively. Fabricated in 65-nm LP process, the tested prototype holds an active area of 0.045mm^2 and achieves 0.76-ps FOM.

Session 15: Energy-Efficient Machine Learning Processors and High-Speed Interface

Session Chair: Yu-Guang Chen, National Central University
Session Co-chair: Chao Wang, Huazhong University of Science and Technology
Date: Nov. 09, 2022 (Wednesday)
Time: 09:00 – 10:40 (UTC+8)
Room: Song Bo 松柏廳, 10F

ID Time Title / Authors / Affiliation
15.1
(7145)
(Highlight)
09:00
     |
09:25
A 2.47 μJ/sample QR-Decomposition-based Extreme Learning Machine Engine Supporting Online Class Incremental Learning for ECG-based User Identification
Yi-Ta Chen, Li-Sheng Chang, Yu-Chuan Chuang, An-Yeu Wu
National Taiwan University, Taiwan

Abstract:
To support online class incremental learning (O-CIL) in ECG-based user identification, this work presents a QR-decomposition-based extreme learning machine (QRD-ELM) engine. A diagonally-mapped linear array (DMLA) enables online learning while reducing the area by 98.5%. The integrated PE design with a unified COordinate Rotation DIgital Computer (u-CORDIC) further reduces the area by 15.3% and the power consumption by 22.4%. A model-algorithm-circuit co-design module supports class incremental learning with low energy and area overhead. The QRD-ELM engine, fabricated in 40nm CMOS technology with a 1.33×1.33 mm2 die area, achieves 2.47 μJ/sample learning energy efficiency, a 28.5× improvement over the state of the art.
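As background on the CORDIC primitive named in this abstract, the sketch below shows textbook vectoring-mode CORDIC, which rotates a vector onto the x-axis with shift-and-add style steps and is commonly used to compute the Givens rotations in a QR decomposition. The abstract does not describe the internals of the u-CORDIC, so the iteration count, floating-point arithmetic, and names here are illustrative assumptions rather than the paper's design.

```python
import math

N_ITER = 16  # illustrative iteration count
ANGLES = [math.atan(2.0 ** -i) for i in range(N_ITER)]
# Accumulated CORDIC gain that the micro-rotations apply to the vector length.
GAIN = math.prod(math.sqrt(1.0 + 2.0 ** (-2 * i)) for i in range(N_ITER))


def cordic_vectoring(x: float, y: float):
    """Rotate (x, y) onto the x-axis; requires x > 0.

    Returns (magnitude, angle). The factor 2.0 ** -i stands in for the
    right shift by i bits that a fixed-point hardware datapath would use.
    """
    angle = 0.0
    for i in range(N_ITER):
        d = -1.0 if y > 0 else 1.0           # rotate toward y = 0
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        angle -= d * ANGLES[i]
    return x / GAIN, angle


magnitude, angle = cordic_vectoring(3.0, 4.0)
print(round(magnitude, 3), round(angle, 3))
```

In a Givens-rotation QR flow, the angle found in vectoring mode is then applied to the remaining elements of the two rows with the same micro-rotations.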
15.2
(7215)
(Highlight)
09:25
     |
09:50
A 1.3mW Speech-to-Text Accelerator with Bidirectional Light Gated Recurrent Units for Edge AI
Yu-Hsuan Tsai*1, Yi-Cheng Lin*1, Wen-Ching Chen2, Liang-Yi Lin2, Nian-Shyang Chang2, Chun-Pin Lin2, Shi-Hao Chen3, Chi-Shi Chen2, and Chia-Hsiang Yang1
1National Taiwan University, Taiwan
2Taiwan Semiconductor Research Institute, Taiwan
3Digwise Technology Ltd., Taiwan
*Equally-Credited Authors (ECAs)

Abstract:
This work presents an energy-efficient speech-to-text accelerator. The bidirectional light gated recurrent unit (BLiGRU)-based neural network is adopted to achieve a high accuracy. Network compression is utilized to reduce the network size and associated computational complexity by 29.8× and 73.2×, respectively. Efficient sequence decoding without backtracking is implemented to reduce the latency and memory usage. The chip performs speech-to-text conversion in 9.77 ms/frame with 1.3 mW at 1.25 MHz. Compared to the state-of-the-art designs, the chip achieves a 6.5-to-177× lower normalized energy with the lowest 15.2% phone error rate (PER) on the TIMIT dataset.
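The figures quoted in this abstract can be combined into a per-frame energy and cycle budget. The arithmetic below assumes that the 1.3 mW figure is the average power over the 9.77 ms frame time, which the abstract does not state explicitly; the variable names are illustrative.

```python
# Values quoted in the abstract.
POWER_W = 1.3e-3      # 1.3 mW
LATENCY_S = 9.77e-3   # 9.77 ms per frame
CLOCK_HZ = 1.25e6     # 1.25 MHz

# Assumption: power is the average over the whole frame time.
energy_per_frame_uj = POWER_W * LATENCY_S * 1e6
cycles_per_frame = CLOCK_HZ * LATENCY_S

print(f"{energy_per_frame_uj:.1f} uJ per frame, {cycles_per_frame:.0f} cycles per frame")
```

Under that assumption the chip spends roughly 12.7 μJ and about 12,200 clock cycles per frame.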
15.3
(7166)
09:50
     |
10:15
A 6 Gbps PAM-3 Transceiver with Time-Varying Offset Compensation
Ju Eon Kim1,2, Dong-Hyun Yoon2, Junyoung Song3, Kwang-Hyun Baek4, Jung-Hwan Choi1, and Tony Tae-Hyoung Kim2
1Samsung Electronics, Korea
2Nanyang Technological University, Singapore
3Incheon National University, Korea
4Chung-Ang University, Korea

Abstract:
CMOS technology scaling improves performance by reducing supply voltage, parasitic capacitance, and physical area. However, device reliability issues such as component mismatches and aging effects become prominent in aggressively scaled technologies. In particular, PAM signal levels are highly susceptible to PVT variations and device mismatches. This paper proposes an offset compensation technique for a PAM-3 transceiver. The proposed compensation algorithm continuously detects faulty patterns and generates the optimal reference voltage for the single-to-differential amplifier to cancel out the time-varying offset. This work presents a 6Gbps PAM-3 transceiver in 65nm CMOS. The proposed technique improves the eye-opening by 38%.
15.4
(7147)
10:15
     |
10:40
A 12.8-Gbps 0.5-pJ/b Encoding-less Inductive Coupling Interface Using Clocked Hysteresis Comparator for 3D-stacked SRAM in 7-nm FinFET
Kota Shiba1, Mitsuji Okada2, Atsutake Kosuge2, Mototsugu Hamada2, and Tadahiro Kuroda2
1The University of Tokyo, Japan
2Research Association for Advanced Systems, Japan

Abstract:
A 0.5-pJ/b 12.8-Gbps/link inductive coupling inter-chip wireless communication interface for a 3D-stacked SRAM has been developed in a 7-nm FinFET process. A new clocked hysteresis comparator that eliminates encoding for synchronous communication achieves 1.49 times higher data rate and 36% lower energy consumption compared to conventional synchronous communication using Manchester encoding. Inter-chip communication at 0.5-pJ/b 12.8-Gbps/link was confirmed using test chips. The proposed interface for a 4-hi 3D-stacked SRAM module achieves a 1.7-TB/s/mm2 IO area efficiency, representing a two-orders-of-magnitude improvement over a state-of-the-art interface for a 3D-stacked SRAM with competitive energy efficiency.

Session 16: Advanced Signal Generation and Radar Techniques

Session Chair: Kenichi Okada, Tokyo Institute of Technology
Session Co-chair: Howard Luong, Hong Kong University of Science and Technology
Date: Nov. 09, 2022 (Wednesday)
Time: 09:00 – 10:40 (UTC+8)
Room: Chang Chin 長青廳, 10F

ID Time Title / Authors / Affiliation
16.1
(7111)
09:00
     |
09:25
A Compact Square-Geometry Quad-Core 19 GHz Class-F VCO with Parallel Inductor-sharing Technique achieving -137.2 dBc/Hz Phase Noise at 10MHz Offset
Yaqian Sun1, Wei Deng1,2, Haikun Jia1, Zhihua Wang1,2, and Baoyong Chi1
1Tsinghua University, China
2Research Institute of Tsinghua University in Shenzhen, China

Abstract:
A square-geometry quad-core oscillator with an inductor-sharing technique is proposed in this paper. It exhibits a compact area of 0.09 mm2, which is the smallest among quad-core VCOs operating at a similar oscillation frequency. The unwanted mode is suppressed by the metal trace that connects the drain nodes of adjacent cores. The proposed VCO is fabricated in 65nm CMOS technology. The measured phase noise is -137.2 dBc/Hz at 10 MHz offset frequency from a carrier of 19 GHz, which translates to an FoM of 186.1 dBc/Hz.
16.2
(7064)
09:25
     |
09:50
A 17-21GHz Current-Folding Frequency Tripler With >36dBc Harmonic Rejection in 90nm CMOS
Chun-Hung Lin and Ching-Yuan Yang
National Chung Hsing University, Taiwan

Abstract:
A frequency tripler (FT) using a current-folding technique to achieve inherently nonlinear operation is presented. A built-in VCO generates the fundamental signal, and the proposed current-folding stage converts the fundamental input into the triple-frequency output, which is injected into a bandpass stage for harmonic suppression. Fabricated in 90-nm CMOS technology, the measured FT features 36 to 43-dBc harmonic rejection from 17.5 to 21 GHz (18.2% FTR), while consuming only 3.5 mW from a 1.2-V supply. The measured phase noise (PN) of the VCO and the FT are -112.5 and -102.8 dBc/Hz at 1-MHz offset, respectively. Furthermore, the achieved figures-of-merit (FoMs) of the proposed FT are -180.52 and -190.87 dB at 1-MHz and 10-MHz offset, respectively.
16.3
(7191)
09:50
     |
10:15
An 18.8-to-20.3-GHz Wide-Ramping-Range Cascaded-PLL-Based FMCW Generator with 44.1-kHz RMS Frequency Error and -105.6-dBc/Hz Phase Noise in 40-nm CMOS
Xiaofei Liao1,2, Feifan Hong1,2, Sijie Pan2, Xiaohu You1,2, and Dixian Zhao1,2
1Southeast University, China
2Purple Mountain Laboratories, China

Abstract:
A cascaded phase-locked loop (PLL) with wideband low-noise frequency modulation for frequency-modulated continuous-wave (FMCW) radar applications is presented. It utilizes a wideband millimeter-wave VCO with flat gain sensitivity to ensure wide chirp bandwidth and frequency modulation linearity. An in-depth analysis of the loop bandwidth optimization in cascaded PLL for the FMCW synthesizer is detailed. Fabricated in 40-nm CMOS, the proposed cascaded PLL can produce 1.5-GHz triangular and sawtooth chirp from 18.8 to 20.3 GHz, achieving a minimum root-mean-square (rms) frequency error of 44.1 kHz. The measured PN at 1-MHz offset from 19.2 GHz is -105.6 dBc/Hz.
16.4
(7089)
10:15
     |
10:40
A 140GHz 4TX-4RX Phased-Array FMCW-FSK Antenna-Packaged Radar Chipset With 25dBm EIRP and 16GHz BW
Shunli Ma1, Tianxiang Wu1, Zhuofan Xu1, Zhonghao Sun1, Xuefeng Li1, Lei Wu1, Biao Hu1, Junyan Ren1, Yong Chen2, and Jiebin Pan3
1Fudan University, China
2University of Macau, China
3East China Institute of Photo-Electron IC, China

Abstract:
Frequency modulated continuous wave (FMCW) radar sensors are widely utilized for security checks, car-collision avoidance, vital-sign monitoring, and detection of tiny movements [1]-[5]. The 4D mm-wave radar needs a large number of phased-array elements to realize accurate detection. Range resolution is determined by the bandwidth (BW) of the transceiver (TRX). Moreover, it is better to design sensing and communication functions into the system simultaneously. This paper presents a 140GHz phased-array FMCW chipset in 65nm bulk CMOS supporting a 16GHz BW with a custom horn antenna package. Based on the tile structures of the TRX, our system can be scaled up to a large-size array for 4D phased-array radar.
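As a rough illustration of how bandwidth sets range resolution, the sketch below applies the textbook FMCW relation (resolution = c / 2B) to the 16GHz bandwidth quoted above. The function name is hypothetical, and the formula is the ideal-case relation, not a figure reported by the authors.

```python
# Ideal FMCW range resolution: delta_R = c / (2 * B).
# This is the textbook relation, assumed here for illustration only.
C = 299_792_458.0  # speed of light in m/s


def range_resolution_m(bandwidth_hz: float) -> float:
    """Return the ideal range resolution in metres for a chirp bandwidth in Hz."""
    return C / (2.0 * bandwidth_hz)


# 16 GHz is the bandwidth quoted in the abstract.
res_16ghz = range_resolution_m(16e9)
print(f"{res_16ghz * 1e3:.2f} mm")  # about 9.37 mm
```

Under this idealized relation, doubling the chirp bandwidth halves the smallest separable range difference.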

Session 17: Emerging Circuit Techniques for Power Management, Sensing and Computing

Session Chair: Takuji Miki, Kobe University
Session Co-chair: Chihiro Okada, Sony Semiconductor Solutions Corporation
Date: Nov. 09, 2022 (Wednesday)
Time: 09:00 – 10:40 (UTC+8)
Room: V110 十全軒, VF

ID Time Title / Authors / Affiliation
17.1
(7236)
(Highlight)
09:00
     |
09:25
A 14V Hybrid Boost Converter With Scalable Conversion Ratio in 180nm Standard CMOS for an Ultrasound Imaging System
Jiaqi Guo1, Jiamin Li2, Jerald Yoo1,3
1National University of Singapore, Singapore
2Southern University of Science and Technology, China
3The N.1 Institute for Health, Singapore

Abstract:
To provide the high voltage supply (>10V) and intermediate voltage domains required by the transducer driving circuits for ultrasound imaging, and to achieve that in a standard CMOS process for easy processor and IP integration, this work presents a 14V multiple-output boost converter with a hybrid structure and PWM mode operation. The chip, implemented in a 180nm standard CMOS process, regulates 3.5V, 7V, 10.5V and 14V from a 1.5V input, while keeping the switch stress (VGS, VDS) of all transistors below 3.5V at any switching state. It achieves a simulated efficiency of 78%, more than doubling the 35% achieved in earlier works.
17.2
(7091)
09:25
     |
09:50
A 0.24 mmHg (1σ) Resolution Half-Bridge-to-Digital Converter with RC Delay-Based Pressure Sensing and Energy-Efficient Bit-Level Oversampling Techniques for Implantable Miniature Systems
Donguk Seo1, Minsik Cho1, Minhyeok Jeong1, Gicheol Shin1, Inhee Lee2, and Yoonmyung Lee1
1Sungkyunkwan University, Korea
2University of Pittsburgh, USA

Abstract:
A pressure sensor with a half-Wheatstone-bridge-to-digital converter is proposed for implantable miniature systems. The half-Wheatstone-bridge (HB) sensor uses an RC delay comparison, which self-limits current for energy-efficient operation. To overcome the limited sensitivity of the HB, bit-level oversampling is introduced and 0.24 mmHg (1σ) resolution with an 8.58 nJ∙mmHg2 FOM is achieved, which is significantly better than that of the prior-art HB-based pressure sensor and comparable to that of Wheatstone-bridge-based pressure sensors.
17.3
(7040)
09:50
     |
10:15
A 0.0308mm2 4.15pJ/conv VCO-Based Current Sensing Front-End with 2nd-Order Δ2-ΔΣ Modulation
Jee-Ho Park, Ji-Hyoung Cha, Yongjae Park, and Seong-Jin Kim
Ulsan National Institute of Science and Technology, Korea

Abstract:
This paper presents a 2nd-order Δ2-ΔΣ modulator based on a VCOQ with a PWM I-DAC for the precise acquisition of incoming current in an area- and energy-efficient form factor. The proposed Δ2-modulation substantially attenuates the magnitude of input signals, enhancing the linearity and DR. Moreover, an additional differentiator followed by the VCOQ features the negative feedback loop in the 2nd-order ΔΣ modulator, increasing noise shaping order with no DAC noise. In addition, the PWM I-DAC substituting the multi-bit I-DAC is devised to mitigate noise further, realizing the high resolution of 1 pA with 500-Hz bandwidth. The prototype chip fabricated in a 110-nm CMOS occupies 0.0308mm2 and achieves the Walden FoM of 4.15 pJ/conv.
17.4
(7193)
10:15
     |
10:40
A 57.2GHz 11.2mW 8-bit General Purpose Superconductor Microprocessor with Dual-Clocking Scheme
Ikki Nagaoka1, Ryota Kashima1, Tomoki Nakano1, Masamitsu Tanaka1, Taro Yamashita2, Koji Inoue3, and Akira Fujimaki1
1Nagoya University, Japan
2Tohoku University, Japan
3Kyushu University, Japan

Abstract:
A superconductor single-flux-quantum (SFQ) logic 8-bit microprocessor is demonstrated up to 57.2 GHz with a measured power consumption of 11.2 mW. The microprocessor has an ultradeep gate-level pipeline containing many feedback paths and communications between components. The clock arrival timings at all the logic gates are precisely tuned using two different clocking schemes, called “concurrent-flow” and “counter-flow,” to achieve extremely high clock frequency operation over 50 GHz. The low-temperature environment enables us to conduct a highly delay-intensive layout design by controlling the delays of all waveguide interconnects with sub-picosecond precision.

Session 18: Sensor Interfaces and References

Session Chair: Taekwang Jang, ETH Zurich, Switzerland
Session Co-chair: Pieter Harpe, Eindhoven University of Technology
Date: Nov. 09, 2022 (Wednesday)
Time: 14:00 – 15:40 (UTC+8)
Room: Auditorium 國際會議廳, 10F

ID Time Title / Authors / Affiliation
18.1
(7080)
(Highlight)
14:00
     |
14:25
A 0.56V/0.8V Vision Sensor with Temporal Contrast Pixel and Column-Parallel Local Binary Pattern Extraction for Dynamic Depth Sensing Using Stereo Vision
Min-Yang Chiu, Guan-Cheng Chen, Yu-Hsiang Huang, Tzu-Hsiang Hsu, Chung-Chuan Lo, Ren-Shuo Liu, Meng-Fan Chang, Kea-Tiong Tang, Chih-Cheng Hsieh
National Tsing Hua University, Taiwan

Abstract:
A 0.56V/0.8V 126x126 vision sensor with a 6T1C temporal contrast pixel, an exposure compensation scheme, and column-parallel local-binary-pattern (LBP) and region-of-interest (ROI) extraction is prototyped and verified. For motion detection and position tracking, it supports 10-bit raw image, 10-bit frame difference, and 1.5-bit event reporting (ER) outputs. For dynamic depth sensing of moving objects using a stereo vision system, it supports an 8-bit LBP feature map and ROI for efficient disparity calculation.
18.2
(7150)
14:25
     |
14:50
A 118.6fJ/Conversion-Step Two-Step Time-Domain RC-to-Digital Converter With 33nF/10MΩ Range and 53aFrms Resolution
Hoyong Seong1, Chongsoo Jung1, Donghyun Youn1, Junghyup Lee2, Sohmyung Ha3, and Minkyu Je1
1KAIST, Korea
2DGIST, Korea
3New York University Abu Dhabi, United Arab Emirates

Abstract:
This paper presents a 2-step time-domain (TD) RC-to-digital converter (RCDC). To overcome the fundamental tradeoff between resolution and energy efficiency that constrains TD converter designs, a 2-step TD conversion method is proposed. Utilizing a slow reference oscillator (R-OSC) for coarse conversion and a fast duty-cycled gear-up oscillator (G-OSC) for fine conversion, the time period of the sensor oscillator output after frequency division can be measured with both high resolution and high energy efficiency. A duty-cycled phase-locked loop (PLL) is employed to consistently maintain the required relationship between the R-OSC and G-OSC outputs without any calibration. Fabricated in a 180nm CMOS, the proposed 2-step TD RCDC IC achieves 53aFrms resolution and 33nF/10MΩ input range, consuming 6.75μW.
18.3
(7224)
(Highlight)
14:50
     |
15:15
A −50 to 130 °C, 38.69 pJ/conv Fully Integrated SAR Temperature Sensor Based on Direct Temperature-Voltage Comparison
Jooeun Kim, Jeongmyeong Kim, Changjoo Park, Minkyu Yang, and Wanyeong Jung
KAIST, South Korea

Abstract:
This paper presents a SAR temperature sensor using a clocked temperature-voltage comparator. The clocked comparator has an input offset that is linearly proportional to the temperature, and the SAR detects the offset voltage to measure the temperature. Temperature transduction is spatially and temporally confined in the comparator’s dynamic comparison, so it is robust against various circumstances. The SAR-based overall structure allows simple design and operation, without complex digital filtering or post-processing, and low energy consumption. The test chip fabricated in a 0.18μm CMOS process shows a 3-sigma error of −2.54/+2.16°C over a wide range of −50 to +130°C, with 38.69pJ/conv energy consumption.
18.4
(7020)
15:15
     |
15:27
A Digital Temperature Sensor Based on 10b SAR ADC for Non-linear Temperature Dependency Compensation in 3D NAND Flash Memory
Kyoung-Jun Roh, Min-Ki Jeon, Jaewoo Park, Myoungbo Kwak, Chi-Weon Yoon, Youngdon Choi and Jung-Hwan Choi
Device Solutions, Samsung Electronics, Korea

Abstract:
In this paper, we propose a digital temperature sensor (DTS) to compensate for the nonlinearity of the VT shift with temperature in VNAND flash memory. The DTS consists of a voltage generator that generates a CTAT voltage from a bandgap reference voltage and a 10-bit SAR-type ADC. The DTS is designed to work in synchronization with a NAND command signal. The proposed circuit is implemented with the multi-stacked VNAND technology of Samsung Electronics. The conversion takes a total of 4 μs, including the voltage generator setup time. The resolution of 40 samples is 0.753 °C/LSB, and the maximum deviation with 1-point calibration for each NAND operation is 12 LSB.
18.5
(7102)
15:27
     |
15:40
A sub-nW scalable nMOS voltage reference with multiloop regulation achieving 0.0126%/V line sensitivity
Chutham Sawigun, Xiaolin Yang, Andrea Lodi, and Carolina Mora Lopez
imec, Belgium

Abstract:
To achieve a better line sensitivity (LS) than existing techniques, we propose in this paper a regulated voltage reference (VR) that allows multiple regulation loops for LS improvement and offers output voltage scalability in a single-branch topology. The proposed VR uses only nMOS devices, occupies the smallest area, and achieves the lowest LS compared with other state-of-the-art regulated VRs.

Session 19: Imaging & Machine Learning Processing on FPGA

Session Chair: Tay-Jyi Lin, National Chung Cheng University
Session Co-chair: Ji-Hoon Kim, Ewha Womans University
Date: Nov. 09, 2022 (Wednesday)
Time: 14:00 – 15:40 (UTC+8)
Room: Song Bo 松柏廳, 10F

ID Time Title / Authors / Affiliation
19.1
(7217)
14:00
     |
14:25
A Real-Time High-Resolution Variable-Size Imaging Processor for Spaceborne Synthetic Aperture Radar
Jia-Zhao Lin1, Po-Ta Chen1, Hung-Yuan Chin1, Pei-Yun Tsai1, and Sz-Yuan Lee2
1National Central University, Taiwan
2National Applied Research Laboratory, Taiwan

Abstract:
We present a real-time imaging processor for spaceborne high-resolution synthetic aperture radar. To achieve this goal, a DRAM burst access pattern is developed, given azimuth FFT/IFFT decomposition with bit-reversed frequency-domain data, to achieve streaming input/output in the processing kernel. Hybrid datapaths that use 17-bit customized floating point (CFP) FFT/IFFT operations and 64-bit double precision arithmetic units for phase calculation are designed to meet the precision requirement. Multi-segment high-order Taylor series expansion is adopted to approximate the complicated migration factors to support configurability. Our implementation shows at least a 2.93X improvement in normalized processing time and has excellent precision.
19.2
(7239)
14:25
     |
14:50
A 409.6 GOPS and 204.8 GFLOPS Mixed-Precision Vector Processor System for General-Purpose Machine Learning Acceleration
Jung-Hoon Kim, Sukjin Lee, Seungjae Moon, Sungyeob Yoo, and Joo-Young Kim
KAIST, Korea

Abstract:
This paper presents a mixed-precision vector processor named MVP and its multi-core system for general-purpose ML acceleration. It has three key contributions: 1) MVP supports fixed and floating-point data types and various AI operations with scalable vector lanes, 2) MVP has a two-level instruction set architecture (ISA), and its microcode generator enables handy ML model mapping and small code size, and 3) the software stack efficiently allocates a target ML model into multiple MVPs, generating all the necessary runtime binaries. As a result, the proposed multi-MVP system provides a peak performance of 409.6 GOPS and 204.8 GFLOPS and energy efficiency of 13.97 GOPS/W and 6.99 GFLOPS/W on a Xilinx Alveo U50 FPGA card, achieving 83.84% average effective utilization when it runs various ML models.
19.3
(7248)
14:50
     |
15:15
An Efficient Unsupervised Learning-based Monocular Depth Estimation Processor with Partial-Switchable Systolic Array Architecture in Edge Devices
Wonhoon Park, Dongseok Im, Hankyul Kwon, and Hoi-Jun Yoo
Korea Advanced Institute of Science and Technology, Korea

Abstract:
In this paper, an unsupervised learning-based monocular depth estimation (MDE) processor is proposed with the following key features: 1) multi-path simultaneous processing (MPSP) to reduce the external memory access of the multi-path sampling block by 16.8%, 2) a partial-switchable systolic array (PSSA) architecture to maintain high utilization of the processing elements, achieving an average throughput enhancement of 51.5%, and 3) a dynamic network selection learning (DNSL) system to optimize the pose network during training, increasing the system energy efficiency by 59% for getting supervision
19.4
(7235)
15:15
     |
15:40
F-LIC: FPGA-based Learned Image Compression with a Fine-grained Pipeline
Heming Sun1,2,3, Qingyang Yi4, Fangzheng Lin1, Lu Yu2, Jiro Katto1, and Masahiro Fujita4,5
1Waseda University, Japan
2Zhejiang University, China
3JST, PRESTO, Saitama, Japan
4The University of Tokyo, Japan
5AIST, Japan

Abstract:
This paper presents an FPGA design for learned image compression (LIC). By proposing a fine-grained pipelining schedule, higher DSP efficiency can be obtained. We also propose cascading DSP schemes and a zero-skipping deconvolution scheme. Compared with the latest FPGA-based LIC, our design achieves faster speed with higher power efficiency.

Session 20: Interfaces for High-Speed Memory

Session Chair: Chiweon Yoon, Samsung Electronics
Session Co-chair: Pen-Jui Peng, National Tsing Hua University
Date: Nov. 09, 2022 (Wednesday)
Time: 14:00 – 15:40 (UTC+8)
Room: Chang Chin 長青廳, 10F

ID Time Title / Authors / Affiliation
20.1
(7104)
(Highlight)
14:00
     |
14:25
A 0.95pJ/b 5.12Gb/s/pin Charge-Recycling IOs with 47% Energy Reduction for Big Data Applications
Han Wu1, Jeong Hoan Park2, Miaolin Zhang1, Longyang Lin3, Rucheng Jiang1, Jung-Hwan Choi2, Jerald Yoo1,4
1National University of Singapore, Singapore
2Samsung Electronics, South Korea
3Southern University of Science and Technology, China
4The N.1 Institute for Health, Singapore

Abstract:
We propose Charge-Recycling IOs (CRIOs) that save up to 32.2% energy for the TSV link (2.56Gb/s) and 47% for the T-Line link (5.12Gb/s) when compared with conventional IOs. Implemented in 40nm 1P8M standard CMOS, the proposed CRIOs show signal integrity and BER performance comparable to those of conventional IOs.
20.2
(7045)
14:25
     |
14:50
A 10Gb/s/pin DQS and WCK Built-Out Tester for LPDDR5 DRAM Test
Chan-Ho Kye1, Jihee Kim2, Kyungmin Baek2, Kahyun Kim2, Sangjin Pack3, Changwon Jung3, and Deog-Kyoon Jeong2
1EPFL, Switzerland
2Seoul National University, Korea
3SK Hynix, Korea

Abstract:
We propose a data strobe (DQS) and write clock (WCK) tester that can replace DFT for the high-speed test of LPDDR5 DRAM.
20.3
(7009)
14:50
     |
15:15
A 7.5Gb/s/pin 12Gb-LPDDR5x SDRAM with a Pseudo-double-bit ECC and “Spider”-shape Datapath Control Architecture in a 2nd Generation 10nm DRAM Process
Feng Lin, Kangling Ji, Enpeng Gao, Zhonglai Liu, Weibing Shang, Hongwen Li
Changxin Memory Technologies, Inc., China

Abstract:
A 12Gb LPDDR5x SDRAM is presented with unique pseudo-double-bit ECC functions. A “Spider”-shape eight-way multiplexer serves as the central traffic control of the high-speed datapaths. A direct dynamic voltage and frequency scaling scheme is proposed to cut down boundary-crossing power consumption by 57%. Data receivers with a 1-tap DFE are proposed with an on-die eye monitor for margin evaluation. The chip is manufactured using a 2nd generation 10nm DRAM process and achieves a 7.5Gb/s/pin data rate under 1.05V.
20.4
(7230)
15:15
     |
15:40
A Single-Ended Duobinary-PAM4(PAM7) Transmitter with a 2-Tap Feed-Forward Equalizer
Jaenam Kim1, 2*, Sanghyeon Park1, 2*, Jaewoo Park1, Junhan Bae1, and Jung-Hoon Chun1, 3
1Sungkyunkwan University, South Korea
2Samsung Electronics, South Korea
3SolidVue, South Korea
*Equally Credited Authors (ECAs)

Abstract:
A PAM4/duobinary-PAM4 dual-mode transmitter is demonstrated in a 28 nm CMOS technology. The duobinary-PAM4 encoder adds two half-rate PAM4 signals driven by quarter-rate clocks and produces 7-level duobinary-PAM4 signals. The proposed transmitter with a 2-tap feed-forward equalizer consists of 48 source-series terminated (SST) driver segments that are partitioned into six blocks to generate a duobinary-PAM4 signal. At 18 Gb/s, the proposed transmitter achieves 1.11-pJ/b and 1.66-pJ/b energy efficiency in duobinary-PAM4 and PAM4 modes, respectively.
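To make the "7-level" figure concrete, the sketch below uses the textbook 1+D duobinary rule (each output is the current PAM4 symbol plus the previous one). This is an assumption for illustration: the abstract describes summing two half-rate PAM4 signals in hardware, and the function name and sample sequence here are hypothetical.

```python
def duobinary_pam4(symbols):
    """Textbook 1+D duobinary: add each PAM4 symbol (0..3) to its predecessor.

    The first symbol is paired with an assumed initial predecessor of 0.
    """
    out = []
    prev = 0
    for s in symbols:
        out.append(s + prev)
        prev = s
    return out


# A short hypothetical PAM4 sequence and its duobinary encoding.
encoded = duobinary_pam4([0, 3, 3, 1, 2, 0])
print(encoded)  # [0, 3, 6, 4, 3, 2]

# Every possible sum of two PAM4 symbols: 0 through 6, i.e. seven levels.
all_levels = sorted({a + b for a in range(4) for b in range(4)})
print(len(all_levels))  # 7
```

Summing two 4-level symbols yields 2 × 4 − 1 = 7 distinct amplitudes, which is where the PAM7 label in the title comes from.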

Session 21: Application-Oriented ADCs

Session Chair: Chih-Cheng Hsieh, National Tsing Hua University
Session Co-chair: Shuang Zhu, NVIDIA
Date: Nov. 09, 2022 (Wednesday)
Time: 14:00 – 15:40 (UTC+8)
Room: V110 十全軒, VF

ID Time Title / Authors / Affiliation
21.1
(7112)
(Highlight)
14:00
     |
14:25
A 91-dB DR 20-kHz BW 5th-Order Multi-Step Incremental ADC for Sensor Interfaces by Re-Using a MASH 2-1 Modulator
Jia-Sheng Huang1,2, Shih-Che Kuo1, Yu-Cheng Huang1, Chia-Wei Kao1,2, Che-Wei Hsu1,3, and Chia-Hung Chen1
1National Yang Ming Chiao Tung University, Taiwan
2Now with Realtek, Taiwan
3Now with Mediatek, Taiwan

Abstract:
A 3rd-order multi-stage incremental ΔΣ ADC (IADC) is proposed to operate in two steps by re-using the same hardware. The first step is a third-order cascaded IADC with an oversampling ratio (OSR) of 24, and then the circuit is reconfigured as a second-order IADC with another OSR of 16 for fine quantization. The noise-shaping performance is boosted from third- to fifth-order. Prototyped in 0.18 μm technology, the ADC achieves a measured DR/SNDR of 91/89 dB and Schreier FoMs of 168.5/166.6 dB for 10 kHz BW.
21.2
(7165)
14:25
     |
14:50
A 78.6 dB-SNDR 520mVpp-full-scale 620MΩ-Zin 105dB-CMRR VCO-based Sensor Readout Circuit Using FVF-Based Gm-Input Structure
Yi Zhong, Lu Jie, and Nan Sun
Tsinghua University, China

Abstract:
This paper presents a flipped-voltage-follower (FVF)-based Gm-input CT-ΔΣ ADC with an input impedance enhancement technique. The prototype ADC achieves 78.6dB SNDR with 10 kHz BW at the input range of 480mVpp while consuming 7.1μW, resulting in the Schreier FoM (FoMs) of 170.1dB. This work also achieves 620MΩ input impedance at the chopping frequency of 45kHz and 105dB CMRR.
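The quoted Schreier FoM can be cross-checked from the other numbers in the abstract. The sketch below assumes the commonly used SNDR-based definition, FoM = SNDR + 10·log10(BW / P); the function name is hypothetical and the definition is an assumption, since the abstract does not spell it out.

```python
import math


def schreier_fom_db(sndr_db: float, bandwidth_hz: float, power_w: float) -> float:
    """SNDR-based Schreier figure of merit in dB (assumed definition)."""
    return sndr_db + 10.0 * math.log10(bandwidth_hz / power_w)


# Values quoted in the abstract: 78.6 dB SNDR, 10 kHz BW, 7.1 uW.
fom = schreier_fom_db(78.6, 10e3, 7.1e-6)
print(f"{fom:.1f} dB")  # 170.1 dB
```

With these inputs the assumed formula reproduces the 170.1dB stated in the abstract.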
21.3
(7043)
14:50
     |
15:15
110.1dB DR 4-ch Audio ADCs and 98dB DR 2-ch Voice-Triggering ADCs in Reconfigurable Architecture with Enhanced Off-Transistor-Based Bias Noise Filter
Moo-Yeol Choi, Inhwan Cho, Myungjin Lee, Seunghyun Oh, Jongwoo Lee
Samsung Electronics, Korea

Abstract:
4-ch audio ADCs and 2-ch voice-trigger system (VTS) ADCs with an enhanced off-transistor-based bias noise filter are proposed. The proposed technique addresses the limitations of the previous off-transistor-based noise filter, namely a voltage drift caused by well-diode leakage and a reduced equivalent resistance. The measured results of the audio ADC show 110.1dB DR and -100.1dB THD+N. The CT-DSM in this work achieves a Schreier FoM of 185.7dB in audio ADC mode and 170.6dB in VTS ADC mode and attains the highest DR despite the additional noise of a capacitive-coupled gain amplifier.
21.4
(7219)
15:15
     |
15:27
A 103.8-dB DR 25ps-to-35ns Resolution Time-to-Digital Converter with Dynamic Ring Oscillator for LiDAR Applications
Taewoong Kim1,2, Sanghoon Lee1, and Youngcheol Chae1
1Yonsei University, Korea
2Now in Samsung Electronics, Korea

Abstract:
This paper proposes a wide dynamic range TDC for LiDAR sensors, the architecture of which is basically a ring oscillator (RO)-based folding TDC and can have different resolutions proportional to the input range by using a dynamically pre-charged supply voltage on a reservoir capacitor. This dynamic RO changes its time resolution from 25 ps to 35 ns. This in turn leads to a significant increase in the dynamic range, resulting in a maximum measurable time of 3.9 μs, which means a distance of 585 m. Implemented in a small area of 0.0135 mm2 with a 28 nm FDSOI process, the prototype TDC achieves a wide dynamic range of 103.8 dB while consuming only 45.6 μW.
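The distance and dynamic-range figures above follow from simple arithmetic. The sketch below assumes a round-trip time-of-flight model (distance = c·t / 2) and a dynamic range defined as 20·log10(maximum measurable time / finest resolution); both definitions and the function names are assumptions chosen because they reproduce the quoted numbers.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def lidar_distance_m(round_trip_time_s: float) -> float:
    """Target distance for a round-trip time of flight (assumed model)."""
    return C * round_trip_time_s / 2.0


def dynamic_range_db(max_time_s: float, finest_resolution_s: float) -> float:
    """Dynamic range as 20*log10(max time / finest resolution) (assumed definition)."""
    return 20.0 * math.log10(max_time_s / finest_resolution_s)


# Values quoted in the abstract: 3.9 us maximum time, 25 ps finest resolution.
distance = lidar_distance_m(3.9e-6)
dr = dynamic_range_db(3.9e-6, 25e-12)
print(f"{distance:.0f} m, {dr:.2f} dB")  # 585 m, 103.86 dB
```

Both results agree with the 585 m and 103.8 dB quoted in the abstract to within rounding.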
21.5
(7209)
15:27
     |
15:40
A 0.3V 762nW-Only Binary-Search Phase ADC With Current-Reused RO-based Comparator
Sifan Wang1, Kejin Li1, Chi-Hang Chan1, Yan Zhu1, Rui Paulo Martins1,2
1University of Macau, China
2On leave Universidade de Lisboa, Portugal

Abstract:
This paper presents a 0.3V 4b binary-search-based phase ADC, running at 1MS/s while consuming only 762nW. Unlike existing techniques with large peripheral circuits and power overhead, the proposed phase ADC remains simple and consumes purely dynamic power. The linear combiner cascoded with the ring-oscillator-based (RO-based) comparator allows current reuse at ultralow voltage. Combined with the proposed binary-search logic for phase quantization, it realizes an outstanding energy efficiency by reducing the number of comparisons to four in this 4b phase ADC. The phase ADC's timing loop is asynchronous, thus maintaining a 1MHz sampling rate under such a low voltage.
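A behavioral sketch of binary-search phase quantization is given below, showing why four comparisons suffice for a 4b code. It is an idealized software model under stated assumptions, not the authors' circuit: each comparison takes the sign of a linear combination of I and Q, which equals the sign of sin(θ − θ_ref), and the function name and interface are hypothetical.

```python
import math


def phase_code(i: float, q: float, bits: int = 4) -> int:
    """Quantize the phase of an I/Q pair to `bits` bits with one comparison per bit.

    Each step compares the input phase against the midpoint of the current
    search interval using sign(Q*cos(mid) - I*sin(mid)), which has the same
    sign as sin(theta - mid) for any positive signal amplitude.
    """
    lo, hi = 0.0, 2.0 * math.pi
    code = 0
    for _ in range(bits):
        mid = (lo + hi) / 2.0
        # Linear combination of I and Q acts as the phase comparator.
        decision = q * math.cos(mid) - i * math.sin(mid) >= 0.0
        code = (code << 1) | int(decision)
        if decision:
            lo = mid  # phase lies in the upper half of the interval
        else:
            hi = mid  # phase lies in the lower half of the interval
    return code


# Phases at the centre of each of the 16 sectors map to codes 0..15.
codes = [
    phase_code(math.cos((k + 0.5) * math.pi / 8), math.sin((k + 0.5) * math.pi / 8))
    for k in range(16)
]
print(codes)
```

A flash-style phase quantizer would need a comparison per sector boundary, whereas this search halves the candidate interval at each step and so uses only as many comparisons as output bits.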