IEEE A-SSCC 2022 | Regular Sessions

Regular Sessions

Session 1: Design Techniques for Industrial Applications

Session Chair: Jinwook Oh, Rebellions
Session Co-chair: Po-Hung Lin, National Yang Ming Chiao Tung University
Date: Nov. 07, 2022 (Monday)
Time: 10:50 – 12:30 (UTC+8)
Room: Auditorium 國際會議廳, 10F

ID	Time	Title / Authors / Affiliation
1.1 (7033) (Highlight)	10:50 \| 11:15	Energy Efficient BNN Accelerator using CiM and a Time-Interleaved Hadamard Digital GRNG in 22nm CMOS Richard Dorrance, Deepak Dasalukunte, Hechen Wang, Renzhi Liu, Brent Carlton Intel Corporation, USA Abstract: In this paper, we propose a Bayesian Neural Network (BNN) accelerator leveraging a C-2C SRAM-based analog Compute-in-Memory (CiM) macro for the MAC operations and a variable-precision (with programmable statistical quality), time-interleaved Hadamard Gaussian Random Number Generator (GRNG) for probabilistic weight generation. The proposed BNN prototype achieves a 25% speedup over the state-of-the-art with a 35× improvement in energy efficiency.
1.2 (7176) (Highlight)	11:15 \| 11:40	Sub-GHz RF Energy Harvester including a Small Loop Antenna Darshan Shetty¹, Christoph Steffan¹, Wolfgang Bösch², Jasmin Grosinger² ¹Infineon Technologies AG, Austria ²Graz University of Technology, Austria Abstract: This work presents a sub-GHz RF energy harvester comprising an RF-DC converter implemented in a 130 nm CMOS technology, a conjugate matched loop antenna, and an output load. The RF-DC converter uses a novel threshold voltage compensation technique, implemented using an inbuilt nanowatt current reference circuit. The threshold compensation design ensures robust system performance across temperature and process corner variations. Measurements of the RF energy harvester including the antenna reveal an excellent 1 V sensitivity of -33 dBm for an output load of 1 GΩ and a peak PCE of 53%.
1.3 (7022) (Highlight)	11:40 \| 12:05	An Attachable Fractional Divider Transforming an Integer-N PLL Into a Fractional-N PLL with SSC Capability Atsushi Motozawa, Yasuyuki Hiraku, Yoshitaka Hirai, Naoaki Hiyama, Yusuke Imanaka, Fukashi Morishita Renesas Electronics Corporation, Japan Abstract: In the automotive industry, systems handle weak satellite signals. Therefore, the output frequency of PLLs is carefully designed to avoid EMI. Recently, GNSS has become more common and the available frequency bands for clocks are getting narrower. As a result, Int-N PLLs need to be replaced with Frac-N PLLs to obtain smaller frequency steps. In this paper, an attachable FDIV is proposed to transform an Int-N PLL into a Frac-N PLL with SSC capability with minimal design effort. A Frac-N PLL with the proposed FDIV achieves a worst-case fractional spur of -69.3dBc and an EMI reduction of 18.7dB in SSC operation.
1.4 (7024) (Highlight)	12:05 \| 12:30	A Learning-Based Algorithm for Early Floorplan With Flexible Blocks ¹JEN-WEI LEE, ¹YI-YING LIAO, ¹TE-WEI CHEN, ¹YU-HSIU LIN, ¹CHIA-WEI CHEN, ¹CHUN-KU TING, ¹SHENG-TAI TSENG, ¹RONALD KUO-HUA HO, ¹HSIN-CHUAN KUO, ¹CHUN-CHIEH WANG, ¹MING-FANG TSAI, ¹CHUN-CHIH YANG, ¹TAI-LAI TUNG, and ²DA-SHAN SHIU ¹MediaTek, Taiwan ²MediaTek Research, Taiwan Abstract: This paper presents a learning-based algorithm using a graph neural network (GNN) and a deconvolution network to predict the locations and aspect ratios of design blocks with flexible rectangles. With several hours of training on 4 GPUs, the proposed method, which targets minimizing the wirelength cost, can generate placements in the early stage of floorplanning that are superior to manual placements requiring several days of effort from physical design experts.

Session 2: Switching Mode Power Converters

Session Chair: Makoto Takamiya, University of Tokyo
Session Co-chair: Wanyuan Qu, Zhejiang University
Date: Nov. 07, 2022 (Monday)
Time: 14:00 – 15:40 (UTC+8)
Room: Auditorium 國際會議廳, 10F

ID	Time	Title / Authors / Affiliation
2.1 (7081)	14:00 \| 14:25	A Single-inductor Triple-output Buck DC-DC Converter with Electromagnetic Gated Low Dropouts for Higher Resistance to Electromagnetic and Power Side-Channel Attacks with 3B Minimum Traces to Disclosure Improvement in Internet of Things Applications Ya-Ting Hsu¹, Yu-Jheng Ouyang¹, Ke-Horng Chen¹, Kuo-Lin Zheng², Ying-Hsi Lin³, Shian-Ru Lin³, and Tsung-Yen Tsai³ ¹National Yang Ming Chiao Tung University, Taiwan ²Chip-GaN Power Semiconductor Corporation, Taiwan ³Realtek Semiconductor Corp, Taiwan Abstract: The proposed single-inductor triple-output buck converter with electromagnetic gated low dropouts has the advantage of hiding the leaked electromagnetic signature. The proposed intelligent true random number generator reduces the peak EMI noise from 88.4dBμV to 54.9dBμV at the fundamental frequency, with no obvious tones in the fast Fourier transform. A reduction of 33.5dBμV is obtained, improving the minimum traces to disclosure to about 3B.
2.2 (7046)	14:25 \| 14:50	An One-Cycle Load Transient Response and 0.81 mV/A Load-Regulation Time-Domain Cascaded-VCO-Controlled Buck Converter for Powering Gaming SoC Chieh-Ju Tsai¹, I-Fang Lo², Tsung-Hsien Lin¹, Ching-Jan Chen¹ ¹National Taiwan University, Taiwan ²Richtek Technology Corporation, Taiwan Abstract: A time-domain cascaded-VCO-controlled buck converter with a low-cost output LC filter for gaming SoC applications is proposed. By separating the modulation and frequency stabilization functions, the KVCO mismatch issue of conventional time-based PWM controllers no longer exists. A steady-state FSW error of less than ±0.81% is measured. The proposed controller achieves 0.81mV/A load regulation, 1-cycle load transient settling (1μs), and at least 2X FoM improvement over prior art.
2.3 (7038)	14:50 \| 15:15	A 90.6% Peak-Efficiency 1.5A Dual Inductor Ladder Buck-Converter Achieving 0.93W/mm² Active Peak Power Density for Li-ion Battery Operated PMICs Arindam Mishra, Wei Zhu, and Valentijn De Smedt ESAT, ADVISE, KU Leuven, Belgium Abstract: A dual-inductor-ladder (DIL) DC-DC converter is presented to provide 0.3-1V output down conversion directly from a 2.5-5V Li-ion battery for low-voltage System-on-Chips (SoCs). Inherent inductor current and capacitor voltage balancing, complete capacitive soft-charging, and reduced inductor current enable the converter to achieve very high active and passive power density and efficiency, even with compact-volume inductors. The DIL is fabricated in a 65nm CMOS technology, obtaining 90.6% peak efficiency, 0.93W/mm² active peak power density, and a maximum 1.5A load current support while occupying just over 1mm² die area.
2.4 (7108)	15:15 \| 15:40	A 96.62%-Peak-Efficiency and Seamless-Mode-Transition Buck-Boost DC-DC Converter with Auto-Shift-Ramp Chi-Wei Chen, Bao-Xian Peng, and Hsin-Shu Chen National Taiwan University, Taiwan Abstract: This paper proposes an Auto-Shift-Ramp (ASR) technique, which can significantly alleviate the undershoot or overshoot voltage caused by the mode transition in multi-mode DC-DC converters. The proposed ASR shifts the starting time of the ramp voltage and empowers the DC-DC converter to change the duty instantly after the mode changes without limiting the maximum duty or changing the modulator gain. According to the measurement results, the mode transition overshoot voltage is less than 16mV or 0.48% with less than 18.98μsec settling time. The converter achieves 96.62% peak efficiency at 50mA load current in buck mode. Compared to prior works, the proposed DC-DC converter with ASR achieves a much lower mode transition voltage, even with smaller output capacitance.

Session 3: Novel Neural Network and Crypto Processors

Session Chair: Kun-Chih Chen, National Sun Yat-Sen University
Session Co-chair: Leibo Liu, Tsinghua University
Date: Nov. 07, 2022 (Monday)
Time: 14:00 – 15:40 (UTC+8)
Room: Song Bo 松柏廳, 10F

ID	Time	Title / Authors / Affiliation
3.1 (7065) (Highlight)	14:00 \| 14:25	SNPU: Always-on 63.2µW Face Recognition Spike Domain Convolutional Neural Network Processor with Spike Train Decomposition and Shift-and-Accumulation Unit Sangyeob Kim, Sangjin Kim, Soyeon Um, Soyeon Kim, Juhyoung Lee and Hoi-Jun Yoo Korea Advanced Institute of Science and Technology, Korea Abstract: The proposed SNPU has 3 key features. First, Spike Train Decomposition reduces the accumulations (ACCs) by 71.8%. Second, Time Shrinking Multi-Level Encoding replaces the multiple ACCs with a single Shift-and-Accumulation (SAC), and the SAC unit adopts bit scalability to enable different always-on applications. Third, Neuron Link supports various time-windows to optimize energy consumption by minimizing the time-window layer by layer, and increases the PE utilization by 14.06% for FR. For the LFW dataset, the proposed processing can reduce the energy consumption by 43.9% due to neuron-level event-driven operation. If there is no face in the input, the energy can be reduced further by 87.6%.
3.2 (7042) (Highlight)	14:25 \| 14:50	A 28nm 57.6TOPS/W Attention-based NN Processor with Correlative Computing-in-Memory Ring and Dataflow-reshaped Digital-assisted Computing-in-Memory Array Ruiqi Guo¹, Zhiheng Yue¹, Hao Li¹, Te Hu¹, Yabing Wang¹, Hao Sun¹, Jeng-Long Hsu², Yaojun Zhang³, Bonan Yan⁴, Leibo Liu¹, Ru Huang⁴, Shaojun Wei¹, Shouyi Yin¹ ¹Tsinghua University, China ²NeoNexus Pte. Ltd., Singapore ³Pimchip Technology Co., Ltd., China ⁴Peking University, China Abstract: This paper presents a 28nm 7.10mm² CIM-based transformer processor, achieving 23.81-to-57.6 TOPS/W system energy efficiency. This paper proposes three key design features in the chip: 1) A correlative CIM ring to avoid loading dynamically generated matrices. 2) A softmax-based speculate unit to eliminate redundant attention computing. 3) A dataflow-reshaped digital-assisted CIM-array to achieve fully pipelined computations of the final attention result. The chip can work at 0.56-to-0.9V, 151-to-202MHz. The chip consumes an average power of 57.97mW at 202MHz and 0.9V.
3.3 (7170)	14:50 \| 15:15	A 65nm 8-bit All-Digital Stochastic-Compute-In-Memory Deep Learning Processor Jiyue Yang, Tianmu Li, Wojciech Romaszkan, Puneet Gupta, and Sudhakar Pamarti University of California, Los Angeles, USA Abstract: This work presents the first ADC/DAC-free compute-in-memory accelerator based on Stochastic Computing (SC). A Stochastic-Compute-in-Memory Accelerator (SCIMA) is presented that (1) embeds SC MAC logic inside an SRAM that only requires 1-bit decisions and no DACs/ADCs, (2) reduces SC number generation costs significantly, and (3) employs a computation skipping technique for SC’s average pooling function that reduces the total latency and energy by 4x. The measured 65nm chip achieves 7.96 TOPS/W energy efficiency for the whole system and 20 TOPS/W for the macro. The solution provides 6x better CIM macro density and 2.5x better peak system energy efficiency at 8-bit precision, with network classification accuracy comparable to fixed-point implementations.
3.4 (7188)	15:15 \| 15:40	High-speed and energy-efficient crypto-processor for post-quantum cryptography CRYSTALS-Kyber Taishin Shimada, Makoto Ikeda The University of Tokyo, Japan Abstract: This paper presents the design and measurement results of an ASIC for high-speed, low-power key exchange using CRYSTALS-Kyber, a type of post-quantum cryptography (PQC). The design focuses on the large number of number-theoretic transforms (NTTs) in CRYSTALS-Kyber and employs a pipelined architecture to perform the processing. As a result, our chip performs up to 8.5 times faster than a CPU and consumes 24.1 times less energy than a CPU.

Session 4: RF Transceiver Techniques

Session Chair: Chien-Nan Kuo, National Yang Ming Chiao Tung University
Session Co-chair: Baoyong Chi, Tsinghua University
Date: Nov. 07, 2022 (Monday)
Time: 14:00 – 15:40 (UTC+8)
Room: Chang Chin 長青廳, 10F

ID	Time	Title / Authors / Affiliation
4.1 (7023) (Highlight)	14:00 \| 14:25	A 110-120-GHz, 12.2% Efficiency, 16.2-dBm Output Power Multiplying Outphasing Transmitter in 22-nm FDSOI Jeff Shih-Chieh Chien, James F. Buckwalter University of California, Santa Barbara, USA Abstract: A multiplying outphasing transmitter based on a reflection-type phase shifter and a multiplier chain is fabricated in the GlobalFoundries 22nm FDSOI CMOS process, and the measured transmitter performance achieves 9.2-12.2% DC-to-RF efficiency with 15.1-16.2dBm output power at 110-120 GHz.
4.2 (7027)	14:25 \| 14:50	A D-Band Packaged CMOS Integrated Transmitter for MU-MIMO Applications Meng Wei¹, Nima Baniasadi¹, Ethan Chou¹, Hesham Beshary¹, Sashank Krishnamurthy², Elad Alon¹, Ali Niknejad¹ ¹University of California, Berkeley, USA ²Intel, USA Abstract: This paper presents a D-band packaged CMOS integrated transmitter (TX) for Multi-User Multiple-Input Multiple-Output (MU-MIMO) applications. The TX chip, fabricated in a 28nm bulk CMOS process, is packaged on an organic interposer including a patch antenna array. The circuit integrates the complete transmitter chain, including the baseband I/Q amplifiers, up-conversion mixers, power amplifier, and the LO distribution and generation. The designed TX achieves 9-10.6dBm EIRP at Psat, and it can support 24 Gbps 16-QAM and 24 Gbps 64-QAM at 5.3pJ/bit efficiency, tested with over-the-air measurements.
4.3 (7050) (Highlight)	14:50 \| 15:15	A Dual-Band 2×2 802.11ax Transceiver Supporting 160MHz CBW and 1024-QAM Chao Lu¹, Shr-Lung (Calvin) Chen², Jun Liu³, Jian Bao³, Yi Zhao³, Chin-Ming Chien², Yufei Wang¹, Jianqiu Chen³, Zexin Liao³, Bing Ding³, Bihui Zhu³, Jinhua Chen³, Pengfei Yue³, Ran Wang³, and Chun Wang³ ¹ASR Microelectronics Inc., USA ²ASR Microelectronics Inc., USA ³ASR Microelectronics Ltd., China Abstract: A 2×2 802.11ax transceiver design is presented to support dual band simultaneous operation (DBS) and 1024-QAM modulation. The proposed architecture features linearity enhancement for uplink OFDMA and wideband transmission. Best-in-class receiving sensitivity and lowest transmission EVM floor are demonstrated in measurements. With 20MHz (HE20) receiving, -96.5dBm/-66dBm sensitivity level is measured for MCS0/11, respectively. The output power reaches 18dBm with -35dB EVM for 80MHz 1024-QAM (HE80 MCS11) transmission in the 5GHz band. Narrowband OFDMA signals can be transmitted at full power capacity, and 160MHz channel bandwidth (CBW) can also be supported without digital predistortion (DPD). The fully integrated transceiver occupies 10.5mm² silicon area in 22nm CMOS.
4.4 (7141)	15:15 \| 15:40	A 32.2-38.2 GHz Broadband 4-Channel TRx Beamformer with Embedded 3-Winding Transformer Based PA/LNA FE and High Resolution Phase/Amplitude Control Yongjie Li¹, Zongming Duan¹, Xiao Li¹, Chuanming Zhu¹, Na Ding¹, Yuefei Dai¹, Liguo Sun², Hao Gao³ ¹East China Research Institute of Electronic Engineering, China ²University of Science and Technology of China, China ³Eindhoven University of Technology, the Netherlands Abstract: This paper presents a 32.2-38.2 GHz broadband 4-channel Ka-band transceiver beamformer. In this transceiver (TRx) beamformer front-end (FE), a compact 3-winding transformer achieves the Tx power combining and Rx noise matching simultaneously in the TDD mode. Furthermore, this 4-channel RF beamformer integrates a high-precision 6-bit 360° phase shifter and 6-bit 0.5-dB step gain control in each channel for beam scanning accuracy improvement. With programmable 6-bit phase and 6-bit gain control, at 38 GHz, the measured gain tuning range is 31.5 dB, with a 0.5-dB gain step and a 5.6° phase step. With the TRx architecture, at 38 GHz, the measured Psat of the Tx is 20.0 dBm, and the NF of the Rx is 5.55 dB.

Session 5: Biomedical Sensing Chips and Systems

Session Chair: Philex Ming-Yan Fan, National Cheng Kung University
Session Co-chair: Bo Zhao, Zhejiang University
Date: Nov. 07, 2022 (Monday)
Time: 14:00 – 15:40 (UTC+8)
Room: V110 十全軒, VF

ID	Time	Title / Authors / Affiliation
5.1 (7198) (Highlight)	14:00 \| 14:25	A Synchronous-Sampling Impedance-Readout IC with Baseline-Cancellation-Based Two-Step Conversion for Fast Neural Electrical Impedance Tomography Ji-Hoon Suh¹, Haidam Choi¹, Yoontae Jung¹, Sein Oh¹, Hyungjoo Cho¹, Nahmil Koo², Seong Joong Kim², Chisung Bae², Sohmyung Ha³, and Minkyu Je¹ ¹KAIST, Korea ²Samsung Advanced Institute of Technology, Korea ³New York University Abu Dhabi, United Arab Emirates Abstract: It was recently shown that electrical impedance tomography (EIT) with a greatly enhanced frame rate can provide neural activity monitoring and functional localization of the active peripheral nerve at the same time. For this 'fast neural EIT', we propose an EIT system employing successive-approximation-based (SA-based) baseline tracking and synchronous sampling (SS) of the ADC. By utilizing SA, the baseline can be tracked much faster than with conventional incremental tracking. By using SS, only a single cycle of CG is required, enabling fast demodulation and thus allowing the use of a low CG frequency. Thanks to these, even with a CG frequency of 18kHz, which is low enough to secure SNR for neural EIT, our work achieves a maximum of 500 fps, which is about 4x higher than the state-of-the-art.
5.2 (7120)	14:25 \| 14:50	A 1984-Pixels, 1.26nW/Pixel Retinal Prosthesis Chip with Time-Domain In-Pixel Image Processing Dong-Hwi Choi and Dong-Woo Jee Ajou University, Korea Abstract: This paper presents a 1984-pixel retinal prosthesis (RP) chip with in-pixel image processing. The proposed time-domain image processing circuits perform edge extraction by comparing the pulse widths generated by light-to-stimulus duration converters (LSDCs) of neighboring sensors. A pixel sequencing technique for shared electrode operation is also proposed to increase the pixel count under the given chip area. The RP chip is implemented in a 0.18 μm CMOS process and consumes 1.26 nW/pixel, which is 44.7× better than the previous state-of-the-art.
5.3 (7074)	14:50 \| 15:15	A 64-channel back-gate adapted ultra-low-voltage spike-aware neural recording front-end with on-chip lossless/near-lossless compression engine and 3.3V stimulator in 22nm FDSOI Franz Marcus Schüffny, Seyed Mohammad Ali Zeinolabedin, Richard George, Liyuan Guo, Annika Weiße, Johannes Uhlig, Julian Meyer, Andreas Dixius, Stefan Hänzsche, Marc Berthel, Stefan Scholze, Sebastian Höppner, Christian Mayr TU Dresden, Germany Abstract: In neural implants and biohybrid research systems, the integration of electrode recording and stimulation front-ends with pre-processing circuitry promises a drastic increase in real-time capabilities. In our proposed neural recording system, constant sampling with a bandwidth of 9.8kHz yields 6.73µV input-referred noise (IRN) at a power-per-channel of 0.34µW for the time-continuous ΔΣ-modulator, and 0.52µW for the digital filters and spike detectors. We introduce dynamic current/bandwidth selection at the ΔΣ and digital filter to reduce recording bandwidth in the absence of spikes. This is controlled by a two-level spike detection and adjusted by adaptive threshold estimation (ATE). Dynamic bandwidth selection reduces power by 53.7%, increasing the available channel count at a low heat dissipation. Adaptive back-gate voltage tuning (ABGVT) compensates for PVT variation in subthreshold circuits. This allows 1.8V input/output (IO) devices to operate robustly at a 0.4V supply voltage. The proposed 64-channel neural recording system moreover includes a 16-channel adaptive compression engine (ACE) and an 8-channel on-chip current stimulator at 3.3V.
5.4 (7123)	15:15 \| 15:40	A Heart-related Physiological Signal Monitoring SoC for Wearable ECG Analysis Systems Peng-Wei Huang¹, Shuenn-Yuh Lee¹, Chieh Tsou¹, Yi-Wen Hung¹, Po-Han Su¹, Ju-Yi Chen² ¹National Cheng Kung University, Taiwan ²National Cheng Kung University Hospital, Taiwan Abstract: The proposed configurable electrocardiogram (ECG) analysis system-on-chip (CEASoC) allows ECG monitoring and complex QRS detection and classification, thereby reducing the manpower requirements of the analysis. ECG analyses conducted by a person are effort- and time-consuming. Thus, an automatic ECG analysis device with a CEASoC and BLE module is necessary. This device can improve the healthcare environment through the convenience of instant detection. The burden of long-term care can then be relieved. Moreover, considering individual differences, the important analysis parameters in the CEASoC can be updated using external devices and software to enhance the flexibility of the proposed system.

Session 6: High-Speed and Time-Interleaved ADCs

Session Chair: Hsin-Shu Chen, National Taiwan University
Session Co-chair: Yong Lim, Samsung Electronics
Date: Nov. 08, 2022 (Tuesday)
Time: 10:50 – 12:30 (UTC+8)
Room: Auditorium 國際會議廳, 10F

ID	Time	Title / Authors / Affiliation
6.1 (7053) (Highlight)	10:50 \| 11:15	A Single-Channel 14b 500 MS/s Pipelined-SAR ADC with Reference Ripple Mitigation Techniques and Adaptive-Biased Floating Inverter Amplifier ¹,²Wenning Jiang, ¹Yan Zhu, ¹Chi-hang Chan, and ¹,³Rui Martins ¹University of Macau, China ²Fudan University, Shanghai, China ³Universidade de Lisboa, Portugal Abstract: This paper presents a 14b 500MS/s single-channel pipelined-SAR ADC. An on-chip reference buffer is codesigned with reference ripple neutralization (RRN) and cancellation (RRC) in the first stage to facilitate a fast conversion at low power. An adaptive-biased floating inverter amplifier (AB-FIA) is introduced to enhance the gain, linearity and speed. Consuming 6.34mW (including the reference buffer), the ADC achieves an SNDR and SFDR of 64.2dB and 80.55dB at Nyquist input, respectively. The ADC achieves 170.2dB Schreier FoM and 9.6 fJ/conversion-step Walden FoM at Nyquist input.
6.2 (7190)	11:15 \| 11:40	A 3.07mW 30MHz-BW 73.5dB-SNDR Time-Interleaved Noise-Shaping SAR ADC with 2nd-order Error-Feedforward and Redundancy-Bit Reduction Shulin Zhao¹, Mingqiang Guo¹, Sai-Weng Sin¹,², Liang Qi³, Dengke Xu⁴, Guoxing Wang³, Rui P. Martins¹,⁵ ¹University of Macau, China ²Zhuhai UM Science & Technology Research Institute, China ³Shanghai Jiao Tong University, China ⁴Amicro Semiconductor Co., Ltd, China ⁵University of Lisboa, Portugal Abstract: This work presents a calibration-free 2-channel time-interleaved noise-shaping SAR (TI-NS-SAR) ADC with 1) one-time midway error-FB and a shared dynamic amplifier to reduce the redundancy bit; 2) 2nd-order error-feedforward to enhance the NS effect for higher resolution. Fabricated in 28nm CMOS, the prototype achieves 73.5dB SNDR and 30MHz BW with a sampling frequency of 330MHz. It consumes 3.07mW, resulting in a Schreier FoM (FoMs) of 173.4dB.
6.3 (7095)	11:40 \| 12:05	A 12b 8GS/s Time-Interleaved 2b/cycle Pipelined-SAR ADC with Layout-Customized Bootstrap and Super-Source-Follower Based Open-Loop Residue Amplifier Qiang Yu¹,², Jie Pu¹, Jian Luo¹, Zhengbo Huang¹, Junhong Wu¹, Xing Zhu¹, Feixiang Xiang¹, Lei Chen¹, Jianwen Li¹, Qiang Li², Jinda Yang¹, and Yuanjun Cen¹ ¹Chengdu Sino Microelectronics Technology, China ²University of Electronic Science and Technology of China, China Abstract: This work describes a 12b 8GS/s time-interleaved ADC which utilizes a 2b/cycle pipelined-SAR ADC in each channel to enhance the speed while maintaining low power. To sample the input signal within 125ps, a layout-customized bootstrap is proposed to accelerate the start-up time. A high-linearity super-source-follower (SSF) based open-loop residue amplifier (RA) with large input swing and strong output power is exploited. With Nyquist input, this 8GS/s ADC achieves an SNDR of 53.8dB and an SFDR of 67dB with a power dissipation of 1W.
6.4 (7225)	12:05 \| 12:17	A 6-bit 5.12-GS/s Flash ADC with Track-and-Hold Embedded Dynamic Preamplifier in 28nm CMOS Daesik Moon¹,², Sangwoo Lee³, Taewoong Kim¹, Woo-Young Choi¹, and Youngcheol Chae¹ ¹Yonsei University, Korea ²Samsung Electronics, Korea ³Robert Bosch LLC., USA Abstract: 5.12GS/s flash ADC with track-and-hold embedded dynamic preamplifier. ×4 interpolated pipelined amplifier followed by StrongARM latch. Above 32.96dB over different input frequencies and sampling frequencies. Foreground calibration is realized.
6.5 (7213)	12:17 \| 12:30	A 7-Bit 4-GS/s Quad-Channel Time-Interleaved SAR ADC With 2-Then-1-Bit/Cycle Conversion Jihyun Baek, Jonghyun Kim, Gyuchan Cho, Jintae Kim, and Hyungil Chae Konkuk University, Korea Abstract: A 7-bit 4-GS/s quad-channel TI-SAR ADC including the front-end sampler and the buffer is presented. The channel ADC speed is maximized by 2-then-1-bit/cycle coarse-fine conversion without calibration. Also, a buffer topology for unity gain is introduced. The prototype is implemented in a 28-nm CMOS process, and it shows an SNDR of 38 dB at a 4 GS/s sampling rate. The power consumption is 11.4 mW, and the Walden FoM is 43.8 fJ/conv.-step, showing good energy efficiency.
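
Note on the figures of merit quoted in Session 6: the abstracts do not state how the Walden and Schreier FoMs were computed. The sketch below assumes the commonly used definitions, Walden FoM = P / (2^ENOB × fs) with ENOB = (SNDR − 1.76) / 6.02, and Schreier FoM = SNDR + 10·log10(BW / P), and takes BW = fs/2 for the Nyquist-rate ADC of paper 6.1. The function names are illustrative only and are not taken from any of the papers.

```python
import math


def enob(sndr_db: float) -> float:
    """Effective number of bits from SNDR in dB (assumed standard definition)."""
    return (sndr_db - 1.76) / 6.02


def walden_fom_fj(power_w: float, sndr_db: float, fs_hz: float) -> float:
    """Walden FoM in fJ per conversion-step: P / (2^ENOB * fs)."""
    return power_w / (2 ** enob(sndr_db) * fs_hz) * 1e15


def schreier_fom_db(power_w: float, sndr_db: float, bw_hz: float) -> float:
    """Schreier FoM in dB: SNDR + 10*log10(BW / P)."""
    return sndr_db + 10 * math.log10(bw_hz / power_w)


if __name__ == "__main__":
    # Inputs are the power, SNDR, and rate/bandwidth figures quoted in the abstracts.
    # Paper 6.1: 6.34 mW, 64.2 dB SNDR, 500 MS/s (BW assumed to be fs/2 = 250 MHz).
    print("6.1 Walden   [fJ/c.-s.]:", round(walden_fom_fj(6.34e-3, 64.2, 500e6), 2))
    print("6.1 Schreier [dB]      :", round(schreier_fom_db(6.34e-3, 64.2, 250e6), 2))
    # Paper 6.2: 3.07 mW, 73.5 dB SNDR, 30 MHz signal bandwidth.
    print("6.2 Schreier [dB]      :", round(schreier_fom_db(3.07e-3, 73.5, 30e6), 2))
    # Paper 6.5: 11.4 mW, 38 dB SNDR, 4 GS/s.
    print("6.5 Walden   [fJ/c.-s.]:", round(walden_fom_fj(11.4e-3, 38.0, 4e9), 2))
```

Under these assumptions the computed values agree with the quoted FoMs for papers 6.1, 6.2, and 6.5 to within rounding of the published SNDR and power figures.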

Session 7: Emerging Computing Applications on FPGA

Session Chair: Chuen-Yau Chen, National University of Kaohsiung
Session Co-chair: Youngjoo Lee, POSTECH
Date: Nov. 08, 2022 (Tuesday)
Time: 10:50 – 12:30 (UTC+8)
Room: Song Bo 松柏廳, 10F

ID	Time	Title / Authors / Affiliation
7.1 (7254) (Highlight)	10:50 \| 11:15	A 75.6M Base-pairs/s FPGA Accelerator for FM-index Based Paired-end Short-Read Mapping Chung-Hsuan Yang¹, Yi-Chung Wu¹, Yen-Lung Chen¹, Chao-Hsi Lee², Jui-Hung Hung²,³, Chia-Hsiang Yang¹,² ¹National Taiwan University, Taiwan ²GeneASIC Technologies Corp., Taiwan ³National Yang Ming Chiao Tung University, Taiwan Abstract: This work presents an FPGA accelerator for FM-index based paired-end short-read mapping in NGS data analysis, realized on an AMD-Xilinx Alveo U250 FPGA board. With the proposed design techniques, the overall latency is reduced by 92.6%. This work delivers 1.7-18.6x higher throughput with a memory-efficient implementation and achieves the highest accuracy, 99.3%, when compared to state-of-the-art FPGA-based designs. An on-site FPGA demonstration will be made.
7.2 (7152)	11:15 \| 11:40	A 217.8 MSOPs/W FPGA-based Online Learning SNN Processor Using Unified Event-Driven Structure and Topology Aware Data Reuse Strategies Chaoming Fang¹,², Fengshi Tian², Chuanqing Wang², Jie Yang², Mohamad Sawan² ¹Zhejiang University, China ²CenBRAIN Neurotech, Westlake University, China Abstract: We present in this paper a reconfigurable algorithmic neuromorphic engine (RAINE) with three innovative features: 1) A Pipelined-Event-Driven (PED) architecture to increase SNN execution efficiency by leveraging input sparsity. 2) A Topology-Adaptive-Stationary (TAS) data reuse strategy to reduce memory access by adopting Voltage-Reuse (VR), Event-Reuse (ES), and Synapse-Reuse (SR) dataflow for different topologies. 3) A Unified-Dynamic-Learning-Engine (UDLE) to carry out computation for both Leaky-Integrate-Fire (LIF) and trace-based Spike-Timing-Dependent-Plasticity (STDP) online learning. RAINE shows competitive energy efficiency of 217.8 MSOPS/W at a clock frequency of 75MHz, without causing additional hardware resource overhead due to the compact and unified circuit design.
7.3 (7187)	11:40 \| 12:05	A Flexible Instruction-based Post-quantum Cryptographic Processor with Modulus Reconfigurable Arithmetic Unit for Module LWR&E Aobo Li, Dongsheng Liu, Xiang Li, Tianze Huang, Shuo Yang, Jiahao Lu, Ang Hu Huazhong University of Science and Technology, China Abstract: In this work, we propose a reconfigurable arithmetic unit with a variable modulus domain and combine it with a custom instruction-set architecture to design a flexible crypto processor for MLWR and MLWE. Verified on an FPGA platform, the work achieves flexible implementation of variable parameters and instruction programming under a strategy that trades off resource efficiency against performance.
7.4 (7100)	12:05 \| 12:30	Method of Halved Interaction Elements with Regularity Arrangement that achieves Independent Double Systems for Scalable Fully Coupled Annealing Processing Shinjiro Kitahara, Akari Endo, Taichi Megumi, and Takayuki Kawahara Tokyo University of Science, Katsushika, Japan Abstract: In recent years, annealing processors have been developed as solutions to large-scale combinatorial optimization problems. In this paper, we propose a new method that has a high affinity with a scalable fully coupled annealing processor and halves the interaction in which there are squares of spins with sequence regularity. In addition, we succeeded in implementing two independent 384-spin fully-coupled Ising machines with 16 chips. The usefulness of the reduction plan is shown.

Session 8: High Performance Receiver and Detection Techniques

Session Chair: Kuang-Wei Cheng, National Cheng Kung University
Session Co-chair: Dixian Zhao, Southeast University
Date: Nov. 08, 2022 (Tuesday)
Time: 10:50 – 12:30 (UTC+8)
Room: Chang Chin 長青廳, 10F

ID	Time	Title / Authors / Affiliation
8.1 (7214) (Highlight)	10:50 \| 11:15	A 37-39GHz Phase and Amplitude Detection Circuit with 0.060 degree and 0.043dB RMS Errors for the Calibration of 5G NR Phased-Array Beamforming Yudai Yamazaki, Jun Sakamaki, Jian Pang, Joshua Alvin, Zheng Li, Atsushi Shirane, Kenichi Okada Tokyo Institute of Technology, Japan Abstract: Phased-array beamforming is achieved by high-resolution phase and amplitude control in each TRX element. However, the on-chip mismatches between elements caused by PVT variations degrade the phased-array performance. In this work, a high-accuracy phase and amplitude detection circuit for phased-array mismatch calibration in the 39GHz bands is introduced. A phase-to-digital converter (PDC) and analog-to-digital converter (ADC) detection technique is applied to obtain much lower detection errors than conventional approaches. The proposed detection circuit achieves phase and amplitude detection in 37-39GHz with 0.046 degree and 0.043dB RMS errors, respectively. The core area is 1.34mm², and the circuit is fabricated in a 65nm CMOS process.
8.2 (7074)	11:15 \| 11:40	A 0.55mm² 16.9mW Fully Integrated 0-to-200MHz System BW Wireless Direct Sampling Receiver in 14nm FinFET Ilhoon Jang, Barosaim Sung, Jaehoon Lee, Soonwoo Choi, Byoungjoong Kang, Suseob Ahn, Kyungmin Lee, Taejin Jang, Kwangmin Lim, Anna Yu, Yong Lim, Seunghyun Oh, and Jongwoo Lee Samsung Electronics, Korea Abstract: This paper presents a fully-integrated wireless direct sampling receiver that covers DC to 200MHz system bandwidth, implemented with a single-channel SAR ADC in 14nm FinFET. To demonstrate the proposed architecture, frequency modulation (FM) among the applicable standard frequency bands is adopted as a prototype. The measured demodulated SNR is 73.9dB with -47dBm input power at 108MHz and the sensitivity level is -106dBm. The proposed direct sampling receiver shows robust performance, with over 30dB demodulated SNR even in the presence of interference such as a strong adjacent channel and an in-band spur. Furthermore, the FM channel scan time is drastically reduced since the proposed receiver simultaneously samples all channels without adjusting analog building blocks.
8.3 (7124)	11:40 \| 12:05	An n79 Sub-1-dB Noise Figure Highly Linear Variable-Gain LNA Employing Adaptive Imbalanced Bleeding for 5G NR Jinglong Xu¹, Keun-Mok Kim¹, Hafiz Usman Mahmood¹, Jusung Kim², Sang-Gug Lee¹ ¹KAIST, Korea ²Hanbat National University, Korea Abstract: This work presents a 5G n79 sub-1-dB NF highly-linear variable-gain LNA. Three key techniques are introduced: (i) imbalanced current bleeding for a wide gain range, (ii) drain-side DC current switching for low power operation, and (iii) bleeding with an adaptive biasing scheme for linearity improvement. The proposed LNA shows a peak gain of 20.5 dB with a 0.74 dB minimum NF and a wide gain range of 13.4 dB, while reducing the power to 4.2 mW at the lowest gain mode. As a result, the proposed LNA achieves the best FoM1 among reported LNAs working at 4-6 GHz.
8.4 (7232)	12:05 \| 12:30	A 24GHz CMOS UWB Radar IC with IQ Correlation Receiver for Short Range Human Detection Dongwuk Park¹,², Byeongjae Seo¹, Kiryun Byeon¹, Gu Jung², and Yunseong Eo¹,² ¹Kwangwoon University, Korea ²Silicon R&D, Corp., Korea Abstract: A fully integrated 24 GHz UWB radar IC is presented. The IQ correlation receiver is employed for detection fidelity and range extension. The transmitter is a VCO-based impulse generator. The carrier frequency and bandwidth of the UWB signal are tunable in the range of 22.9 - 25.5 GHz and 0.18 - 3 GHz, respectively. The equivalent sample resolution is 195 ps. The radar module using the IC provides a maximum detection range for a moving human of up to 12.5 m with 120.27 mW power consumption.

Session 9: Energy-Efficient Digital Circuit Techniques

Session Chair: Amit Agarwal, Intel
Session Co-chair: Chia-Hsiang Yang, National Taiwan University
Date: Nov. 08, 2022 (Tuesday)
Time: 10:50 – 12:30 (UTC+8)
Room: V110 十全軒, VF

ID	Time	Title / Authors / Affiliation
9.1 (7154) (Highlight)	10:50 \| 11:15	DSC-TRCP: Dynamically Self-calibrating Tunable Replica Critical Paths Timing Monitoring for Variation Resilient Circuits with Low Cost & Large Power/Frequency Gain Zhengguo Shen, Weiwei Shan, Yuxuan Du, Ziyu Li, Chengjun Wu, Jun Yang* Southeast University, China Abstract: In-situ timing monitoring based adaptive voltage scaling (AVS) eliminates the excess timing margin for digital circuits but suffers from miss detection risk. Indirect monitoring methods face difficulties in the calibration of the replica circuit and its discrepancy with the actual circuit which limits its gain. We propose a dynamically self-calibrating tunable replica critical paths (DSC-TRCP) based timing monitoring method, which integrates the advantages of both in-situ and indirect monitoring methods while conquering their disadvantages. Implemented in a 28nm CMOS technology, it achieves up to 58% power gain or 232% frequency gain with only 0.65% area cost.
9.2 (7208)	11:15 \| 11:40	C³MLS: A 0.12-nW Leakage and 18.11-fJ/Transition Level Shifter With Cross-Coupled and Current Mirror Hybrid Structure for Ultra-Wide Range Level Conversions Cong Huang and Hailong Jiao* Peking University, China Abstract: In this paper, a CCLS (cross-coupled level shifter)/CMLS (current mirror level shifter) hybrid level shifter, C³MLS, is proposed for ultra-wide range level conversions from extremely low voltage deep in the subthreshold region to nominal supply voltage. By maintaining the merits of CCLS and CMLS and utilizing them to kill the drawbacks of each other, the proposed C³MLS achieves limited-current-contention and nearly static-current-free conversions. Measurement results in 55-nm technology demonstrate that the proposed level shifter exhibits the lowest energy-delay product among the state of the art and an average static power consumption of 0.12 nW @ VDDL = 0.3 V.
9.3 (7117)	11:40 \| 12:05	A 0.0043-mm² Capacitorless External-Clock-Free Fully-Synthesizable Digital LDO Using Load-Direct Droop Detector and Time-Based Load-State Decision Jonghyun Oh¹, Yoonho Song², Young-Ha Hwang³, Jun-Eun Park⁴, Mingoo Seok¹, and Deog-Kyoon Jeong² ¹Columbia University, USA ²Seoul National University, Korea ³Soongsil University, Korea ⁴Chungnam National University, Korea Abstract: The proposed fully-synthesizable DLDO determines a load state using a single CMP, a single voltage reference, and a tunable delay line without an external clock, resulting in a 99.6% current efficiency at a 0.6-V supply voltage. In addition, a 5-ns settling time from a 98-mV voltage droop is achieved using a coarse controller and a load-direct droop detector. The DLDO offers a 0.0043-mm² chip area and 13.01-A/mm² current density thanks to the fully-synthesized capacitorless design. The DLDO exhibits the best FoM2, which includes settling-time performance, compared with prior art.
9.4 (7168)	12:05 \| 12:17	A 10-Gbps, 0.121-pJ/bit, All-Digital True Random-Number Generator using Middle Square Method Jonghyun Kim and Hyungil Chae Konkuk University, Korea Abstract: A robust and all-digital true random number generator (TRNG) with high throughput and good power efficiency is presented. A modified middle square method for post-processing converts a 1-bit comparator output to an 8-bit random stream to achieve 10Gbps throughput. The proposed TRNG achieves the highest throughput as well as the best power efficiency of 0.121pJ/bit among all NIST test-suite adaptable TRNGs.
9.5 (7157)	12:17 \| 12:30	A Variation-Tolerant Differential Contention-Free Pulsed Latch with Wide Voltage Scalability Gicheol Shin, Minhyeok Jeong, Donguk Seo, Shin Han, Yoonmyung Lee Sungkyunkwan University, Korea Abstract: A differential contention-free pulsed latch (DCPL) is proposed, targeting wide voltage range scalability (1V to 0.4V). In order to operate in near threshold-voltage (NTV) region, differential latch structure is combined with dynamic XOR while staying static and contention-free, using special header/bridge structure. Also, in order to decrease the number of transistors and power consumption, pulse generator is absorbed into D-latch using blockages controlled by delayed clock and dual bridge structure. The proposed DCPL operates as reliably as TGFF at NTV region, and shows 50% improvement in sequencing time compared to TGFF, while maintaining similar hold time compared to prior-arts pulsed latches.

Session 10: Analog Techniques

Session Chair: Tetsuya Hirose, Osaka University
Session Co-chair: Mustafijur Rahman, Indian Institute of Technology Delhi
Date: Nov. 08, 2022 (Tuesday)
Time: 14:00 – 15:40 (UTC+8)
Room: Auditorium 國際會議廳, 10F

ID	Time	Title / Authors / Affiliation
10.1 (7132) (Highlight)	14:00 \| 14:25	A Process-Scalable Ultra-Low-Voltage 180kHz Sleep Timer with a Time-Domain Amplifier and a Switch-less Resistance Multiplier Chongsoo Jung¹, Hoyong Seong¹, Injun Choi¹, Sohmyung Ha², and Minkyu Je¹ ¹KAIST, Korea ²New York University Abu Dhabi, United Arab Emirates Abstract: This paper presents a process-scalable on-chip sleep timer. Our sleep timer overcomes the limitations of conventional on-chip sleep timers by combining an ultra-low-voltage (ULV) frequency-locked-loop (FLL) architecture, a time-domain amplifier (TDA), and a gate-leakage-leveraging technique. The proposed design, fabricated in 65nm CMOS, produces a 180kHz frequency and achieves 2.73ppm/°C temperature dependency with calibration based on a lookup table (LUT) while consuming 61nW at a 0.4V supply.
10.2 (7068)	14:25 \| 14:50	A sub-0.5V Crystal Oscillator-Timer (XO-Timer) Combining 16MHz Reference and 32kHz Sleep Timer with a Single Crystal for Energy-Harvesting Radios in 28nm CMOS Liwen Lin¹, Ka-Meng Lei¹, Pui-In Mak¹, Rui P. Martins^1,2 ¹University of Macau, China ²Universidade de Lisboa, Portugal Abstract: This paper reports an ultra-low-voltage (ULV) single-crystal oscillator-timer (XO-Timer) for sub-0.5 V BLE radios. Specifically, we propose a cascaded charge-pump (CP) as the micropower manager (μPM) to customize the voltage and current budgets for each XO-Timer sub-function. Such μPM shows a higher power efficiency than the non-cascaded design and features a single voltage-regulation loop to uphold the performance of the XO-Timer against VT-variations. The XO-Timer's core amplifier innovates a ULV reconfigurable-gm topology to balance the power budget and performance under the high-performance mode (HPM) and low-power mode (LPM). Fabricated in 28-nm CMOS, the XO-Timer in HPM generates a 16-MHz clock with a power of 24.3 μW, and a phase noise of −133.8 dBc/Hz at 1-kHz offset. In the LPM, a 32.258-kHz clock is delivered while consuming 11.4 μW. The sleep-timer FoM2 is 14.8 μW and the Allan deviation is 35.1 ppb, achieving the lowest supply voltage (0.25 V) not only for a dual-mode XO-Timer but also for a MHz-range XO.
10.3 (7077)	14:50 \| 15:15	A 0.63-mm²/Ch 1.3-mΩ/√Hz-Sensitivity 1-MHz Bandwidth Active Electrode Electrical Impedance Tomography System Ting Zhou, Hui Li, Jiajie Huang, Chao Wang, Qianyu Guo, Junyan Liu, Zhiwen Gu, Yang Zhao, Jian Zhao, Mingyi Chen, Yan Liu, Guoxing Wang, Yong Lian, Yongfu Li* Shanghai Jiao Tong University, China Abstract: AE-EIT 2D system is presented using 1) direct IF down-conversion, and digitally switched SRDP I/Q demodulation technique with low power circuit techniques to improve the impedance resolution to 1.3mΩ/√Hz at 100kHz and reduce the variation of readout circuit 0.44mVpp (4.44×) while achieving the smallest area per channel of 0.63mm2 (1.38×-6.6×).
10.4 (7189)	15:15 \| 15:40	A 1.7-6.4 GHz fourth-order RF filter with 1-40% fractional bandwidth in 22-nm FDSOI Iman Ghotbi, Baktash Behmanesh, and Markus Törmänen Lund University, Sweden Abstract: This paper presents a fourth-order Q-enhanced RF filter featuring gm-boosting, noise-canceling, capacitive cross-coupling, and forward body-biasing techniques to realize 1.7 to 6.4 GHz operating range and up to 40% adjustable fractional bandwidth. The filter operates based on subtracting out-of-phase signals in the passband and in-phase signals in the stopband. Two Q-enhanced LC resonators are utilized for outphasing. Fabricated in 22 nm FDSOI, the chip achieves 4.6 dB NF, -14 dBm IB-IIP3, and 26 dBm IB-IIP2 at 4 GHz while drawing 22-45 mA from a 1 V supply. Fourth-order steep roll-off results in 17 dBm OOB-IIP3 at 2×BW frequency offset.

Session 11: Computing & Processing in Memory

Session Chair: Shyh-Shyuan Sheu, Industrial Technology Research Institute
Session Co-chair: Juang-Ying Chueh, Etron Technology Inc
Date: Nov. 08, 2022 (Tuesday)
Time: 14:00 – 15:40 (UTC+8)
Room: Song Bo 松柏廳, 10F

ID	Time	Title / Authors / Affiliation
11.1 (7061) (Highlight)	14:00 \| 14:25	A 28nm Hybrid 2T1R RRAM Computing-in-Memory Macro for Energy-efficient AI Edge Inference Wang Ye^1,3, Chunmeng Dou^1,3, Linfang Wang^1,3, Zhidao Zhou^1,3, Junjie An¹, Weizeng Li^1,3, Hanghang Gao^1,3, Xiaoxin Xu^1,3, Jinshan Yue¹, Jianguo Yang^1,3, Jing Liu^1,3, Dashan Shang^1,3, Jinghui Tian², Qi Liu^1,2, Ming Liu^1,2 ¹Institute of Microelectronics of the Chinese Academy of Sciences, China ²Fudan University, China ³University of Chinese Academy of Sciences, China Abstract: This work presents the first 28nm hybrid 2T1R (H2T1R) RRAM computing-in-memory macro for AI edge inference. It features (1) the H2T1R cell array that can achieve >13X enhanced resistance-ratio, >80% reduced summation current, >67% smaller word-line voltage, and precise multi-bit weight encoding, and (2) reference-subtracting current sense amplifier (RS-CSA) that can reduce the number of the stand-by reference signals and extend the linear dynamic range of the current mirror. It performs highly accurate multi-bit analogue computation over 32 input channels with a peak energy efficiency up to 154.04 TOPS/W.
11.2 (7167) (Highlight)	14:25 \| 14:50	A Local Transpose 9T SRAM Compute-In-Memory Macro with Programmable Single-Slope SAR ADC Xin Zhang^1*, Yongjun Jo^1*, Jiahao Liu², Jun Zhou², Yuanjin Zheng¹, and Tony Tae-Hyoung Kim¹ (*Equally contributed authors) ¹Nanyang Technological University, Singapore ²University of Electronic Science and Technology of China, China Abstract: This work proposes a two-directional transpose SRAM compute-in-memory (CIM) macro for inference and training in convolutional neural networks (CNN). A novel 9T SRAM bit-cell is proposed for local two-way computing without additional shared transpose processing units. The proposed transposable CIM achieves higher processing throughput because every bit-cell can operate at the same time in one CIM computing cycle. This work also proposes a programmable single-slope (SS) successive approximation (SAR) ADC for energy efficiency improvement by utilizing the probability density function of MAC values. The proposed ADC also supports the ReLU-based zero-skip function through the SS operation. The test chip was fabricated in 180nm CMOS technology and achieved an energy efficiency of 6.61TOPS/W with the ADC zero-skip and SS operations.
11.3 (7203)	14:50 \| 15:15	Spike-CIM: A 290TOPS/W Spike-Encoding Sparsity-Adaptive Computing-in-Memory Macro with Differential Charge-Domain Integrate-and-Fire Jiahao Song¹, Xiyuan Tang¹, Haoyang Luo¹, Kuan Xu², Yuan Wang¹, Zhigang Ji², Runsheng Wang¹, and Ru Huang¹ ¹Peking University, China ²Shanghai Jiao Tong University, China Abstract: This paper proposes a spike-encoding sparsity-adaptive computing-in-memory (CIM) macro (Spike-CIM) that offers excellent energy efficiency and robustness. A differential integrate-and-fire architecture, implemented by charge-domain cells, is proposed to achieve sparsity-adaptive power saving. The fabricated 65nm 32Kb Spike-CIM realizes a normalized energy efficiency of 1218 TOPS/W/Bit.
11.4 (7245)	15:15 \| 15:27	A Hybrid Temperature Compensation method combined with Digital and Analog Temperature Compensation Techniques for 3D-NAND Flash Memories Dojeon Lee, Junhong Park, Philkyu Kang, Sungmin Jo, Seheon Baek, Chi-Weon Yoon, Dongku Kang Samsung Electronics, Korea Abstract: The voltage compensation methods according to the temperature change can be typically divided into a digital method and an analog method. This paper proposes the hybrid temperature compensation method that combines the advantages of the Digital method and the Analog method to secure temperature linearity and reduce time overhead for temperature sensing.
11.5 (7160)	15:27 \| 15:40	A Variation-Tolerant Processing-In-Memory Architecture Using Discharging Current Calibration Daiki Kitagata, Shinji Tanaka, Naoya Fujita and Naoaki Irie Renesas Electronics Corporation, Japan Abstract: This paper presents a variation-tolerant ternary neural arithmetic memory (VT-TNAM) for energy-efficient processing-in-memory (PIM) accelerators. The VT-TNAM macro installs the newly proposed discharging current calibration (DCC) architecture using adjustable-current ternary bit cells (ACTBCs) to effectively mitigate local process variation. Furthermore, hierarchical MAC-operation skipping (HMS) architecture using the proposed small current detector (SCD) is also developed to compensate for energy efficiency degradation caused by MAC accuracy improvement. Successful reduction of process variation is verified using a fabricated test-element-group (TEG) in 22nm process and 20.0 – 59.2 TOPS/W is achieved by introducing the HMS architecture.

Session 12: Advanced Wireline Transceiver Techniques

Session Chair: Wei-Zen Chen, National Yang Ming Chiao Tung University
Session Co-chair: Jung-Hoon Chun, Sungkyunkwan University
Date: Nov. 08, 2022 (Tuesday)
Time: 14:00 – 15:40 (UTC+8)
Room: Chang Chin 長青廳, 10F

ID	Time	Title / Authors / Affiliation
12.1 (7051) (Highlight)	14:00 \| 14:25	A 103 fJ/b/dB, 10-26 Gbps Receiver with a Dual Feedback Nested Loop CDR for Wide Bandwidth Jitter Tolerance Enhancement Yao-Chia Liu¹, Wei-Zen Chen¹, Yuan-Sheng Lee², Yu-Hsiang Chen², Shawn Min², Ying-Hsi Lin² ¹National Yang Ming Chiao Tung University, Taiwan ²Realtek Semiconductor Corp., Taiwan Abstract: A nested-CDR-based receiver with a PI controller is presented. The direct modulation path bypasses the loop-latency-limited PI path and modulates the VCO for faster response and enhanced stability. The measured jitter tolerance curve shows a 0.15UI enhancement at 60MHz. With the DFE simplified by an edge-based algorithm, the receiver is able to tolerate 32dB channel loss. Compared with CDR-only prior art, this work improves power efficiency by two times over the traditional PI architecture and by four times over the DCO architecture.
12.2 (7026) (Highlight)	14:25 \| 14:50	A 42Gb/s PAM-8 Transmitter with Feed-Forward Tomlinson-Harashima Precoding in 28nm CMOS Byungjun Kang, Woosong Jung, Hyojun Kim, Sanghee Lee, and Deog-Kyoon Jeong Seoul National University, Korea Abstract: A 42Gb/s PAM-8 transmitter (TX) with feed-forward Tomlinson-Harashima precoding (FF-THP) is presented. The FF-THP architecture produces a uniform output distribution with higher average signal power compared with the FFE. The fabricated chip compensates for the 7.7dB channel loss with the PAM-8 signaling. As a result, it achieves the power efficiency of 1.58pJ/b, occupying 0.0703mm².
12.3 (7228)	14:50 \| 15:15	A 11.4-Gbps/lane MIPI 32-bit C-PHY and D-PHY combo transmitter with 3-tap FFE Junhan Bae¹, Myeongkyu Song¹, Bongkyu Kim¹, Junkyu Lee¹, Woosung Park^1,2, and Jung-Hoon Chun^1,3 ¹Sungkyunkwan University, Korea ²Samsung Electronics, Korea ³SolidVue, Korea Abstract: This paper describes a MIPI C/D-PHY combo transmitter (TX) fabricated in 110nm CMOS image sensor (CIS) process. The same hardware can be shared to support both C-PHY and D-PHY with little extra circuitry. The adopted 32-bit architecture that enables double data rate (DDR) in C/D-PHY can maximize the data rate, allowing it to exceed the limits of legacy sub-micron process technologies. In addition, the proposed TX utilizes 3-tap feed-forward equalization (FFE) in both the C-PHY and D-PHY modes, effectively eliminating the inter-symbol interference (ISI) induced by band-limited channels. The measured results indicate that the compliance test verified in C-PHY mode is comfortably passed at data rates up to 11.4 Gbps (5 Gsps) per lane. The eye diagrams in D-PHY mode are fully open at the data rates up to 6 Gbps per lane.
12.4 (7226)	15:15 \| 15:40	A 5.0-to-12.5-Gb/s, 1.7-pJ/b, 0.66-µs Lock-time Referenceless Sub-sampling CDR with Beat Detection FLL in 28nm CMOS Woosung Park^1,2, Jahoon Jin², Minsu Park¹, Sangdon Jung^1,2, and Jung-Hoon Chun^1,3 ¹Sungkyunkwan University, South Korea ²Samsung Electronics, South Korea ³SolidVue, South Korea Abstract: The capture range of the SSPD is wider than that of the PD, relieving the burden of reducing the residual frequency. In practice, the SSPD-based CDR (SSCDR) in [1] corrects frequency errors without an FLL, saving significant power. The SSCDR also achieves short lock-time with a wide bandwidth; therefore, it is suitable for the burst-mode operation which requires a sub-ns relocking time. To take advantage of these desirable characteristics of the SSCDR, this work benchmarks [1] and extends the frequency coverage by employing the beat detection FLL. The proposed FLL shows faster-locking behavior than prior arts through a beat correction process using the down-conversion function of the SSPD. As a result, the proposed FLL relieves a trade-off between lock time and frequency coverage. We also propose a bandwidth-control technique and an energy-efficient dual-mode SSPD.

Session 13: Communication and Powering Techniques for Biomedical Applications

Session Chair: Youngcheol Chae, Yonsei University
Session Co-chair: Inhee Lee, University of Pittsburgh
Date: Nov. 08, 2022 (Tuesday)
Time: 14:00 – 15:40 (UTC+8)
Room: V110 十全軒, VF

ID	Time	Title / Authors / Affiliation
13.1 (7025) (Highlight)	14:00 \| 14:25	A 20-MHz 2.3-mW Receiver and a 25-V Transmitter for Ultrasound Capsule Endoscopy Kyeongwon Jeong¹, Jaesuk Choi¹, Gichan Yun¹, Injun Choi¹, Jeehoon Son², Jae Youn Hwang², Sohmyung Ha³, and Minkyu Je¹ ¹KAIST, Korea ²DGIST, Korea ³New York University Abu Dhabi, United Arab Emirates Abstract: We propose the first ultrasound capsule endoscopy (USCE) ASIC. An on-chip transmitter (TX) is designed to generate a high-voltage pulse applied to the transducer. In addition, a highly power-efficient ultrasound (US) receiver (RX) IC for USCE is presented. We propose an RX structure with synchronized analog envelope detection (ED) to reduce the required ADC speed. A ping-pong noise-shaping SAR (NS-SAR) ADC with a passive gain is employed for high power efficiency and resolution.
13.2 (7159)	14:25 \| 14:50	An Intra-Body-Power-Transfer System with a PLL-based Continuous Maximum Resonant Power Tracking Loop at TX and 1.8V DC Output Voltage at RX Hyungjoo Cho¹, Ji-Hoon Suh¹, Gichan Yun¹, Sohmyung Ha², and Minkyu Je¹ ¹KAIST, Korea ²New York University Abu Dhabi, United Arab Emirates Abstract: We present an intra-body-power-transfer (IBPT) system that delivers power greater than 100μW even across 150cm on-body distance. The proposed IBPT TX employs a PLL-based maximum-resonant-power-tracking (MRPT) loop running in the background to maximize the power delivered to the load (PDL) without any need for RX-to-TX back telemetry or tuning phase, enabling continuous power delivery. The PDL and power transfer efficiency (PTE) are further improved by inducing parallel resonance at RX. Fabricated in a 180nm BCD process, the IBPT system achieves 136μW PDL at 1.8V DC output with 8.83% end-to-end power efficiency.
13.3 (7163)	14:50 \| 15:15	A 2m-Range 711uW Body Channel Communication Transceiver Featuring Dynamically-Sampling Bias-Free Interface Front End Guanjie Gu¹, Changgui Yang¹, Zhuhao Li¹, Xiangdong Feng¹, Ziyi Chang¹, Ting-Hsun Wang¹, Yunshan Zhang¹, Yuxuan Luo¹, Hong Zhang¹, Ping Wang¹, Sijun Du², Yong Chen³, and Bo Zhao^1* ¹Zhejiang University, China ²Delft University of Technology, Netherlands ³University of Macau, China ^* Corresponding Author: Bo Zhao Abstract: The state-of-the-art BCC transceivers have realized low power consumption, but the communication range is still limited to less than 1m. One of the issues limiting the communication range of BCC is the loss at the interface between human body and transceiver. The DC bias in previous closed-loop and gate-input techniques reduced the input impedance and voltage gain of IFE, leading to a high interface loss. In this work, we propose a dynamically-sampling bias-free IFE to realize a 90KOhm input impedance and 94dB RF-IF conversion gain of IFE, resulting in a receiving sensitivity of -104dBm. Therefore, the communication range has been extended to 2m with 711uW total power consumption.
13.4 (7169)	15:15 \| 15:40	A Low-power Sleep Apnea Monitoring IC with a Duty-Recovered Body Channel Communication Receiver Pangi Park, Donghyeok Cho, SeongHwan Cho KAIST, Korea Abstract: This paper presents an in-home level-4 sleep apnea monitoring IC that can measure three basic parameters: airflow, HR, and SpO2. A duty-recovered BCC receiver is proposed to allow both the transmitter and receiver sides to be duty-cycled, and the power efficiency of the readouts is improved by regulating the voltage of the interface node between the sensing units and the readouts. With the proposed techniques, the receiver power is reduced by 98.8%, and the overall system power is 93.8% smaller than the previous work.

Session 14: LDO Voltage Regulators

Session Chair: Hyun-Sik Kim, KAIST
Session Co-chair: Hyungil Chae, Konkuk University
Date: Nov. 09, 2022 (Wednesday)
Time: 09:00 – 10:40 (UTC+8)
Room: Auditorium 國際會議廳, 10F

ID	Time	Title / Authors / Affiliation
14.1 (7085)	09:00 \| 09:25	A Digital LDO in 22nm CMOS with a 4b Self-triggered Binary Search Windowed Flash ADC Featuring Automatic Analog Layout Generator Framework Xiaosen Liu^1,2, Soner Yaldiz², Parijat Mukherjee², Steven Burns², Harish Krishnamurthy², Krishnan Ravichandran², Zakir Ahmed², Nachiket Desai², Nicolas Butzen², James Tschanz², Vivek De² ¹Tsinghua University, School of Integrated Circuits, China ²Intel Corporation, U.S.A Abstract: An analog layout generator based DLDO with a self-triggered binary search windowed flash ADC is proposed in 22nm CMOS to maximize the productivity of implementing analog circuit blocks in scaled CMOS process, thus significantly improving the physical design time & effort up to 60× compared with conventional manual approach. A self-triggered binary search mechanism with a delay-based architecture is proposed to reduce the exponentially growing kickback noise and energy consumption of a traditional flash ADC down to the level of a SAR ADC while maintaining its high speed feature. The DLDO features 3.55ps FoM and fully automatic generation.
14.2 (7086)	09:25 \| 09:50	A Fast-Transient and Wide-Range Output Capacitor-Less NMOS LDO Regulator with Adaptive-Gain Nested Miller Compensation and Pre-Emphasis Inverse Biasing Hyunjun Park, Woojoong Jung, Minsu Kim, and Hyung-Min Lee Korea University, Korea Abstract: The proposed capless LDO can ensure stability at a wide load range as well as achieve higher bandwidth for fast transient at larger ILOAD by adopting an adaptive-gain nested Miller compensation. A pre-emphasis inverse biasing also improves slew rate at the gate of an NMOS pass transistor by sourcing adaptive bias current into a super source follower. The 180nm CMOS LDO acquires high unity-gain bandwidth of 17.5MHz while providing a wide ILOAD range from 0.1mA to 300mA with phase margin above 60°. The LDO ensures small undershoot (48mV) and overshoot (59mV), achieving best FoM of 1.72ps.
14.3 (7144)	09:50 \| 10:15	A Capacitor-less Digital LDO using Ripple-Frequency-Adaptive Time-domain Digital Pre-distortion Technique Angxiao Yan¹, Wei Deng^1,2, Haikun Jia¹, Shiwei Zhang¹, Rui Wu³, Zhihua Wang^1,2, and Baoyong Chi¹ ¹Tsinghua University, China ²Research Institute of Tsinghua University in Shenzhen, China ³National Key Lab of Microwave Imaging Technology, AIR, CAS, China Abstract: A digital low-dropout regulator (D-LDO) with a time-domain digital pre-distortion (DPD) scheme is introduced in this paper. It features adaptive suppression of supply voltage ripple without introducing an analog-assisting loop or a large capacitor. The proposed all-digital ripple cancellation technique is effective against arbitrary ripple waveforms and any ripple frequency from kHz to a quarter of the clock frequency. The measurement results indicate a -24.5 dB rejection ratio and an improvement of 9.5 dB over the conventional D-LDO. This work demonstrates the possibility and feasibility of digital-domain ripple cancellation for the first time.
14.4 (7072)	10:15 \| 10:40	A Self-Clocked TDC-Based Unified Clock and Voltage Regulator with Replica Frequency-Locked Loop and Hysteresis Switching in 65nm CMOS Xuliang Wang, Wing-Hung Ki, and Philip K. T. Mok The Hong Kong University of Science and Technology, China Abstract: A self-clocked digital low-dropout regulator (DLDO) employing a tunable replica oscillator (TRO) and a beat-frequency (BF) quantizer is proposed to supply and clock the microprocessors. The standard D-flip-flop is utilized as both the time-to-digital converter (TDC) and the sampling clock or BF clock generator. Fast transient response and static low power consumption are achieved simultaneously by the adaptive sampling capability of the BF quantizer. With the help of the proposed hysteresis switching logic (HSL) and replica frequency-locked loop (FLL), the built-in offset of the BF quantizer is eliminated. The TRO powered by the output of DLDO mimics half of the critical path delay of microprocessors and guarantees error-free operation even during voltage undershoot caused by load transients. In the load transient test of 50mA/μs with a 100-pF load capacitor, the proposed HSL improves the voltage undershoot and the steady-state offset by 25% and 84%, respectively. Fabricated in 65-nm LP process, the tested prototype holds an active area of 0.045mm² and achieves 0.76-ps FOM.

Session 15: Energy-Efficient Machine Learning Processors and High-Speed Interface

Session Chair: Yu-Guang Chen, National Central University
Session Co-chair: Chao Wang, Huazhong University of Science and Technology
Date: Nov. 09, 2022 (Wednesday)
Time: 09:00 – 10:40 (UTC+8)
Room: Song Bo 松柏廳, 10F

ID	Time	Title / Authors / Affiliation
15.1 (7145) (Highlight)	09:00 \| 09:25	A 2.47 μJ/sample QR-Decomposition-based Extreme Learning Machine Engine Supporting Online Class Incremental Learning for ECG-based User Identification Yi-Ta Chen, Li-Sheng Chang, Yu-Chuan Chuang, An-Yeu Wu National Taiwan University, Taiwan Abstract: To support online class incremental learning (O-CIL) in ECG-based user identification, this work presents a QR-decomposition-based extreme learning machine (QRD-ELM) engine. A diagonally-mapped linear array (DMLA) enables the support of online learning while reducing the area by 98.5%. The integrated PE design with a unified COordinate Rotation DIgital Computer (u-CORDIC) further reduces the area by 15.3% and the power consumption by 22.4%. A model-algorithm-circuit co-design module supports class incremental learning with low energy and area overhead. The QRD-ELM engine, fabricated in 40nm CMOS technology with a 1.33×1.33 mm² die area, achieves a learning energy efficiency of 2.47 μJ/sample, a 28.5× improvement over the state-of-the-art.
15.2 (7215) (Highlight)	09:25 \| 09:50	A 1.3mW Speech-to-Text Accelerator with Bidirectional Light Gated Recurrent Units for Edge AI Yu-Hsuan Tsai^1*, Yi-Cheng Lin^1*, Wen-Ching Chen², Liang-Yi Lin², Nian-Shyang Chang², Chun-Pin Lin², Shi-Hao Chen³, Chi-Shi Chen², and Chia-Hsiang Yang¹ ¹National Taiwan University, Taiwan ²Taiwan Semiconductor Research Institute, Taiwan ³Digwise Technology Ltd., Taiwan ^* Equally-Credited Authors (ECAs) Abstract: This work presents an energy-efficient speech-to-text accelerator. The bidirectional light gated recurrent unit (BLiGRU)-based neural network is adopted to achieve a high accuracy. Network compression is utilized to reduce the network size and associated computational complexity by 29.8× and 73.2×, respectively. Efficient sequence decoding without backtracking is implemented to reduce the latency and memory usage. The chip performs speech-to-text conversion in 9.77 ms/frame with 1.3 mW at 1.25 MHz. Compared to the state-of-the-art designs, the chip achieves a 6.5-to-177× lower normalized energy with the lowest 15.2% phone error rate (PER) on the TIMIT dataset.
15.3 (7166)	09:50 \| 10:15	A 6 Gbps PAM-3 Transceiver with Time-Varying Offset Compensation Ju Eon Kim^1,2, Dong-Hyun Yoon², Junyoung Song³, Kwang-Hyun Baek⁴, Jung-Hwan Choi¹, and Tony Tae-Hyoung Kim² ¹Samsung Electronics, Korea ²Nanyang Technological University, Singapore ³Incheon National University, Korea ⁴Chung-Ang University, Korea Abstract: CMOS technology scaling improves performance by reducing supply voltage, parasitic capacitor, and physical area. Thus, device reliability issues, such as component mismatches and aging effects become prominent in the aggressively scaled technology. Especially, signal levels of PAM are highly susceptible to PVT variations and device mismatches. This paper proposes an offset compensation technique for a PAM-3 transceiver. The proposed compensation algorithm continuously detects faulty patterns and generates optimal reference voltage for the single-to-differential amplifier to cancel out time-varying offset. This work presents a 6Gbps PAM-3 transceiver in 65nm CMOS. The proposed technique improves the eye-opening by 38%.
15.4 (7147)	10:15 \| 10:40	A 12.8-Gbps 0.5-pJ/b Encoding-less Inductive Coupling Interface Using Clocked Hysteresis Comparator for 3D-stacked SRAM in 7-nm FinFET Kota Shiba¹, Mitsuji Okada², Atsutake Kosuge², Mototsugu Hamada², and Tadahiro Kuroda² ¹The University of Tokyo, Japan ²Research Association for Advanced Systems, Japan Abstract: A 0.5-pJ/b 12.8-Gbps/link inductive coupling inter-chip wireless communication interface for a 3D-stacked SRAM has been developed in a 7-nm FinFET process. A new clocked hysteresis comparator that eliminates encoding for synchronous communication achieves 1.49 times higher data rate and 36% lower energy consumption compared to conventional synchronous communication using Manchester encoding. Inter-chip communication at 0.5-pJ/b 12.8-Gbps/link was confirmed using test chips. The proposed interface for a 4-hi 3D-stacked SRAM module achieves a 1.7-TB/s/mm² IO area efficiency, representing a two-orders-of-magnitude improvement over a state-of-the-art interface for a 3D-stacked SRAM with competitive energy efficiency.

Session 16: Advanced Signal Generation and Radar Techniques

Session Chair: Kenichi Okada, Tokyo Institute of Technology
Session Co-chair: Howard Luong, Hong Kong University of Science and Technology
Date: Nov. 09, 2022 (Wednesday)
Time: 09:00 – 10:40 (UTC+8)
Room: Chang Chin 長青廳, 10F

ID	Time	Title / Authors / Affiliation
16.1 (7111)	09:00 \| 09:25	A Compact Square-Geometry Quad-Core 19 GHz Class-F VCO with Parallel Inductor-sharing Technique achieving -137.2 dBc/Hz Phase Noise at 10MHz Offset Yaqian Sun¹, Wei Deng^1,2, Haikun Jia¹, Zhihua Wang^1,2, and Baoyong Chi¹ ¹Tsinghua University, China ²Research Institute of Tsinghua University in Shenzhen, China Abstract: A square-geometry quad-core oscillator with an inductor-sharing technique is proposed in this paper. It exhibits a compact area of 0.09 mm², the smallest among quad-core VCOs operating at a similar oscillation frequency. The unwanted mode is suppressed by the metal trace that connects the drain nodes of adjacent cores. The proposed VCO is fabricated in 65nm CMOS technology. The measured phase noise is -137.2 dBc/Hz at 10 MHz offset frequency from a carrier of 19 GHz, which translates to the FoM of 186.1 dBc/Hz.
16.2 (7064)	09:25 \| 09:50	A 17-21GHz Current-Folding Frequency Tripler With >36dBc Harmonic Rejection in 90nm CMOS Chun-Hung Lin and Ching-Yuan Yang National Chung Hsing University, Taiwan Abstract: A frequency tripler (FT) using a current-folding technique to achieve inherently nonlinear operation is presented. A built-in VCO generates the fundamental signal, and the proposed current-folding stage converts the fundamental input into the triple-frequency output, which is injected into a bandpass stage for harmonic suppression. Fabricated in 90-nm CMOS technology, the measured FT features 36 to 43-dBc harmonic rejection from 17.5 to 21 GHz (18.2% FTR), while consuming only 3.5 mW from a 1.2-V supply. The measured phase noise (PN) of the VCO and the FT are -112.5 and -102.8 dBc/Hz at 1-MHz offset, respectively. Furthermore, the achieved figures-of-merit (FoMs) of the proposed FT are -180.52 and -190.87 dB at 1-MHz and 10-MHz offset, respectively.
16.3 (7191)	09:50 \| 10:15	An 18.8-to-20.3-GHz Wide-Ramping-Range Cascaded-PLL-Based FMCW Generator with 44.1-kHz RMS Frequency Error and -105.6-dBc/Hz Phase Noise in 40-nm CMOS Xiaofei Liao¹,², Feifan Hong¹,², Sijie Pan², Xiaohu You¹,², and Dixian Zhao¹,² ¹Southeast University, China ²Purple Mountain Laboratories, China Abstract: A cascaded phase-locked loop (PLL) with wideband low-noise frequency modulation for frequency-modulated continuous-wave (FMCW) radar applications is presented. It utilizes a wideband millimeter-wave VCO with flat gain sensitivity to ensure wide chirp bandwidth and frequency modulation linearity. An in-depth analysis of the loop bandwidth optimization in cascaded PLL for the FMCW synthesizer is detailed. Fabricated in 40-nm CMOS, the proposed cascaded PLL can produce 1.5-GHz triangular and sawtooth chirps from 18.8 to 20.3 GHz, achieving a minimum root-mean-square (rms) frequency error of 44.1 kHz. The measured PN at 1-MHz offset from 19.2 GHz is -105.6 dBc/Hz.
16.4 (7089)	10:15 \| 10:40	A 140GHz 4TX-4RX Phased-Array FMCW-FSK Antenna-Packaged Radar Chipset With 25dBm EIRP and 16GHz BW Shunli Ma¹, Tianxiang Wu¹, Zhuofan Xu¹, Zhonghao Sun¹, Xuefeng Li¹, Lei Wu¹, Biao Hu¹, Junyan Ren¹, Yong Chen², and Jiebin Pan³ ¹Fudan University, China ²University of Macau, China ³East China Institute of Photo-Electron IC, China Abstract: Frequency modulated continuous wave (FMCW) radar sensors are widely utilized for security checks, car-collision avoidance, vital-sign monitoring, and detection of tiny movements [1]-[5]. A 4D mm-wave radar needs a large number of phased-array elements to achieve accurate detection. Range resolution is determined by the bandwidth (BW) of the transceiver (TRX). Moreover, it is desirable to integrate sensing and communication functions into the same system. This paper presents a 140GHz phased-array FMCW chipset in 65nm bulk CMOS supporting a 16GHz BW with a custom horn antenna package. Based on the tile structure of the TRX, our system can be scaled up to a large array for 4D phased-array radar.
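
Several figures quoted in the Session 16 abstracts can be cross-checked with textbook relations. The sketch below is illustrative only and is not taken from the papers: it assumes the fractional tuning range (FTR) is the frequency span divided by the center frequency, that an ideal frequency multiplier raises phase noise by 20·log10(N), and that the ideal FMCW range resolution is c/(2·BW). The function names are hypothetical.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def fractional_tuning_range(f_low_ghz, f_high_ghz):
    """Tuning range as a percentage of the center frequency (assumed definition)."""
    center = (f_low_ghz + f_high_ghz) / 2
    return 100 * (f_high_ghz - f_low_ghz) / center


def ideal_multiplier_pn_penalty_db(n):
    """Ideal phase-noise increase of an N-times frequency multiplier."""
    return 20 * math.log10(n)


def fmcw_range_resolution_m(bandwidth_hz):
    """Ideal FMCW range resolution c / (2 * BW); a bound, not a measured value."""
    return SPEED_OF_LIGHT / (2 * bandwidth_hz)


# Paper 16.2: 17.5 to 21 GHz output, VCO PN -112.5 dBc/Hz, tripler PN -102.8 dBc/Hz
ftr_percent = fractional_tuning_range(17.5, 21.0)
ideal_penalty_db = ideal_multiplier_pn_penalty_db(3)
measured_gap_db = -102.8 - (-112.5)

# Paper 16.4: 16 GHz bandwidth
resolution_mm = 1000 * fmcw_range_resolution_m(16e9)

print(f"FTR: {ftr_percent:.1f} %")
print(f"Ideal tripler PN penalty: {ideal_penalty_db:.2f} dB, quoted gap: {measured_gap_db:.1f} dB")
print(f"Ideal range resolution at 16 GHz BW: {resolution_mm:.2f} mm")
```

Under these assumptions the FTR comes out near the quoted 18.2%, and the quoted 9.7 dB gap between VCO and tripler phase noise sits close to the ideal 9.54 dB for a times-three multiplier.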

Session 17: Emerging Circuit Techniques for Power Management, Sensing and Computing

Session Chair: Takuji Miki, Kobe University
Session Co-chair: Chihiro Okada, Sony Semiconductor Solutions Corporation
Date: Nov. 09, 2022 (Wednesday)
Time: 09:00 – 10:40 (UTC+8)
Room: V110 十全軒, VF

ID	Time	Title / Authors / Affiliation
17.1 (7236) (Highlight)	09:00 \| 09:25	A 14V Hybrid Boost Converter With Scalable Conversion Ratio in 180nm Standard CMOS for an Ultrasound Imaging System Jiaqi Guo¹, Jiamin Li², Jerald Yoo¹,³ ¹National University of Singapore, Singapore ²Southern University of Science and Technology, China ³The N.1 Institute for Health, Singapore Abstract: To provide the high voltage supply (>10V) and intermediate voltage domains required by the transducer driving circuits for ultrasound imaging, and to achieve that in a standard CMOS process for easy processor and IP integration, this work presents a 14V multiple-output boost converter with a hybrid structure and PWM mode operation. The chip, implemented in a 180nm standard CMOS process, regulates 3.5V, 7V, 10.5V and 14V from a 1.5V input, while keeping the switch stress (VGS, VDS) of all transistors below 3.5V at any switching state. It achieves a simulated efficiency of 78%, doubling the 35% achieved in earlier works.
17.2 (7091)	09:25 \| 09:50	A 0.24 mmHg (1σ) Resolution Half-Bridge-to-Digital Converter with RC Delay-Based Pressure Sensing and Energy-Efficient Bit-Level Oversampling Techniques for Implantable Miniature Systems Donguk Seo¹, Minsik Cho¹, Minhyeok Jeong¹, Gicheol Shin¹, Inhee Lee², and Yoonmyung Lee¹ ¹Sungkyunkwan University, Korea ²University of Pittsburgh, USA Abstract: A pressure sensor with a half-Wheatstone-bridge-to-digital converter is proposed for implantable miniature systems. The half-Wheatstone-bridge (HB) sensor uses an RC delay comparison, which self-limits current for energy-efficient operation. To overcome the limited sensitivity of the HB, bit-level oversampling is introduced, and 0.24 mmHg (1σ) resolution with an 8.58 nJ·mmHg² FOM is achieved, which is significantly better than that of the prior-art HB-based pressure sensor and comparable to that of Wheatstone-bridge-based pressure sensors.
17.3 (7040)	09:50 \| 10:15	A 0.0308mm² 4.15pJ/conv VCO-Based Current Sensing Front-End with 2nd-Order Δ²-ΔΣ Modulation Jee-Ho Park, Ji-Hyoung Cha, Yongjae Park, and Seong-Jin Kim Ulsan National Institute of Science and Technology, Korea Abstract: This paper presents a 2nd-order Δ²-ΔΣ modulator based on a VCO-based quantizer (VCOQ) with a PWM I-DAC for the precise acquisition of incoming current in an area- and energy-efficient form factor. The proposed Δ²-modulation substantially attenuates the magnitude of input signals, enhancing the linearity and DR. Moreover, an additional differentiator followed by the VCOQ features the negative feedback loop in the 2nd-order ΔΣ modulator, increasing noise shaping order with no DAC noise. In addition, the PWM I-DAC substituting the multi-bit I-DAC is devised to mitigate noise further, realizing a high resolution of 1 pA with 500-Hz bandwidth. The prototype chip fabricated in 110-nm CMOS occupies 0.0308 mm² and achieves a Walden FoM of 4.15 pJ/conv.
17.4 (7193)	10:15 \| 10:40	A 57.2GHz 11.2mW 8-bit General Purpose Superconductor Microprocessor with Dual-Clocking Scheme Ikki Nagaoka¹, Ryota Kashima¹, Tomoki Nakano¹, Masamitsu Tanaka¹, Taro Yamashita², Koji Inoue³, and Akira Fujimaki¹ ¹Nagoya University, Japan ²Tohoku University, Japan ³Kyushu University, Japan Abstract: A superconductor single-flux-quantum (SFQ) logic 8-bit microprocessor is demonstrated up to 57.2 GHz with a measured power consumption of 11.2 mW. The microprocessor has an ultradeep, gate-level pipelining containing many feedback paths and communications between components. The arrival clock timings at all the logic gates are ultra-precisely tuned using two different clocking schemes, called “concurrent-flow” and “counter-flow,” to achieve extremely high clock frequency operation over 50 GHz. Low-temperature circumstances enable us to conduct super delay-intensive layout design by controlling delays of all waveguide interconnects in the order of sub-picosecond precision.

Session 18: Sensor Interfaces and References

Session Chair: Taekwang Jang, ETH Zurich, Switzerland
Session Co-chair: Pieter Harpe, Eindhoven University of Technology
Date: Nov. 09, 2022 (Wednesday)
Time: 14:00 – 15:40 (UTC+8)
Room: Auditorium 國際會議廳, 10F

ID	Time	Title / Authors / Affiliation
18.1 (7080) (Highlight)	14:00 \| 14:25	A 0.56V/0.8V Vision Sensor with Temporal Contrast Pixel and Column-Parallel Local Binary Pattern Extraction for Dynamic Depth Sensing Using Stereo Vision Min-Yang Chiu, Guan-Cheng Chen, Yu-Hsiang Huang, Tzu-Hsiang Hsu, Chung-Chuan Lo, Ren-Shuo Liu, Meng-Fan Chang, Kea-Tiong Tang, Chih-Cheng Hsieh National Tsing Hua University, Taiwan Abstract: A 0.56V/0.8V 126×126 vision sensor with a 6T1C temporal contrast pixel, an exposure compensation scheme, and column-parallel local-binary-pattern (LBP) and region-of-interest (ROI) extraction is prototyped and verified. For motion detection and position tracking, it supports 10-bit raw image, 10-bit frame difference, and 1.5-bit event reporting (ER) output. For dynamic depth sensing of moving objects using a stereo vision system, it supports an 8-bit LBP feature map and ROI for efficient disparity calculation.
18.2 (7150)	14:25 \| 14:50	A 118.6fJ/Conversion-Step Two-Step Time-Domain RC-to-Digital Converter With 33nF/10MΩ Range and 53aFrms Resolution Hoyong Seong¹, Chongsoo Jung¹, Donghyun Youn¹, Junghyup Lee², Sohmyung Ha³, and Minkyu Je¹ ¹KAIST, Korea ²DGIST, Korea ³New York University Abu Dhabi, United Arab Emirates Abstract: This paper presents a 2-step time-domain (TD) RC-to-digital converter (RCDC). To overcome the fundamental tradeoff between resolution and energy efficiency that constrains TD converter designs, a 2-step TD conversion method is proposed. Utilizing a slow reference oscillator (R-OSC) for coarse conversion and a fast duty-cycled gear-up oscillator (G-OSC) for fine conversion, the time period of the sensor oscillator output after frequency division can be measured with both high resolution and high energy efficiency. A duty-cycled phase-locked loop (PLL) is employed to consistently maintain the required relationship between the R-OSC and G-OSC outputs without any calibration. Fabricated in a 180nm CMOS, the proposed 2-step TD RCDC IC achieves 53aFrms resolution and 33nF/10MΩ input range, consuming 6.75μW.
18.3 (7224) (Highlight)	14:50 \| 15:15	A −50 to 130 °C, 38.69 pJ/conv Fully Integrated SAR Temperature Sensor Based on Direct Temperature-Voltage Comparison Jooeun Kim, Jeongmyeong Kim, Changjoo Park, Minkyu Yang, and Wanyeong Jung KAIST, South Korea Abstract: This paper presents a SAR temperature sensor using a clocked temperature-voltage comparator. The clocked comparator has an input offset which is linearly proportional to the temperature, and the SAR detects the offset voltage to measure the temperature. Temperature transduction is spatially and temporally confined in the comparator’s dynamic comparison, so it is robust against various circumstances. The SAR-based overall structure allows simple design and operation, without complex digital filtering or post-processing, and low energy consumption. The test chip fabricated in a 0.18μm CMOS process shows a 3-sigma error of −2.54/+2.16°C over a wide range of −50 to +130°C, with 38.69pJ/conv energy consumption.
18.4 (7020)	15:15 \| 15:27	A Digital Temperature Sensor Based on 10b SAR ADC for Non-linear Temperature Dependency Compensation in 3D NAND Flash Memory Kyoung-Jun Roh, Min-Ki Jeon, Jaewoo Park, Myoungbo Kwak, Chi-Weon Yoon, Youngdon Choi and Jung-Hwan Choi Device Solutions, Samsung Electronics, Korea Abstract: In this paper, we propose a digital temperature sensor (DTS) to compensate for the nonlinearity of the VT shift with temperature in VNAND flash memory. The DTS consists of a voltage generator that generates a CTAT voltage from a bandgap reference voltage and a 10-bit SAR ADC. The DTS is designed to work in synchronization with a NAND command signal. The proposed circuit is implemented with the multi-stacked VNAND technology of Samsung Electronics. The conversion takes a total of 4 μs including the voltage generator setup time. The resolution measured over 40 samples is 0.753 °C/LSB, and the maximum deviation with 1-point calibration for each NAND operation is 12 LSB.
18.5 (7102)	15:27 \| 15:40	A sub-nW scalable nMOS voltage reference with multiloop regulation achieving 0.0126%/V line sensitivity Chutham Sawigun, Xiaolin Yang, Andrea Lodi, and Carolina Mora Lopez imec, Belgium Abstract: To achieve a better line sensitivity (LS) than existing techniques, we propose in this paper a regulated voltage reference (VR) that allows multiple regulation loops for LS improvement and offers output voltage scalability in a single-branch topology. The proposed VR uses only nMOS devices, occupies the smallest area, and achieves the lowest LS compared with other state-of-the-art regulated VRs.

Session 19: Imaging & Machine Learning Processing on FPGA

Session Chair: Tay-Jyi Lin, National Chung Cheng University
Session Co-chair: Ji-Hoon Kim, Ewha Womans University
Date: Nov. 09, 2022 (Wednesday)
Time: 14:00 – 15:40 (UTC+8)
Room: Song Bo 松柏廳, 10F

ID	Time	Title / Authors / Affiliation
19.1 (7217)	14:00 \| 14:25	A Real-Time High-Resolution Variable-Size Imaging Processor for Spaceborne Synthetic Aperture Radar Jia-Zhao Lin¹, Po-Ta Chen¹, Hung-Yuan Chin¹, Pei-Yun Tsai¹, and Sz-Yuan Lee² ¹National Central University, Taiwan ²National Applied Research Laboratory, Taiwan Abstract: We present a real-time imaging processor for spaceborne high-resolution synthetic aperture radar. To achieve this goal, a DRAM burst access pattern is developed, given azimuth FFT/IFFT decomposition with bit-reversed frequency-domain data, to achieve streaming input/output in the processing kernel. Hybrid datapaths that use 17-bit customized floating point (CFP) FFT/IFFT operations and 64-bit double precision arithmetic units for phase calculation are designed to meet the precision requirement. Multi-segment high-order Taylor series expansion is adopted to approximate the complicated migration factors to support configurability. Our implementation shows at least a 2.93× improvement in normalized processing time and has excellent precision.
19.2 (7239)	14:25 \| 14:50	A 409.6 GOPS and 204.8 GFLOPS Mixed-Precision Vector Processor System for General-Purpose Machine Learning Acceleration Jung-Hoon Kim, Sukjin Lee, Seungjae Moon, Sungyeob Yoo, and Joo-Young Kim KAIST, Korea Abstract: This paper presents a mixed-precision vector processor named MVP and its multi-core system for general-purpose ML acceleration. It has three key contributions: 1) MVP supports fixed and floating-point data types and various AI operations with scalable vector lanes, 2) MVP has a two-level instruction set architecture (ISA), and its microcode generator enables handy ML model mapping and small code size, and 3) the software stack efficiently allocates a target ML model into multiple MVPs, generating all the necessary runtime binaries. As a result, the proposed multi-MVP system provides a peak performance of 409.6 GOPS and 204.8 GFLOPS and energy efficiency of 13.97 GOPS/W and 6.99 GFLOPS/W on a Xilinx Alveo U50 FPGA card, achieving 83.84% average effective utilization when it runs various ML models.
19.3 (7248)	14:50 \| 15:15	An Efficient Unsupervised Learning-based Monocular Depth Estimation Processor with Partial-Switchable Systolic Array Architecture in Edge Devices Wonhoon Park, Dongseok Im, Hankyul Kwon, and Hoi-Jun Yoo Korea Advanced Institute of Science and Technology, Korea Abstract: In this paper, an unsupervised learning-based monocular depth estimation (MDE) processor is proposed with the following key features: 1) multi-path simultaneous processing (MPSP) to reduce the external memory access of the multi-path sampling block by 16.8%, 2) a partial-switchable systolic array (PSSA) architecture to maintain high utilization of the processing elements, achieving an average throughput enhancement of 51.5%, and 3) a dynamic network selection learning (DNSL) system to optimize the pose network during training, increasing the system energy efficiency by 59% for getting supervision
19.4 (7235)	15:15 \| 15:40	F-LIC: FPGA-based Learned Image Compression with a Fine-grained Pipeline Heming Sun¹,²,³, Qingyang Yi⁴, Fangzheng Lin¹, Lu Yu², Jiro Katto¹, and Masahiro Fujita⁴,⁵ ¹Waseda University, Japan ²Zhejiang University, China ³JST, PRESTO, Saitama, Japan ⁴The University of Tokyo, Japan ⁵AIST, Japan Abstract: This paper presents an FPGA design for learned image compression (LIC). A fine-grained pipelining schedule is proposed to obtain higher DSP efficiency. We also propose cascading DSP schemes and a zero-skipping deconvolution scheme. Compared with the latest FPGA-based LIC design, ours achieves higher speed with higher power efficiency.
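
As a rough consistency check on the figures quoted for paper 19.2, the sketch below divides each peak throughput by the corresponding energy efficiency to get an implied power. This assumes the efficiency figures are simply peak throughput divided by a single power number, which the abstract does not state; the function name is hypothetical.

```python
def implied_power_w(peak_throughput, efficiency_per_watt):
    """Power implied by a throughput figure and a per-watt efficiency figure."""
    return peak_throughput / efficiency_per_watt


# Paper 19.2: 409.6 GOPS at 13.97 GOPS/W, 204.8 GFLOPS at 6.99 GFLOPS/W
power_from_int = implied_power_w(409.6, 13.97)
power_from_fp = implied_power_w(204.8, 6.99)

print(f"Implied power (fixed point):    {power_from_int:.1f} W")
print(f"Implied power (floating point): {power_from_fp:.1f} W")
```

Both ratios land near 29.3 W, so under this assumption the two efficiency figures are mutually consistent.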

Session 20: Interfaces for High-Speed Memory

Session Chair: Chiweon Yoon, Samsung Electronics
Session Co-chair: Pen-Jui Peng, National Tsing Hua University
Date: Nov. 09, 2022 (Wednesday)
Time: 14:00 – 15:40 (UTC+8)
Room: Chang Chin 長青廳, 10F

ID	Time	Title / Authors / Affiliation
20.1 (7104) (Highlight)	14:00 \| 14:25	A 0.95pJ/b 5.12Gb/s/pin Charge-Recycling IOs with 47% Energy Reduction for Big Data Applications Han Wu¹, Jeong Hoan Park², Miaolin Zhang¹, Longyang Lin³, Rucheng Jiang¹, Jung-Hwan Choi², Jerald Yoo¹,⁴ ¹National University of Singapore, Singapore ²Samsung Electronics, South Korea ³Southern University of Science and Technology, China ⁴The N.1 Institute for Health, Singapore Abstract: We propose Charge-Recycling IOs (CRIOs) that save up to 32.2% of the energy for the TSV link (2.56Gb/s) and 47% for the T-Line link (5.12Gb/s) when compared with conventional IOs. Implemented in 40nm 1P8M standard CMOS, the proposed CRIOs show signal integrity and BER performance comparable to those of the conventional IOs.
20.2 (7045)	14:25 \| 14:50	A 10Gb/s/pin DQS and WCK Built-Out Tester for LPDDR5 DRAM Test Chan-Ho Kye¹, Jihee Kim², Kyungmin Baek², Kahyun Kim², Sangjin Pack³, Changwon Jung³, and Deog-Kyoon Jeong² ¹EPFL, Switzerland ²Seoul National University, Korea ³SK Hynix, Korea Abstract: We propose a data strobe (DQS) and write clock (WCK) tester that can replace DFT for the high-speed test of LPDDR5 DRAM.
20.3 (7009)	14:50 \| 15:15	A 7.5Gb/s/pin 12Gb-LPDDR5x SDRAM with a Pseudo-double-bit ECC and “Spider”-shape Datapath Control Architecture in a 2nd Generation 10nm DRAM Process Feng Lin, Kangling Ji, Enpeng Gao, Zhonglai Liu, Weibing Shang, Hongwen Li Changxin Memory Technologies, Inc., China Abstract: A 12Gb LPDDR5x SDRAM is presented with unique pseudo-double-bit ECC functions. A “Spider”-shape eight-way multiplexer serves as the central traffic control of the high-speed datapaths. A direct dynamic voltage and frequency scaling scheme is proposed to cut boundary-crossing power consumption by 57%. Data receivers with a 1-tap DFE are proposed, with an on-die eye monitor for margin evaluation. The chip is manufactured using a 2nd generation 10nm DRAM process and achieves a 7.5Gb/s/pin data rate at 1.05V.
20.4 (7230)	15:15 \| 15:40	A Single-Ended Duobinary-PAM4(PAM7) Transmitter with a 2-Tap Feed-Forward Equalizer Jaenam Kim¹,², Sanghyeon Park¹,², Jaewoo Park¹, Junhan Bae¹, and Jung-Hoon Chun¹,³ ¹Sungkyunkwan University, South Korea ²Samsung Electronics, South Korea ³SolidVue, South Korea ^Equally Credited Authors (ECAs) Abstract: A PAM4/duobinary-PAM4 dual-mode transmitter is demonstrated in a 28 nm CMOS technology. The duobinary-PAM4 encoder adds two half-rate PAM4 signals driven by quarter-rate clocks and produces 7-level duobinary-PAM4 signals. The proposed transmitter with a 2-tap feed-forward equalizer consists of 48 source-series terminated (SST) driver segments that are partitioned into six blocks to generate a duobinary-PAM4 signal. At 18 Gb/s, the proposed transmitter achieves 1.11-pJ/b and 1.66-pJ/b energy efficiency in duobinary-PAM4 and PAM4 modes, respectively.

Session 21: Application-Oriented ADCs

Session Chair: Chih-Cheng Hsieh, National Tsing Hua University
Session Co-chair: Shuang Zhu, NVIDIA
Date: Nov. 09, 2022 (Wednesday)
Time: 14:00 – 15:40 (UTC+8)
Room: V110 十全軒, VF

ID	Time	Title / Authors / Affiliation
21.1 (7112) (Highlight)	14:00 \| 14:25	A 91-dB DR 20-kHz BW 5th-Order Multi-Step Incremental ADC for Sensor Interfaces by Re-Using a MASH 2-1 Modulator Jia-Sheng Huang¹,², Shih-Che Kuo¹, Yu-Cheng Huang¹, Chia-Wei Kao¹,², Che-Wei Hsu¹,³ and Chia-Hung Chen¹ ¹National Yang Ming Chiao Tung University, Taiwan ²Now with Realtek, Taiwan ³Now with Mediatek, Taiwan Abstract: A 3rd-order multi-stage incremental ΔΣ ADC (IADC) is proposed to operate in two steps by re-using the same hardware. The first step is a third-order cascaded IADC with an oversampling ratio (OSR) of 24, and the circuit is then reconfigured as a second-order IADC with an OSR of 16 for fine quantization. The noise-shaping performance is boosted from third- to fifth-order. Prototyped in 0.18 μm technology, the measured DR/SNDR are 91/89 dB and it achieves Schreier FoMs of 168.5/166.6 dB for 10 kHz BW.
21.2 (7165)	14:25 \| 14:50	A 78.6 dB-SNDR 520mVpp-full-scale 620MΩ-Zin 105dB-CMRR VCO-based Sensor Readout Circuit Using FVF-Based Gm-Input Structure Yi Zhong, Lu Jie, and Nan Sun Tsinghua University, China Abstract: This paper presents a flipped-voltage-follower (FVF)-based Gm-input CT-ΔΣ ADC with an input impedance enhancement technique. The prototype ADC achieves 78.6dB SNDR with 10 kHz BW at the input range of 480mVpp while consuming 7.1μW, resulting in the Schreier FoM (FoMs) of 170.1dB. This work also achieves 620MΩ input impedance at the chopping frequency of 45kHz and 105dB CMRR.
21.3 (7043)	14:50 \| 15:15	110.1dB DR 4-ch Audio ADCs and 98dB DR 2-ch Voice-Triggering ADCs in Reconfigurable Architecture with Enhanced Off-Transistor-Based Bias Noise Filter Moo-Yeol Choi, Inhwan Cho, Myungjin Lee, Seunghyun Oh, Jongwoo Lee Samsung Electronics, Korea Abstract: 4-ch audio ADCs and 2-ch voice-triggering system (VTS) ADCs with an enhanced off-transistor-based bias noise filter are proposed. The proposed technique addresses two limitations of the previous off-transistor-based noise filter: voltage drift caused by well-diode leakage and a reduced equivalent resistance. The measured results of the audio ADC show 110.1dB DR and -100.1dB THD+N. The CT-DSM in this work achieves a Schreier FoM of 185.7dB in audio ADC mode and 170.6dB in VTS ADC mode, and attains the highest DR despite the additional noise of a capacitive-coupled gain amplifier.
21.4 (7219)	15:15 \| 15:27	A 103.8-dB DR 25ps-to-35ns Resolution Time-to-Digital Converter with Dynamic Ring Oscillator for LiDAR Applications Taewoong Kim¹,², Sanghoon Lee¹, and Youngcheol Chae¹ ¹Yonsei University, Korea ²Now with Samsung Electronics, Korea Abstract: This paper proposes a wide dynamic range TDC for LiDAR sensors. Its architecture is a ring oscillator (RO)-based folding TDC that provides different resolutions proportional to the input range by using a dynamically pre-charged supply voltage on a reservoir capacitor. This dynamic RO changes its time resolution from 25 ps to 35 ns. This in turn leads to a significant increase in the dynamic range, resulting in a maximum measurable time of 3.9 μs, which corresponds to a distance of 585 m. Implemented in a small area of 0.0135 mm² with a 28 nm FDSOI process, the prototype TDC achieves a wide dynamic range of 103.8 dB while consuming only 45.6 μW.
21.5 (7209)	15:27 \| 15:40	A 0.3V 762nW-Only Binary-Search Phase ADC With Current-Reused RO-based Comparator Sifan Wang¹, Kejin Li¹, Chi-Hang Chan¹, Yan Zhu¹, Rui Paulo Martins¹,² ¹University of Macau, China ²On leave from Universidade de Lisboa, Portugal Abstract: This paper presents a 0.3V 4b binary-search-based phase ADC, running at 1MS/s while consuming only 762nW. Unlike existing techniques with large peripheral circuits and power overhead, the proposed phase ADC remains simple and consumes purely dynamic power. The linear combiner, cascoded with the ring-oscillator-based (RO-based) comparator, allows current reuse at ultralow voltage. Combined with the proposed binary-search logic for phase quantization, it realizes outstanding energy efficiency by reducing the number of comparisons to four in this 4b phase ADC. The phase ADC's timing loop is asynchronous, thus maintaining a 1MHz sampling rate under such a low voltage.
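
Some of the headline numbers in the Session 21 abstracts follow from short textbook formulas. The sketch below is a hedged illustration, not the authors' own calculation: it assumes the LiDAR distance is c·t/2 for a round-trip time t, that the TDC dynamic range is 20·log10 of the maximum measurable time over the finest resolution, and that the Schreier FoM is SNDR + 10·log10(BW/P). The function names are hypothetical.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def tof_distance_m(round_trip_s):
    """Target distance for a round-trip time of flight."""
    return SPEED_OF_LIGHT * round_trip_s / 2


def dynamic_range_db(max_time_s, finest_resolution_s):
    """Dynamic range as 20*log10(max time / finest resolution) (assumed definition)."""
    return 20 * math.log10(max_time_s / finest_resolution_s)


def schreier_fom_db(sndr_db, bandwidth_hz, power_w):
    """Schreier figure of merit: SNDR + 10*log10(BW / P)."""
    return sndr_db + 10 * math.log10(bandwidth_hz / power_w)


# Paper 21.4: 3.9 us maximum measurable time, 25 ps finest resolution
distance_m = tof_distance_m(3.9e-6)
tdc_dr_db = dynamic_range_db(3.9e-6, 25e-12)

# Paper 21.2: 78.6 dB SNDR, 10 kHz bandwidth, 7.1 uW
fom_db = schreier_fom_db(78.6, 10e3, 7.1e-6)

print(f"LiDAR distance: {distance_m:.0f} m")
print(f"TDC dynamic range: {tdc_dr_db:.1f} dB")
print(f"Schreier FoM: {fom_db:.1f} dB")
```

With these definitions the results fall close to the quoted 585 m, 103.8 dB, and 170.1 dB, which suggests the abstracts use the same or very similar conventions.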