ID | Time | Title / Authors / Affiliation |
1.1 (7033) (Highlight) |
10:50 | 11:15 |
Energy Efficient BNN Accelerator using CiM and a Time-Interleaved Hadamard Digital GRNG in 22nm CMOS Richard Dorrance, Deepak Dasalukunte, Hechen Wang, Renzhi Liu, Brent Carlton Intel Corporation, USA Abstract: In this paper, we propose a Bayesian Neural Network (BNN) accelerator that uses a C-2C SRAM-based analog Compute-in-Memory (CiM) macro for the MAC operations and a variable-precision (with programmable statistical quality), time-interleaved Hadamard Gaussian Random Number Generator (GRNG) for probabilistic weight generation. The proposed BNN prototype achieves a 25% speedup over the state of the art with a 35× improvement in energy efficiency. |
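The abstract does not describe the GRNG internals, but the general idea behind Hadamard-based Gaussian generators can be sketched in software: each output of a Walsh-Hadamard transform applied to a vector of random ±1 values is a sum of independent ±1 terms, so by the central limit theorem it is approximately Gaussian, and a longer transform gives a closer approximation. This is a minimal sketch under that assumption; `fwht` and `hadamard_grng` are hypothetical names, and the time-interleaving and programmable precision of the actual hardware are not modeled.

```python
import random


def fwht(vec):
    """In-place, unnormalized fast Walsh-Hadamard transform (len(vec) must be a power of two)."""
    n = len(vec)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                x, y = vec[j], vec[j + h]
                vec[j], vec[j + h] = x + y, x - y  # butterfly
        h *= 2
    return vec


def hadamard_grng(n=64, rng=random):
    """Return n approximately N(0, 1) samples from one Hadamard transform.

    Each output is a +/-1-weighted sum of n independent random signs, so it
    has mean 0 and variance n; dividing by sqrt(n) normalizes the variance.
    A larger n gives a distribution closer to Gaussian.
    """
    signs = [rng.choice((-1, 1)) for _ in range(n)]
    scale = n ** 0.5
    return [v / scale for v in fwht(signs)]


rng = random.Random(0)
samples = []
for _ in range(2000):
    samples.extend(hadamard_grng(64, rng))
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(f"mean={mean:.3f} var={var:.3f}")
```

Outputs from the same transform are uncorrelated (the Hadamard rows are orthogonal) but not independent, which is worth keeping in mind when judging statistical quality.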
1.2 (7176) (Highlight) |
11:15 | 11:40 |
Sub-GHz RF Energy Harvester including a Small Loop
Antenna Darshan Shetty1, Christoph Steffan1, Wolfgang Bösch2, Jasmin Grosinger2 1Infineon Technologies AG, Austria 2Graz University of Technology, Austria Abstract: This work presents a sub-GHz RF energy harvester comprising an RF-DC converter implemented in a 130 nm CMOS technology, a conjugate matched loop antenna, and an output load. The RF-DC converter uses a novel threshold voltage compensation technique, implemented using an inbuilt nanowatt current reference circuit. The threshold compensation design ensures robust system performance across temperature and process corner variations. Measurements of the RF energy harvester including the antenna reveal an excellent 1 V sensitivity of -33 dBm for an output load of 1 GΩ and a peak PCE of 53%. |
1.3 (7022) (Highlight) |
11:40 | 12:05 |
An Attachable Fractional Divider Transforming an
Integer-N PLL Into a Fractional-N PLL with SSC Capability Atsushi Motozawa, Yasuyuki Hiraku, Yoshitaka Hirai, Naoaki Hiyama, Yusuke Imanaka, Fukashi Morishita Renesas Electronics Corporation, Japan Abstract: In the automotive industry, systems must handle weak satellite signals, so the output frequencies of PLLs are carefully chosen to avoid EMI. As GNSS becomes more common, the frequency bands available for clocks are narrowing, which makes it necessary to replace Int-N PLLs with Frac-N PLLs to obtain finer frequency steps. In this paper, an attachable fractional divider (FDIV) is proposed that transforms an Int-N PLL into a Frac-N PLL with SSC capability with minimal design effort. A Frac-N PLL with the proposed FDIV achieves a worst-case fractional spur of -69.3dBc and an EMI reduction of 18.7dB in SSC operation. |
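The abstract does not describe the FDIV architecture. As a point of reference, the simplest way a fractional divider turns an integer-N loop into a fractional-N one is a first-order accumulator that switches the divide ratio between N and N+1 so that the average ratio equals N plus a fraction. This is a minimal sketch of that generic technique, not the proposed FDIV; `frac_n_sequence` and its parameters are hypothetical.

```python
def frac_n_sequence(n_int, frac_num, frac_den, cycles):
    """Per-reference-cycle division ratios from a first-order accumulator.

    The accumulator adds frac_num every cycle; when it reaches frac_den it
    wraps and the divider divides by n_int + 1 for that cycle, otherwise by
    n_int. The long-run average ratio is n_int + frac_num / frac_den.
    """
    acc = 0
    ratios = []
    for _ in range(cycles):
        acc += frac_num
        if acc >= frac_den:
            acc -= frac_den
            ratios.append(n_int + 1)
        else:
            ratios.append(n_int)
    return ratios


ratios = frac_n_sequence(40, 1, 4, 400)
print(sum(ratios) / len(ratios))  # 40.25
```

The periodic N/N+1 pattern in this simple scheme is what produces fractional spurs; higher-order delta-sigma modulators (such as MASH structures) are commonly used to shape that error, and spread-spectrum clocking can be obtained by slowly sweeping the fractional word.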
1.4 (7024) (Highlight) |
12:05 | 12:30 |
A Learning-Based Algorithm for Early Floorplan With
Flexible Blocks 1JEN-WEI LEE, 1YI-YING LIAO, 1TE-WEI CHEN, 1YU-HSIU LIN, 1CHIA-WEI CHEN, 1CHUN-KU TING, 1SHENG-TAI TSENG, 1RONALD KUO-HUA HO, 1HSIN-CHUAN KUO, 1CHUN-CHIEH WANG, 1MING-FANG TSAI, 1CHUN-CHIH YANG, 1TAI-LAI TUNG, and 2DA-SHAN SHIU 1MediaTek, Taiwan 2MediaTek Research, Taiwan Abstract: This paper presents a learning-based algorithm that uses a graph neural network (GNN) and a deconvolution network to predict the locations and aspect ratios of design blocks modeled as flexible rectangles. After several hours of training on 4 GPUs, the proposed method, which targets minimum wirelength cost, generates early-stage floorplan placements that are superior to manual placements, which require several days of effort from physical design experts. |
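The abstract says the method minimizes a wirelength cost but does not define it. Half-perimeter wirelength (HPWL) is the standard proxy in placement, so the sketch below assumes it; treating block centres as pin locations is a further simplification, and `hpwl` is a hypothetical helper rather than part of the authors' flow.

```python
def hpwl(positions, nets):
    """Total half-perimeter wirelength.

    positions maps a block name to its (x, y) centre; each net is a list of
    block names. A net's HPWL is the half perimeter of its bounding box.
    """
    total = 0.0
    for net in nets:
        xs = [positions[b][0] for b in net]
        ys = [positions[b][1] for b in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


positions = {"A": (0.0, 0.0), "B": (3.0, 4.0), "C": (1.0, 1.0)}
nets = [["A", "B"], ["A", "B", "C"], ["B", "C"]]
print(hpwl(positions, nets))  # 19.0
```

HPWL is cheap to evaluate and exact for two-pin nets, which is why it is a common training or evaluation target for placement; a flexible block's aspect ratio would change pin positions, which this centre-based sketch ignores.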
2.1 (7081) |
14:00 | 14:25 |
A Single-inductor Triple-output Buck DC-DC Converter
with Electromagnetic Gated Low Dropouts for Higher
Resistance to Electromagnetic and Power Side-Channel
Attacks with 3B Minimum Traces to Disclosure
Improvement in Internet of Things Applications Ya-Ting Hsu1, Yu-Jheng Ouyang1, Ke-Horng Chen1, Kuo-Lin Zheng2, Ying-Hsi Lin3, Shian-Ru Lin3, and Tsung-Yen Tsai3 1National Yang Ming Chiao Tung University, Taiwan 2Chip-GaN Power Semiconductor Corporation, Taiwan 3Realtek Semiconductor Corp, Taiwan Abstract: The proposed single-inductor triple-output buck converter uses electromagnetic gated low dropouts to hide its leaked electromagnetic signature. The proposed intelligent true random number generator reduces the peak EMI noise at the fundamental frequency from 88.4dBμV to 54.9dBμV, leaving no obvious tones in the fast Fourier transform. This 33.5dB reduction improves the minimum traces to disclosure to about 3B. |
2.2 (7046) |
14:25 | 14:50 |
An One-Cycle Load Transient Response and 0.81 mV/A
Load-Regulation Time-Domain Cascaded-VCO-Controlled Buck Converter for Powering Gaming SoC Chieh-Ju Tsai1, I-Fang Lo2, Tsung-Hsien Lin1, Ching-Jan Chen1 1National Taiwan University, Taiwan 2Richtek Technology Corporation, Taiwan Abstract: A time-domain cascaded-VCO-controlled buck converter with a low-cost output LC filter is proposed for gaming SoC applications. By separating the modulation and frequency-stabilization functions, the KVCO mismatch issue of conventional time-based PWM controllers is eliminated. A steady-state FSW error of less than ±0.81% is measured. The proposed controller achieves 0.81mV/A load regulation, 1-cycle load transient settling (1μs), and at least a 2× FoM improvement over prior arts. |
2.3 (7038) |
14:50 | 15:15 |
A 90.6% Peak-Efficiency 1.5A Dual Inductor Ladder Buck-Converter Achieving 0.93W/mm2 Active Peak Power Density for Li-ion Battery Operated PMICs Arindam Mishra, Wei Zhu, and Valentijn De Smedt ESAT, ADVISE, KU Leuven, Belgium Abstract: A dual-inductor-ladder (DIL) DC-DC converter is presented that provides 0.3-1V output directly from a 2.5-5V Li-ion battery for low-voltage System-on-Chips (SoCs). Inherent inductor-current and capacitor-voltage balancing, complete capacitive soft-charging, and reduced inductor current allow the converter to achieve very high active and passive power density and efficiency, even with compact-volume inductors. The DIL converter is fabricated in a 65nm CMOS technology and achieves 90.6% peak efficiency, 0.93W/mm2 active peak power density, and a maximum load current of 1.5A while occupying just over 1mm2 of die area. |
2.4 (7108) |
15:15 | 15:40 |
A 96.62%-Peak-Efficiency and Seamless-Mode-Transition
Buck-Boost DC-DC Converter with Auto-Shift-Ramp Chi-Wei Chen, Bao-Xian Peng, and Hsin-Shu Chen National Taiwan University, Taiwan Abstract: This paper proposes an Auto-Shift-Ramp (ASR) technique that significantly alleviates the undershoot or overshoot voltage caused by mode transitions in multi-mode DC-DC converters. The proposed ASR shifts the starting time of the ramp voltage, allowing the DC-DC converter to change the duty cycle immediately after a mode change without limiting the maximum duty cycle or changing the modulator gain. In measurement, the mode-transition overshoot voltage is less than 16mV (0.48%) with a settling time of less than 18.98μs. The converter achieves 96.62% peak efficiency at 50mA load current in buck mode. The proposed DC-DC converter with ASR achieves a much lower mode-transition voltage than prior works, even with a smaller output capacitance. |
3.1 (7065) (Highlight) |
14:00 | 14:25 |
SNPU: Always-on 63.2µW Face Recognition Spike Domain
Convolutional Neural Network Processor with Spike Train
Decomposition and Shift-and-Accumulation Unit Sangyeob Kim, Sangjin Kim, Soyeon Um, Soyeon Kim, Juhyoung Lee and Hoi-Jun Yoo Korea Advanced Institute of Science and Technology, Korea Abstract: The proposed SNPU has three key features. First, Spike Train Decomposition reduces the accumulations (ACCs) by 71.8%. Second, Time Shrinking Multi-Level Encoding replaces multiple ACCs with a single Shift-and-Accumulation (SAC), and the SAC unit adopts bit scalability to support different always-on applications. Third, Neuron Link supports various time-windows to optimize energy consumption by minimizing the time-window layer by layer, and it increases PE utilization by 14.06% for face recognition (FR). For the LFW dataset, the proposed processing reduces energy consumption by 43.9% owing to neuron-level event-driven operation. If there is no face in the input, the energy can be reduced further by 87.6%. |
3.2 (7042) (Highlight) |
14:25 | 14:50 |
A 28nm 57.6TOPS/W Attention-based NN Processor with
Correlative Computing-in-Memory Ring and Dataflow-reshaped Digital-assisted Computing-in-Memory Array Ruiqi Guo1, Zhiheng Yue1, Hao Li1, Te Hu1, Yabing Wang1, Hao Sun1, Jeng-Long Hsu2, Yaojun Zhang3, Bonan Yan4, Leibo Liu1, Ru Huang4, Shaojun Wei1, Shouyi Yin1 1Tsinghua University, China 2NeoNexus Pte. Ltd., Singapore 3Pimchip Technology Co., Ltd., China 4Peking University, China Abstract: This paper presents a 28nm 7.10mm2 CIM-based transformer processor achieving 23.81-to-57.6 TOPS/W system energy efficiency. The chip has three key design features: 1) a correlative CIM ring that avoids loading dynamically generated matrices; 2) a softmax-based speculate unit that eliminates redundant attention computation; and 3) a dataflow-reshaped digital-assisted CIM array that achieves fully pipelined computation of the final attention result. The chip operates at 0.56-to-0.9V and 151-to-202MHz, and it consumes an average power of 57.97mW at 202MHz and 0.9V. |
3.3 (7170) |
14:50 | 15:15 |
A 65nm 8-bit All-Digital Stochastic-Compute-In-Memory
Deep Learning Processor Jiyue Yang, Tianmu Li, Wojciech Romaszkan, Puneet Gupta, and Sudhakar Pamarti University of California, Los Angeles, USA Abstract: This work presents the first ADC/DAC-free compute-in-memory accelerator based on Stochastic Computing (SC). The Stochastic-Compute-in-Memory Accelerator (SCIMA) (1) embeds SC MAC logic inside an SRAM that requires only 1-bit decisions and no DACs/ADCs, (2) significantly reduces SC number-generation costs, and (3) employs a computation-skipping technique for SC's average pooling function that reduces total latency and energy by 4×. The measured 65nm chip achieves 7.96 TOPS/W energy efficiency for the whole system and 20 TOPS/W for the macro. The solution provides 6× better CIM macro density and 2.5× better peak system energy efficiency at 8-bit precision, with network classification accuracy comparable to fixed-point implementations. |
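For readers new to stochastic computing, the two primitives behind an SC MAC and SC average pooling can be modeled in a few lines: a value in [0, 1] is encoded as the probability that a bit is 1, a bitwise AND of two independent streams multiplies their values, and a 2:1 multiplexer with a random select computes their average. This is a software sketch of textbook unipolar SC, assuming independent pseudo-random streams; SCIMA's SRAM-embedded logic and low-cost number generation are not modeled, and the helper names are hypothetical.

```python
import random


def bitstream(p, length, rng):
    """Unipolar stochastic bitstream: each bit is 1 with probability p (0 <= p <= 1)."""
    return [1 if rng.random() < p else 0 for _ in range(length)]


def sc_multiply(a, b, length=4096, seed=0):
    """Multiply two values in [0, 1] with a bitwise AND of independent streams."""
    rng = random.Random(seed)
    sa, sb = bitstream(a, length, rng), bitstream(b, length, rng)
    return sum(x & y for x, y in zip(sa, sb)) / length


def sc_average(a, b, length=4096, seed=1):
    """Scaled addition (a + b) / 2: a 2:1 MUX picks either stream with probability 1/2."""
    rng = random.Random(seed)
    sa, sb = bitstream(a, length, rng), bitstream(b, length, rng)
    sel = bitstream(0.5, length, rng)
    out = [y if s else x for x, y, s in zip(sa, sb, sel)]
    return sum(out) / length


print(sc_multiply(0.5, 0.8))  # close to 0.4
print(sc_average(0.2, 0.6))   # close to 0.4
```

Precision grows only with stream length (here 4096 bits gives an error of roughly 1%), which is why reducing number-generation cost and skipping computation matter so much for SC hardware.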
3.4 (7188) |
15:15 | 15:40 |
High-speed and energy-efficient crypto-processor for
post-quantum cryptography CRYSTALS-Kyber Taishin Shimada, Makoto Ikeda The University of Tokyo, Japan Abstract: This paper presents the design and measurement results of an ASIC for high-speed, low-power key exchange using CRYSTALS-Kyber, a post-quantum cryptography (PQC) scheme. Because CRYSTALS-Kyber requires a large number of number-theoretic transforms (NTTs), the design performs them with a pipelined architecture. As a result, our chip runs up to 8.5 times faster than a CPU and consumes 24.1 times less energy than a CPU. |
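To make the NTT workload concrete, here is a minimal recursive radix-2 NTT over Kyber's modulus q = 3329, using ζ = 17, which the Kyber specification gives as a primitive 256th root of unity mod q. It is only a sketch: it computes the plain cyclic transform, whereas Kyber itself works in Z_q[X]/(X^256 + 1) with an incomplete NTT and base-case multiplications, and the chip's pipelined datapath is not modeled. The check confirms that pointwise multiplication in the NTT domain equals cyclic convolution.

```python
Q = 3329   # CRYSTALS-Kyber modulus
N = 256    # Kyber polynomial length
ZETA = 17  # primitive 256th root of unity mod Q


def ntt(a, omega=ZETA, q=Q):
    """Recursive radix-2 Cooley-Tukey NTT: A[k] = sum_j a[j] * omega**(j*k) mod q."""
    n = len(a)
    if n == 1:
        return list(a)
    w2 = omega * omega % q            # omega**2 is a primitive (n/2)-th root
    even = ntt(a[0::2], w2, q)
    odd = ntt(a[1::2], w2, q)
    out = [0] * n
    w = 1
    for k in range(n // 2):
        t = w * odd[k] % q
        out[k] = (even[k] + t) % q
        out[k + n // 2] = (even[k] - t) % q  # relies on omega**(n/2) == -1
        w = w * omega % q
    return out


def intt(A, omega=ZETA, q=Q):
    """Inverse NTT: forward transform with omega**-1, then scale by n**-1 mod q."""
    n = len(A)
    inv_n = pow(n, -1, q)
    return [x * inv_n % q for x in ntt(A, pow(omega, -1, q), q)]


def cyclic_convolution(a, b, q=Q):
    """Schoolbook reference: c[k] = sum over i + j = k (mod n) of a[i] * b[j], mod q."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            c[(i + j) % n] = (c[(i + j) % n] + a[i] * b[j]) % q
    return c


a = [(7 * i + 3) % Q for i in range(N)]
b = [(i * i + 1) % Q for i in range(N)]
fast = intt([x * y % Q for x, y in zip(ntt(a), ntt(b))])
print(fast == cyclic_convolution(a, b))  # True
```

The NTT turns an O(n²) polynomial multiplication into O(n log n) butterflies, and the regular butterfly structure is what makes a pipelined hardware implementation attractive.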
4.1 (7023) (Highlight) |
14:00 | 14:25 |
A 110-120-GHz, 12.2% Efficiency, 16.2-dBm Output Power
Multiplying Outphasing Transmitter in 22-nm FDSOI Jeff Shih-Chieh Chien, James F. Buckwalter University of California, Santa Barbara, USA Abstract: A multiplying outphasing transmitter based on a reflection-type phase shifter and a multiplier chain is fabricated in the GlobalFoundries 22nm FDSOI CMOS process. The measured transmitter achieves 9.2-12.2% DC-to-RF efficiency with 15.1-16.2dBm output power at 110-120 GHz. |
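The abstract names the building blocks but not the signal decomposition. Standard outphasing splits a varying-envelope signal into two constant-envelope phasors whose sum reproduces it; if those phasors then pass through a ×N frequency multiplier, which multiplies phase by N, the phases must be divided by N beforehand. This sketch assumes an ideal multiplier that acts only on phase; `outphase`, `multiplier_input_phases`, and `n_mult` are hypothetical, and the actual multiplication factor is not stated in the abstract.

```python
import cmath
import math


def outphase(s, a_max):
    """Split a complex baseband sample s (|s| <= a_max) into two constant-envelope phasors.

    s1 + s2 == s, and |s1| == |s2| == a_max / 2, so each path can use an
    efficient saturated amplifier.
    """
    phi = cmath.phase(s)
    theta = math.acos(abs(s) / a_max)  # outphasing angle
    half = a_max / 2
    return half * cmath.exp(1j * (phi + theta)), half * cmath.exp(1j * (phi - theta))


def multiplier_input_phases(s, a_max, n_mult):
    """Phases to apply before an ideal x n_mult frequency multiplier (hypothetical helper).

    The multiplier scales phase by n_mult, so dividing both outphasing phases
    by n_mult beforehand restores the intended phases at the output.
    """
    phi = cmath.phase(s)
    theta = math.acos(abs(s) / a_max)
    return (phi + theta) / n_mult, (phi - theta) / n_mult


s = 0.3 + 0.4j
s1, s2 = outphase(s, 1.0)
print(abs(s1 + s2 - s) < 1e-12)  # True
```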
4.2 (7027) |
14:25 | 14:50 |
A D-Band Packaged CMOS Integrated Transmitter for MU-MIMO Applications Meng Wei1, Nima Baniasadi1, Ethan Chou1, Hesham Beshary1, Sashank Krishnamurthy2, Elad Alon1, Ali Niknejad1 1University of California, Berkeley, USA 2Intel, USA Abstract: This paper presents a D-band packaged CMOS integrated transmitter (TX) for Multi-User Multiple-Input Multiple-Output (MU-MIMO) applications. The TX chip, fabricated in a 28nm bulk CMOS process, is packaged on an organic interposer that includes a patch antenna array. The chip integrates the complete transmitter chain, including the baseband I/Q amplifiers, up-conversion mixers, power amplifier, and LO generation and distribution. The TX achieves 9-10.6dBm EIRP at Psat and supports 24 Gbps 16-QAM and 24Gbps 64-QAM at 5.3pJ/bit efficiency, tested with over-the-air measurements. |
4.3 (7050) (Highlight) |
14:50 | 15:15 |
A Dual-Band 2×2 802.11ax Transceiver Supporting
160MHz CBW and 1024-QAM Chao Lu1, Shr-Lung (Calvin) Chen2, Jun Liu3, Jian Bao3, Yi Zhao3, Chin-Ming Chien2, Yufei Wang1, Jianqiu Chen3, Zexin Liao3, Bing Ding3, Bihui Zhu3, Jinhua Chen3, Pengfei Yue3, Ran Wang3, and Chun Wang3 1ASR Microelectronics Inc., USA 2ASR Microelectronics Inc., USA 3ASR Microelectronics Ltd., China Abstract: A 2×2 802.11ax transceiver is presented that supports dual-band simultaneous operation (DBS) and 1024-QAM modulation. The proposed architecture features linearity enhancement for uplink OFDMA and wideband transmission. Measurements demonstrate best-in-class receive sensitivity and the lowest transmit EVM floor. With 20MHz (HE20) reception, sensitivity levels of -96.5dBm/-66dBm are measured for MCS0/11, respectively. The output power reaches 18dBm with -35dB EVM for 80MHz 1024-QAM (HE80 MCS11) transmission in the 5GHz band. Narrowband OFDMA signals can be transmitted at full power, and 160MHz channel bandwidth (CBW) is supported without digital predistortion (DPD). The fully integrated transceiver occupies 10.5mm^2 of silicon area in 22nm CMOS. |
4.4 (7141) |
15:15 | 15:40 |
A 32.2-38.2 GHz Broadband 4-Channel TRx Beamformer
with Embedded 3-Winding Transformer Based PA/LNA
FE and High Resolution Phase/Amplitude Control Yongjie Li1, Zongming Duan1, Xiao Li1, Chuanming Zhu1, Na Ding1, Yuefei Dai1, Liguo Sun2, Hao Gao3 1East China Research Institute of Electronic Engineering, China 2University of Science and Technology of China, China 3Eindhoven University of Technology, the Netherlands Abstract: This paper presents a 32.2-38.2 GHz broadband 4-channel Ka-band transceiver beamformer. In this transceiver (TRx) beamformer front-end (FE), a compact 3-winding transformer simultaneously achieves Tx power combining and Rx noise matching in TDD mode. Furthermore, this 4-channel RF beamformer integrates a high-precision 6-bit 360° phase shifter and 6-bit gain control with 0.5-dB steps in each channel to improve beam-scanning accuracy. With the programmable 6-bit phase and 6-bit gain control, the measured gain tuning range at 38 GHz is 31.5 dB with a 0.5-dB gain step and a 5.6° phase step. At 38 GHz, the measured Tx Psat is 20.0 dBm and the Rx NF is 5.55 dB. |
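The step sizes quoted above follow directly from the word lengths: 360° / 2^6 = 5.625° (reported as 5.6°) and (2^6 − 1) × 0.5 dB = 31.5 dB. The sketch below shows how a desired per-element phase and attenuation might be quantized to such codes, including the progressive phase for a uniform linear array. It assumes the gain code counts 0.5-dB attenuation steps and a half-wavelength element spacing; the helper names are hypothetical, and a real beamformer would also apply per-channel calibration.

```python
import math

PHASE_BITS = 6
GAIN_BITS = 6
GAIN_STEP_DB = 0.5

PHASE_STEP_DEG = 360 / 2 ** PHASE_BITS                # 5.625 degrees
GAIN_RANGE_DB = (2 ** GAIN_BITS - 1) * GAIN_STEP_DB   # 31.5 dB


def to_codes(phase_deg, atten_db):
    """Quantize a phase and an attenuation to the nearest 6-bit control codes."""
    phase_code = round((phase_deg % 360) / PHASE_STEP_DEG) % 2 ** PHASE_BITS
    gain_code = min(max(round(atten_db / GAIN_STEP_DB), 0), 2 ** GAIN_BITS - 1)
    return phase_code, gain_code


def steering_phases(n_elems, theta_deg, d_over_lambda=0.5):
    """Progressive phase per element of a uniform linear array steered to theta_deg."""
    step = -360 * d_over_lambda * math.sin(math.radians(theta_deg))
    return [n * step for n in range(n_elems)]


codes = [to_codes(p, 0)[0] for p in steering_phases(4, 30)]
print(codes)  # [0, 48, 32, 16]
```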
5.1 (7198) (Highlight) |
14:00 | 14:25 |
A Synchronous-Sampling Impedance-Readout IC with
Baseline-Cancellation-Based Two-Step Conversion for
Fast Neural Electrical Impedance Tomography Ji-Hoon Suh1, Haidam Choi1, Yoontae Jung1, Sein Oh1, Hyungjoo Cho1, Nahmil Koo2, Seong Joong Kim2, Chisung Bae2, Sohmyung Ha3, and Minkyu Je1 1KAIST, Korea 2Samsung Advanced Institute of Technology, Korea 3New York University Abu Dhabi, United Arab Emirates Abstract: It was recently shown that electrical impedance tomography (EIT) with a greatly enhanced frame rate can simultaneously provide neural activity monitoring and functional localization of the active peripheral nerve. For this 'fast neural EIT', we propose an EIT system employing successive-approximation-based (SA-based) baseline tracking and synchronous sampling (SS) of the ADC. With SA, the baseline can be tracked much faster than with conventional incremental tracking. With SS, only a single cycle of the CG is required, enabling fast demodulation and thus allowing a low CG frequency. As a result, even with a CG frequency of 18kHz, which is low enough to secure SNR for neural EIT, our work achieves up to 500 fps, about 4× higher than the state of the art. |
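Why a single carrier cycle can be enough with synchronous sampling can be shown with an idealized model: if the ADC samples the measured sinusoid at exactly 0°, 90°, 180°, and 270° of one CG cycle, differences of opposite samples give the in-phase and quadrature components directly and cancel any DC offset. This sketch assumes four samples per cycle and a noise-free sinusoid; the paper's actual sampling scheme is not described in the abstract, and `demodulate_one_cycle` is a hypothetical name.

```python
import math


def demodulate_one_cycle(x0, x1, x2, x3):
    """Amplitude and phase of A*sin(wt + phi) + c from samples at 0, 90, 180, 270 degrees.

    x0 - x2 = 2*A*sin(phi) and x1 - x3 = 2*A*cos(phi); the offset c cancels.
    """
    sin_part = (x0 - x2) / 2   # A*sin(phi)
    cos_part = (x1 - x3) / 2   # A*cos(phi)
    return math.hypot(sin_part, cos_part), math.atan2(sin_part, cos_part)


A, PHI, OFFSET = 0.7, 0.4, 1.3  # example amplitude, phase (rad) and baseline
samples = [A * math.sin(k * math.pi / 2 + PHI) + OFFSET for k in range(4)]
amp, phase = demodulate_one_cycle(*samples)
print(round(amp, 6), round(phase, 6))  # 0.7 0.4
```

In an impedance measurement, the amplitude and phase recovered this way, relative to the injected current, give the impedance magnitude and phase for that frame.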
5.2 (7120) |
14:25 | 14:50 |
A 1984-Pixels, 1.26nW/Pixel Retinal Prosthesis Chip with
Time-Domain In-Pixel Image Processing Dong-Hwi Choi and Dong-Woo Jee Ajou University, Korea Abstract: This paper presents a 1984-pixel retinal prosthesis (RP) chip with in-pixel image processing. The proposed time-domain image-processing circuits perform edge extraction by comparing the pulse widths generated by the light-to-stimulus duration converters (LSDCs) of neighboring sensors. A pixel-sequencing technique for shared-electrode operation is also proposed to increase the pixel count within the given chip area. The RP chip is implemented in a 0.18 μm CMOS process and consumes 1.26 nW/pixel, which is 44.7× better than the previous state of the art. |
5.3 (7074) |
14:50 | 15:15 |
A 64-channel back-gate adapted ultra-low-voltage spike-aware neural recording front-end with on-chip
lossless/near-lossless compression engine and 3.3V
stimulator in 22nm FDSOI Franz Marcus Schüffny, Seyed Mohammad Ali Zeinolabedin, Richard George, Liyuan Guo, Annika Weiße, Johannes Uhlig, Julian Meyer, Andreas Dixius, Stefan Hänzsche, Marc Berthel, Stefan Scholze, Sebastian Höppner, Christian Mayr TU Dresden, Germany Abstract: In neural implants and biohybrid research systems, integrating electrode recording and stimulation front-ends with pre-processing circuitry promises a drastic increase in real-time capabilities. In our proposed neural recording system, constant sampling with a bandwidth of 9.8kHz yields 6.73µV input-referred noise (IRN) at a power per channel of 0.34µW for the continuous-time ΔΣ-modulator and 0.52µW for the digital filters and spike detectors. We introduce dynamic current/bandwidth selection in the ΔΣ-modulator and digital filter to reduce the recording bandwidth in the absence of spikes. This is controlled by two-level spike detection and adjusted by adaptive threshold estimation (ATE). Dynamic bandwidth selection reduces power by 53.7%, increasing the available channel count at low heat dissipation. Adaptive back-gate voltage tuning (ABGVT) compensates for PVT variation in subthreshold circuits, allowing 1.8V input/output (IO) devices to operate robustly at a 0.4V supply voltage. The proposed 64-channel neural recording system also includes a 16-channel adaptive compression engine (ACE) and an 8-channel on-chip current stimulator at 3.3V. |
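The abstract does not specify the ATE algorithm. A widely used way to set a spike-detection threshold adaptively is to estimate the noise level robustly from the median absolute value of the signal, which large spikes barely affect, and detect samples beyond a multiple of that estimate. This sketch assumes that estimator and a hypothetical multiplier k = 4; the two-level detection and the bandwidth switching it controls are not modeled.

```python
import random
import statistics


def adaptive_threshold(samples, k=4.0):
    """Threshold = k * sigma, with sigma estimated robustly as median(|x|) / 0.6745.

    For Gaussian noise, median(|x|) = 0.6745 * sigma, and the median is barely
    affected by the occasional large spike, unlike the standard deviation.
    """
    sigma = statistics.median(abs(x) for x in samples) / 0.6745
    return k * sigma


def detect_spikes(samples, k=4.0):
    """Indices of samples whose magnitude exceeds the adaptive threshold."""
    thr = adaptive_threshold(samples, k)
    return [i for i, x in enumerate(samples) if abs(x) > thr]


rng = random.Random(1)
trace = [rng.gauss(0.0, 1.0) for _ in range(2000)]  # noise with sigma = 1
for idx in (500, 1500):
    trace[idx] += 50.0                                # two injected spikes
print(detect_spikes(trace))
```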
5.4 (7123) |
15:15 | 15:40 |
A Heart-related Physiological Signal Monitoring SoC for
Wearable ECG Analysis Systems Peng-Wei Huang1, Shuenn-Yuh Lee1, Chieh Tsou1, Yi-Wen Hung1, Po-Han Su1, Ju-Yi Chen2 1National Cheng Kung University, Taiwan 2National Cheng Kung University Hospital, Taiwan Abstract: The proposed configurable electrocardiogram (ECG) analysis system-on-chip (CEASoC) enables ECG monitoring as well as complex QRS detection and classification, thereby reducing the manpower required for analysis. Manual ECG analysis is effortful and time-consuming, so an automatic ECG analysis device with a CEASoC and a BLE module is needed. Such a device can improve the healthcare environment through the convenience of instant detection, relieving the burden of long-term care. Moreover, to account for individual differences, the key analysis parameters in the CEASoC can be updated using external devices and software, enhancing the flexibility of the proposed system. |
6.1 (7053) (Highlight) |
10:50 | 11:15 |
A Single-Channel 14b 500 MS/s Pipelined-SAR ADC with
Reference Ripple Mitigation Techniques and Adaptive-Biased Floating Inverter Amplifier 1,2Wenning Jiang, 1Yan Zhu, 1Chi-hang Chan, and 1,3Rui Martins 1University of Macau, China 2Fudan University, Shanghai, China 3Universidade de Lisboa, Portugal Abstract: This paper presents a 14b 500MS/s single-channel pipelined-SAR ADC. An on-chip reference buffer is co-designed with reference ripple neutralization (RRN) and cancellation (RRC) in the first stage to enable fast conversion at low power. An adaptive-biased floating inverter amplifier (AB-FIA) is introduced to enhance gain, linearity, and speed. Consuming 6.34mW (including the reference buffer), the ADC achieves an SNDR of 64.2dB and an SFDR of 80.55dB at Nyquist input. It achieves a 170.2dB Schreier FoM and a 9.6 fJ/conversion-step Walden FoM at Nyquist input. |
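The two figures of merit quoted throughout this session use the standard definitions: Walden FoM = P / (2^ENOB · fs) with ENOB = (SNDR − 1.76) / 6.02, and Schreier FoM = SNDR + 10·log10(BW / P), where BW = fs / 2 for a Nyquist-rate ADC. The sketch below reproduces the numbers in this abstract from its power, sampling rate, and SNDR.

```python
import math


def enob(sndr_db):
    """Effective number of bits from SNDR in dB."""
    return (sndr_db - 1.76) / 6.02


def walden_fom(power_w, fs_hz, sndr_db):
    """Walden FoM in J/conversion-step: P / (2**ENOB * fs)."""
    return power_w / (2 ** enob(sndr_db) * fs_hz)


def schreier_fom(power_w, bw_hz, sndr_db):
    """Schreier FoM in dB: SNDR + 10*log10(BW / P)."""
    return sndr_db + 10 * math.log10(bw_hz / power_w)


# 14b 500MS/s pipelined-SAR: 6.34mW, SNDR 64.2dB at Nyquist (BW = fs / 2)
print(round(schreier_fom(6.34e-3, 250e6, 64.2), 1))       # 170.2
print(round(walden_fom(6.34e-3, 500e6, 64.2) * 1e15, 1))  # 9.6 (fJ/conv.-step)
```

Applying the same formulas to entry 6.2 (3.07mW, 30MHz BW, 73.5dB SNDR) gives about 173.4dB, matching its quoted FoMS, and to entry 6.5 (11.4mW, 4GS/s, 38dB SNDR) gives about 43.9 fJ/conv.-step, close to the quoted 43.8 fJ/conv.-s given that the SNDR is stated only to the nearest dB.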
6.2 (7190) |
11:15 | 11:40 |
A 3.07mW 30MHz-BW 73.5dB-SNDR Time-Interleaved
Noise-Shaping SAR ADC with 2nd-order Error-Feedforward and Redundancy-Bit Reduction Shulin Zhao1, Mingqiang Guo1, Sai-Weng Sin1,2, Liang Qi3, Dengke Xu4, Guoxing Wang3, Rui P. Martins1,5 1University of Macau, China 2Zhuhai UM Science & Technology Research Institute, China 3Shanghai Jiao Tong University, China 4Amicro Semiconductor Co., Ltd, China 5University of Lisboa, Portugal Abstract: This work presents a calibration-free 2-channel time-interleaved noise-shaping SAR (TI-NS-SAR) ADC with 1) one-time midway error feedback (FB) and a shared dynamic amplifier to reduce the redundancy bit, and 2) 2nd-order error feedforward to enhance the NS effect for higher resolution. Fabricated in 28nm CMOS, the prototype achieves 73.5dB SNDR over a 30MHz BW at a sampling frequency of 330MHz. It consumes 3.07mW, resulting in an FoMS of 173.4dB. |
6.3 (7095) |
11:40 | 12:05 |
A 12b 8GS/s Time-Interleaved 2b/cycle Pipelined-SAR
ADC with Layout-Customized Bootstrap and Super-Source-Follower Based Open-Loop Residue Amplifier Qiang Yu1,2, Jie Pu1, Jian Luo1, Zhengbo Huang1, Junhong Wu1, Xing Zhu1, Feixiang Xiang1, Lei Chen1, Jianwen Li1, Qiang Li2, Jinda Yang1, and Yuanjun Cen1 1Chengdu Sino Microelectronics Technology, China 2University of Electronic Science and Technology of China, China Abstract: This work describes a 12b 8GS/s time-interleaved ADC that uses a 2b/cycle pipelined-SAR ADC in each channel to increase speed while maintaining low power. To sample the input signal within 125ps, a layout-customized bootstrap is proposed to accelerate the start-up time. A high-linearity super-source-follower (SSF) based open-loop residue amplifier (RA) with large input swing and strong output power is used. With Nyquist input, the 8GS/s ADC achieves an SNDR of 53.8dB and an SFDR of 67dB with a power dissipation of 1W. |
6.4 (7225) |
12:05 | 12:17 |
A 6-bit 5.12-GS/s Flash ADC with Track-and-Hold Embedded
Dynamic Preamplifier in 28nm CMOS Daesik Moon1,2, Sangwoo Lee3, Taewoong Kim1, Woo-Young Choi1, and Youngcheol Chae1 1Yonsei University, Korea 2Samsung Electronics, Korea 3Robert Bosch LLC., USA Abstract: A 5.12-GS/s flash ADC with a track-and-hold-embedded dynamic preamplifier is presented. A ×4-interpolated pipelined amplifier is followed by a StrongARM latch. The measured performance stays above 32.96dB across different input and sampling frequencies. Foreground calibration is implemented. |
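The "×4 interpolated" amplifier can be illustrated with an idealized model: physical preamplifiers are placed only every fourth threshold, and weighted averages of adjacent preamp outputs create the intermediate zero crossings, so the latches still see all 63 decision levels of a 6-bit flash ADC. This sketch assumes perfectly linear preamps and ideal latches; the track-and-hold, pipelining, and calibration are not modeled, the 17 coarse preamps here are illustrative rather than the paper's count, and the helper names are hypothetical.

```python
def preamp_outputs(vin, thresholds, gain=8.0):
    """Linear preamplifier model: each output crosses zero where vin equals its threshold."""
    return [gain * (vin - t) for t in thresholds]


def interpolate(outputs, factor=4):
    """Resistive interpolation: insert factor - 1 weighted averages between neighbours.

    With linear preamps, (1 - w) * a + w * b crosses zero at the threshold
    (1 - w) * t_a + w * t_b, so new decision levels appear without new preamps.
    """
    result = []
    for a, b in zip(outputs, outputs[1:]):
        for j in range(factor):
            w = j / factor
            result.append((1 - w) * a + w * b)
    result.append(outputs[-1])
    return result


def flash_code(vin, bits=6, factor=4, vref=1.0):
    """Ideal flash conversion with physical preamps only every `factor` LSBs."""
    levels = 2 ** bits
    lsb = vref / levels
    coarse = [lsb * k for k in range(1, levels + factor, factor)]  # 1, 5, ..., 65 LSB
    fine = interpolate(preamp_outputs(vin, coarse), factor)[: levels - 1]  # 1..63 LSB
    thermometer = [1 if v > 0 else 0 for v in fine]  # latch decisions
    return sum(thermometer)  # number of ones = output code (bubble-free case)


print(flash_code(0.3))  # 19
```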
6.5 (7213) |
12:17 | 12:30 |
A 7-Bit 4-GS/s Quad-Channel Time-Interleaved SAR ADC
With 2-Then-1-Bit/Cycle Conversion Jihyun Baek, Jonghyun Kim, Gyuchan Cho, Jintae Kim, and Hyungil Chae Konkuk University, Korea Abstract: A 7-bit 4-GS/s quad-channel TI-SAR ADC including the front-end sampler and the buffer is presented. The channel ADC speed is maximized by 2-then-1-bit/cycle coarse-fine conversion without calibration. Also, a buffer topology for unity gain is introduced. The prototype is implemented in a 28-nm CMOS process, and it shows an SNDR of 38 dB at a 4 GS/s sampling rate. The power consumption is 11.4 mW, and the Walden FoM is 43.8 fJ/conv.-s showing good energy efficiency. |
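The coarse-fine idea behind 2-then-1-bit/cycle conversion can be sketched as an ideal SAR search: the first cycles use three comparators to resolve two bits at once, and the remaining cycles resolve one bit each. The abstract does not give the split, so this sketch assumes two 2-bit cycles followed by three 1-bit cycles (7 bits in 5 cycles) with ideal comparators and no redundancy; `sar_2then1` is a hypothetical name.

```python
def sar_2then1(vin, bits=7, two_bit_cycles=2, vref=1.0):
    """Ideal SAR conversion: `two_bit_cycles` 2-bit/cycle steps, then 1 bit/cycle.

    Each 2-bit cycle compares vin with three levels (1/4, 2/4 and 3/4 of the
    remaining range) using three comparators; each 1-bit cycle uses one.
    Returns (code, number of conversion cycles).
    """
    full = 2 ** bits
    code = 0
    remaining = bits                       # bits still to be resolved
    cycles = 0
    for _ in range(two_bit_cycles):
        remaining -= 2
        step = 2 ** remaining
        d = sum(vin >= vref * (code + k * step) / full for k in (1, 2, 3))
        code += d * step                   # d in 0..3 gives the next two bits
        cycles += 1
    while remaining > 0:
        remaining -= 1
        step = 2 ** remaining
        if vin >= vref * (code + step) / full:
            code += step
        cycles += 1
    return code, cycles


print(sar_2then1(0.3))  # (38, 5)
```

Resolving two bits per cycle shortens the critical early cycles at the cost of extra comparators, while the later 1-bit cycles keep comparator count and offset matching manageable.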
7.1 (7254) (Highlight) |
10:50 | 11:15 |
A 75.6M Base-pairs/s FPGA Accelerator for FM-index
Based Paired-end Short-Read Mapping Chung-Hsuan Yang1, Yi-Chung Wu1, Yen-Lung Chen1, Chao-Hsi Lee2, Jui-Hung Hung2,3, Chia-Hsiang Yang1,2 1National Taiwan University, Taiwan 2GeneASIC Technologies Corp., Taiwan 3National Yang Ming Chiao Tung University, Taiwan Abstract: This work presents an FPGA accelerator for FM-index-based paired-end short-read mapping in NGS data analysis, realized on an AMD-Xilinx Alveo U250 FPGA board. The proposed design techniques reduce the overall latency by 92.6%. Compared with state-of-the-art FPGA-based designs, this work delivers 1.7-18.6× higher throughput with a memory-efficient implementation and achieves the highest accuracy, 99.3%. An on-site FPGA demonstration will be given. |
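The core lookup in FM-index-based read mapping is backward search: starting from the whole suffix-array range, each pattern symbol (processed right to left) narrows the range using a cumulative-count table C and an occurrence table Occ over the Burrows-Wheeler transform. This is a minimal, naive Python sketch of exact matching; real mappers use sampled Occ and suffix arrays, handle mismatches, and pair the two reads, none of which is modeled here, and the helper names are hypothetical.

```python
def build_fm_index(text):
    """Build suffix array, C table and Occ table for text + '$' (naive, for illustration)."""
    text += "$"                                   # '$' sorts before A, C, G, T
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)        # text[-1] is '$' when i == 0
    alphabet = sorted(set(text))
    c_table, total = {}, 0
    for ch in alphabet:                           # C[ch]: count of smaller symbols
        c_table[ch] = total
        total += text.count(ch)
    occ = {ch: [0] * (len(bwt) + 1) for ch in alphabet}
    for i, b in enumerate(bwt):                   # occ[ch][i]: count of ch in bwt[:i]
        for ch in alphabet:
            occ[ch][i + 1] = occ[ch][i] + (b == ch)
    return sa, c_table, occ


def backward_search(pattern, sa, c_table, occ):
    """Return the sorted text positions where pattern occurs exactly."""
    lo, hi = 0, len(sa)
    for ch in reversed(pattern):                  # extend the match one symbol leftward
        if ch not in c_table:
            return []
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


index = build_fm_index("ACGTACGTTACG")
print(backward_search("ACG", *index))  # [0, 4, 9]
```

Each pattern symbol costs only two Occ lookups regardless of genome size, which is why memory bandwidth for the Occ table, rather than arithmetic, usually dominates accelerator design.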
7.2 (7152) |
11:15 | 11:40 |
A 217.8 MSOPs/W FPGA-based Online Learning SNN
Processor Using Unified Event-Driven Structure and
Topology Aware Data Reuse Strategies Chaoming Fang1,2, Fengshi Tian2, Chuanqing Wang2, Jie Yang2, Mohamad Sawan2 1Zhejiang University, China 2CenBRAIN Neurotech, Westlake University, China Abstract: We present in this paper a reconfigurable algorithmic neuromorphic engine (RAINE) with three innovative features: 1) A Pipelined-Event-Driven (PED) architecture to increase SNN execution efficiency by leveraging input sparsity. 2) A Topology-Adaptive-Stationary (TAS) data reuse strategy to reduce memory access by adopting Voltage-Reuse (VR), Event-Reuse (ES), and Synapse-Reuse (SR) dataflow for different topologies and 3) A Unified-Dynamic-Learning-Engine (UDLE) to carry out computation for both Leaky-Integrate-Fire (LIF) and trace-based Spike-Timing-Dependent-Plasticity (STDP) online learning. RAINE shows competitive energy efficiency of 217.8 MSOPS/W at a clock frequency of 75MHz, without causing additional hardware resource overhead due to the compact and unified circuit design. |
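The two computations the UDLE unifies can be sketched for a single synapse: a discrete-time leaky integrate-and-fire neuron, and trace-based STDP in which a presynaptic trace potentiates the weight on a postsynaptic spike and a postsynaptic trace depresses it on a presynaptic spike. The membrane and both traces share the same "decay, then add" update, which is one plausible reason a single engine can serve both, though the abstract does not say how RAINE shares hardware. All parameters and names below are hypothetical.

```python
class LIFNeuron:
    """Discrete-time leaky integrate-and-fire neuron."""

    def __init__(self, v_th=1.0, decay=0.9):
        self.v, self.v_th, self.decay = 0.0, v_th, decay

    def step(self, current):
        self.v = self.decay * self.v + current   # leak, then integrate
        if self.v >= self.v_th:
            self.v = 0.0                          # fire and reset
            return 1
        return 0


def run_stdp(pre_spikes, weight=0.5, a_plus=0.05, a_minus=0.055, trace_decay=0.8):
    """Single synapse with trace-based STDP; returns (final weight, post spikes)."""
    neuron = LIFNeuron()
    x_pre = y_post = 0.0
    post_spikes = []
    for s in pre_spikes:
        x_pre *= trace_decay                     # traces decay like the membrane
        y_post *= trace_decay
        post = neuron.step(weight * s)
        if s:
            weight -= a_minus * y_post           # pre after earlier post: depress
            x_pre += 1.0
        if post:
            weight += a_plus * x_pre             # post after pre: potentiate
            y_post += 1.0
        weight = min(max(weight, 0.0), 1.0)
        post_spikes.append(post)
    return weight, post_spikes


w, posts = run_stdp([1, 1, 1])
print(posts, round(w, 3))  # [0, 0, 1] 0.622
```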
7.3 (7187) |
11:40 | 12:05 |
A Flexible Instruction-based Post-quantum Cryptographic Processor with Modulus Reconfigurable
Arithmetic Unit for Module LWR&E Aobo Li, Dongsheng Liu, Xiang Li, Tianze Huang, Shuo Yang, Jiahao Lu, Ang Hu Huazhong University of Science and Technology, China Abstract: In this work, we propose a reconfigurable arithmetic unit with a variable modulus domain and combine it with a custom instruction-set architecture to build a flexible crypto processor for MLWR and MLWE. Verified on an FPGA platform, the design flexibly supports variable parameters and instruction programming while trading off resource efficiency against performance. |
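The abstract does not say how the modulus is made reconfigurable. One common approach is Barrett reduction, where only a precomputed shift k and multiplier m change with the modulus, so a single multiply-shift-subtract datapath can serve several schemes. This sketch assumes Barrett reduction and checks it against Python's `%` for Kyber's q = 3329, q = 7681 (used in earlier Kyber versions), and Saber's q = 2^13; power-of-two moduli as used in MLWR can also be reduced by simple masking. The helper names are hypothetical.

```python
import random


def barrett_setup(q):
    """Precompute shift k and multiplier m = floor(2**k / q) for modulus q."""
    k = 2 * q.bit_length()
    return k, (1 << k) // q


def barrett_reduce(x, q, k, m):
    """x mod q for 0 <= x < q*q: one multiply, one shift, at most one subtraction."""
    t = (x * m) >> k          # estimate of x // q (never too large, at most 1 too small)
    r = x - t * q             # 0 <= r < 2q
    return r - q if r >= q else r


rng = random.Random(0)
for q in (3329, 7681, 8192):
    k, m = barrett_setup(q)
    xs = [rng.randrange(q * q) for _ in range(1000)] + [0, q * q - 1]
    assert all(barrett_reduce(x, q, k, m) == x % q for x in xs)
print("ok")
```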
7.4 (7100) |
12:05 | 12:30 |
Method of Halved Interaction Elements with Regularity
Arrangement that achieves Independent Double Systems
for Scalable Fully Coupled Annealing Processing Shinjiro Kitahara, Akari Endo, Taichi Megumi, and Takayuki Kawahara Tokyo University of Science, Katsushika, Japan Abstract: In recent years, annealing processors have been developed to solve large-scale combinatorial optimization problems. In this paper, we propose a new method that is highly compatible with a scalable fully coupled annealing processor and that halves the interaction elements, whose number grows with the square of the number of spins, by exploiting their regular arrangement. In addition, we implemented two independent 384-spin fully coupled Ising machines with 16 chips, demonstrating the usefulness of the reduction method. |
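A fully coupled Ising machine minimizes H = −Σ_{i<j} J_ij s_i s_j over spins s_i = ±1. One reading of "halving" the N² interaction elements is that J is symmetric (J_ij = J_ji), so only the i < j half needs to be stored; this is an assumption, and the paper's regular multi-chip arrangement is not modeled. The sketch stores only the upper triangle and runs single-spin-flip simulated annealing; all names and the cooling schedule are hypothetical.

```python
import math
import random


def upper_couplings(n, coupling):
    """Store J[i][j] only for i < j: J is symmetric, so n*(n-1)/2 of n*n entries suffice."""
    return {(i, j): coupling(i, j) for i in range(n) for j in range(i + 1, n)}


def j_value(upper, i, j):
    return upper[(i, j)] if i < j else upper[(j, i)]


def energy(spins, upper):
    """Ising energy H = -sum_{i<j} J_ij * s_i * s_j (no external field)."""
    return -sum(v * spins[i] * spins[j] for (i, j), v in upper.items())


def anneal(n, upper, steps=5000, t_start=5.0, t_end=0.05, seed=0):
    """Single-spin-flip simulated annealing with geometric cooling."""
    rng = random.Random(seed)
    spins = [rng.choice((-1, 1)) for _ in range(n)]
    for step in range(steps):
        t = t_start * (t_end / t_start) ** (step / (steps - 1))
        i = rng.randrange(n)
        field = sum(j_value(upper, i, j) * spins[j] for j in range(n) if j != i)
        delta = 2 * spins[i] * field             # energy change if spin i flips
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            spins[i] = -spins[i]
    return spins


upper = upper_couplings(6, lambda i, j: 1.0)     # ferromagnet: all J_ij = +1
spins = anneal(6, upper)
print(len(upper), energy(spins, upper))          # 15 -15.0
```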
8.1 (7214) (Highlight) |
10:50 | 11:15 |
A 37-39GHz Phase and Amplitude Detection Circuit with
0.060 degree and 0.043dB RMS Errors for the Calibration
of 5GNR Phased-Array Beamforming Yudai Yamazaki, Jun Sakamaki, Jian Pang, Joshua Alvin, Zheng Li, Atsushi Shirane, Kenichi Okada Tokyo Institute of Technology, Japan Abstract: Phased-array beamforming relies on high-resolution phase and amplitude control in each TRX element. However, on-chip mismatches between elements caused by PVT variations degrade phased-array performance. In this work, a high-accuracy phase and amplitude detection circuit for phased-array mismatch calibration in the 39GHz band is introduced. A phase-to-digital converter (PDC) and analog-to-digital converter (ADC) detection technique is applied to achieve much lower detection errors than conventional approaches. The proposed detection circuit achieves phase and amplitude detection over 37-39GHz with 0.046 degree and 0.043dB RMS errors, respectively. The core area is 1.34mm^2, and the circuit is fabricated in a 65nm CMOS process. |
8.2 (7074) |
11:15 | 11:40 |
A 0.55mm2 16.9mW Fully Integrated 0-to-200MHz System
BW Wireless Direct Sampling Receiver in 14nm FinFET Ilhoon Jang, Barosaim Sung, Jaehoon Lee, Soonwoo Choi, Byoungjoong Kang, Suseob Ahn, Kyungmin Lee, Taejin Jang, Kwangmin Lim, Anna Yu, Yong Lim, Seunghyun Oh, and Jongwoo Lee Samsung Electronics, Korea Abstract: This paper presents a fully integrated wireless direct sampling receiver that covers a system bandwidth from DC to 200MHz using a single-channel SAR ADC in 14nm FinFET. To demonstrate the proposed architecture, frequency modulation (FM), one of the applicable standard frequency bands, is adopted as a prototype. The measured demodulated SNR is 73.9dB with -47dBm input power at 108MHz, and the sensitivity level is -106dBm. The proposed direct sampling receiver maintains a demodulated SNR above 30dB even in the presence of interference such as a strong adjacent channel and an in-band spur. Furthermore, the FM channel scan time is drastically reduced because the receiver samples all channels simultaneously without adjusting any analog building blocks. |
8.3 (7124) |
11:40 | 12:05 |
An n79 Sub-1-dB Noise Figure Highly Linear Variable-Gain LNA Employing Adaptive Imbalanced Bleeding for
5G NR Jinglong Xu1, Keun-Mok Kim1, Hafiz Usman Mahmood1, Jusung Kim2, Sang-Gug Lee1 1KAIST, Korea 2Hanbat National University, Korea Abstract: This work presents a 5G n79 sub-1-dB NF highly linear variable-gain LNA. Three key techniques are introduced: (i) imbalanced current bleeding for a wide gain range, (ii) drain-side DC current switching for low-power operation, and (iii) bleeding with an adaptive biasing scheme for linearity improvement. The proposed LNA shows a peak gain of 20.5 dB with a 0.74 dB minimum NF and a wide gain range of 13.4 dB, while reducing the power to 4.2 mW in the lowest gain mode. As a result, the proposed LNA achieves the best FoM1 among reported LNAs operating at 4-6 GHz. |
8.4 (7232) |
12:05 | 12:30 |
A 24GHz CMOS UWB Radar IC with IQ Correlation
Receiver for Short Range Human Detection Dongwuk Park1,2, Byeongjae Seo1, Kiryun Byeon1, Gu Jung2, and Yunseong Eo1,2 1Kwangwoon University, Korea 2Silicon R&D, Corp., Korea Abstract: A fully integrated 24 GHz UWB radar IC is presented. An IQ correlation receiver is employed for detection fidelity and range extension. The transmitter is a VCO-based impulse generator. The carrier frequency and bandwidth of the UWB signal are tunable over 22.9-25.5 GHz and 0.18-3 GHz, respectively. The equivalent sampling resolution is 195 ps. A radar module using the IC detects a moving human at up to 12.5 m while consuming 120.27 mW. |
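The timing numbers above translate to distance through the round trip of the pulse: range = c·τ / 2. The sketch converts the 195-ps equivalent sampling resolution and the 12.5-m detection range, and adds the nominal range resolution c / (2B) for the upper end of the bandwidth tuning range. These are idealized conversions under free-space propagation, not claims about the module's measured accuracy; the helper names are hypothetical.

```python
C = 299_792_458.0  # speed of light in m/s


def delay_to_range(delay_s):
    """Target distance for a round-trip echo delay (the pulse travels out and back)."""
    return C * delay_s / 2


def range_to_delay(range_m):
    """Round-trip delay for a target at range_m."""
    return 2 * range_m / C


def range_resolution(bandwidth_hz):
    """Nominal range resolution c / (2B) for a pulse of bandwidth B."""
    return C / (2 * bandwidth_hz)


print(f"{delay_to_range(195e-12) * 100:.2f} cm")   # 2.92 cm per 195 ps step
print(f"{range_to_delay(12.5) * 1e9:.1f} ns")      # 83.4 ns round trip at 12.5 m
print(f"{range_resolution(3e9) * 100:.1f} cm")     # 5.0 cm at 3 GHz bandwidth
```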
9.1 (7154) (Highlight) |
10:50 | 11:15 |
DSC-TRCP: Dynamically Self-calibrating Tunable Replica
Critical Paths Timing Monitoring for Variation Resilient
Circuits with Low Cost & Large Power/Frequency Gain Zhengguo Shen, Weiwei Shan*, Yuxuan Du, Ziyu Li, Chengjun Wu, Jun Yang Southeast University, China Abstract: In-situ timing monitoring based adaptive voltage scaling (AVS) eliminates excess timing margin in digital circuits but suffers from the risk of missed detection. Indirect monitoring methods face difficulties in calibrating the replica circuit, and the replica's discrepancy with the actual circuit limits their gain. We propose a dynamically self-calibrating tunable replica critical paths (DSC-TRCP) based timing monitoring method, which combines the advantages of in-situ and indirect monitoring while overcoming their disadvantages. Implemented in a 28nm CMOS technology, it achieves up to 58% power gain or 232% frequency gain with only 0.65% area cost. |
9.2 (7208) |
11:15 | 11:40 |
C3MLS: A 0.12-nW Leakage and 18.11-fJ/Transition Level
Shifter With Cross-Coupled and Current Mirror Hybrid
Structure for Ultra-Wide Range Level Conversions Cong Huang and Hailong Jiao* Peking University, China Abstract: In this paper, a hybrid CCLS (cross-coupled level shifter)/CMLS (current mirror level shifter) level shifter, C3MLS, is proposed for ultra-wide-range level conversion from extremely low voltages deep in the subthreshold region to the nominal supply voltage. By retaining the merits of CCLS and CMLS and using each to offset the other's drawbacks, the proposed C3MLS achieves conversion with limited current contention and nearly no static current. Measurement results in a 55-nm technology show that the proposed level shifter exhibits the lowest energy-delay product among the state of the art and an average static power consumption of 0.12 nW @ VDDL = 0.3 V. |
9.3 (7117) |
11:40 | 12:05 |
A 0.0043-mm² Capacitorless External-Clock-Free Fully-Synthesizable Digital LDO Using Load-Direct Droop
Detector and Time-Based Load-State Decision Jonghyun Oh1, Yoonho Song2, Young-Ha Hwang3, Jun-Eun Park4, Mingoo Seok1, and Deog-Kyoon Jeong2 1Columbia University, USA 2Seoul National University, Korea 3Soongsil University, Korea 4Chungnam National University, Korea Abstract: The proposed fully-synthesizable DLDO determines the load state using a single comparator (CMP), a single voltage reference, and a tunable delay line without an external clock, achieving 99.6% current efficiency at a 0.6-V supply voltage. In addition, a coarse controller and a load-direct droop detector achieve a 5-ns settling time from a 98-mV voltage droop. The DLDO occupies 0.0043 mm² and achieves a 13.01-A/mm² current density thanks to the fully-synthesized capacitorless design. The DLDO exhibits the best FoM2, a figure of merit that includes settling time, compared with prior arts. |
9.4 (7168) |
12:05 | 12:17 |
A 10-Gbps, 0.121-pJ/bit, All-Digital True Random-Number
Generator using Middle Square Method Jonghyun Kim and Hyungil Chae Konkuk University, Korea Abstract: A robust, all-digital true random number generator (TRNG) with high throughput and good power efficiency is presented. A modified middle square method for post-processing converts a 1-bit comparator output into an 8-bit random stream to achieve 10Gbps throughput. The proposed TRNG achieves the highest throughput as well as the best power efficiency, 0.121pJ/bit, among all TRNGs evaluated with the NIST test suite. |
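For reference, the classic middle-square method squares an n-bit state and keeps the middle n bits as the next value. The sketch below shows only that textbook step on an 8-bit state; the helper names `middle_square_8b` and `middle_square_stream` are hypothetical, and the authors' modified post-processing scheme, which the abstract does not detail, is not reproduced here.

```python
def middle_square_8b(state: int) -> int:
    """Textbook middle-square step on an 8-bit state (hypothetical helper).

    Squaring an 8-bit value gives up to 16 bits; bits [11:4] form the
    middle byte, which becomes the next output/state.
    """
    if not 0 <= state <= 0xFF:
        raise ValueError("state must fit in 8 bits")
    square = state * state        # up to 16 bits
    return (square >> 4) & 0xFF   # keep the middle 8 bits


def middle_square_stream(seed: int, count: int) -> list[int]:
    """Generate `count` successive 8-bit values from a seed (illustrative only)."""
    out, state = [], seed
    for _ in range(count):
        state = middle_square_8b(state)
        out.append(state)
    return out


print(middle_square_8b(0xAB))  # 0xAB**2 = 0x7239 -> middle byte 0x23 = 35
```

Run on its own, a pure middle-square sequence degenerates quickly (a zero state, for example, stays at zero); the abstract indicates that fresh comparator bits feed the post-processor, and that combination is not modeled in this sketch.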
9.5 (7157) |
12:17 | 12:30 |
A Variation-Tolerant Differential Contention-Free Pulsed
Latch with Wide Voltage Scalability Gicheol Shin, Minhyeok Jeong, Donguk Seo, Shin Han, Yoonmyung Lee Sungkyunkwan University, Korea Abstract: A differential contention-free pulsed latch (DCPL) is proposed, targeting wide voltage-range scalability (1V to 0.4V). To operate in the near-threshold-voltage (NTV) region, a differential latch structure is combined with a dynamic XOR while remaining static and contention-free, using a special header/bridge structure. In addition, to reduce transistor count and power consumption, the pulse generator is absorbed into the D-latch using blockages controlled by a delayed clock and a dual-bridge structure. The proposed DCPL operates as reliably as a transmission-gate flip-flop (TGFF) in the NTV region and shows a 50% improvement in sequencing time over the TGFF, while maintaining a hold time similar to prior-art pulsed latches. |
ID | Time | Title / Authors / Affiliation |
10.1 (7132) (Highlight) |
14:00 | 14:25 |
A Process-Scalable Ultra-Low-Voltage 180kHz Sleep
Timer with a Time-Domain Amplifier and a Switch-less
Resistance Multiplier Chongsoo Jung1, Hoyong Seong1, Injun Choi1, Sohmyung Ha2, and Minkyu Je1 1KAIST, Korea 2New York University Abu Dhabi, United Arab Emirates Abstract: This paper presents a process-scalable on-chip sleep timer. Our sleep timer overcomes the limitations of conventional on-chip sleep timers by combining an ultra-low-voltage (ULV) frequency-locked-loop (FLL) architecture, a time-domain amplifier (TDA), and a gate-leakage-leveraging technique. The proposed design, fabricated in 65nm CMOS, produces a 180kHz frequency and achieves a 2.73ppm/°C temperature dependency with lookup-table (LUT)-based calibration while consuming 61nW from a 0.4V supply. |
10.2 (7068) |
14:25 | 14:50 |
A sub-0.5V Crystal Oscillator-Timer (XO-Timer) Combining
16MHz Reference and 32kHz Sleep Timer with a Single
Crystal for Energy-Harvesting Radios in 28nm CMOS Liwen Lin1, Ka-Meng Lei1, Pui-In Mak1, Rui P. Martins1,2 1University of Macau, China 2Universidade de Lisboa, Portugal Abstract: This paper reports an ultra-low-voltage (ULV) single-crystal oscillator-timer (XO-Timer) for sub-0.5 V BLE radios. Specifically, we propose a cascaded charge pump (CP) as the micropower manager (μPM) to customize the voltage and current budgets for each XO-Timer sub-function. This μPM shows higher power efficiency than a non-cascaded design and features a single voltage-regulation loop to uphold the performance of the XO-Timer against VT variations. The XO-Timer's core amplifier uses a new ULV reconfigurable-gm topology to balance power and performance between the high-performance mode (HPM) and the low-power mode (LPM). Fabricated in 28-nm CMOS, the XO-Timer in HPM generates a 16-MHz clock with 24.3 μW of power and a phase noise of −133.8 dBc/Hz at a 1-kHz offset. In LPM, a 32.258-kHz clock is delivered while consuming 11.4 μW. The sleep-timer FoM2 is 14.8 μW and the Allan deviation is 35.1 ppb, achieving the lowest supply voltage (0.25 V) not only for a dual-mode XO-Timer but also for a MHz-range XO. |
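Allan deviation, quoted above as 35.1 ppb, is a standard clock-stability metric computed from consecutive fractional-frequency averages. A minimal sketch of the non-overlapping estimator, assuming the input already holds fractional-frequency samples taken with a common gate time τ; the function name and the counter readings are hypothetical, and the abstract does not state the τ used for the reported value.

```python
import math


def allan_deviation(y: list[float]) -> float:
    """Non-overlapping Allan deviation from fractional-frequency samples.

    Each y[k] is assumed to be (f_k - f_nom) / f_nom averaged over the same tau:
    sigma_y(tau) = sqrt(0.5 * mean((y[k+1] - y[k])**2)).
    """
    if len(y) < 2:
        raise ValueError("need at least two samples")
    diffs_sq = [(b - a) ** 2 for a, b in zip(y, y[1:])]
    return math.sqrt(0.5 * sum(diffs_sq) / len(diffs_sq))


# Hypothetical frequency-counter readings around the 32.258-kHz LPM clock
f_nom = 32_258.0
readings_hz = [32_258.0010, 32_258.0020, 32_258.0010, 32_258.0020]
y = [(f - f_nom) / f_nom for f in readings_hz]
print(f"Allan deviation: {allan_deviation(y) * 1e9:.1f} ppb")
```

Practical measurements usually sweep τ and often use the overlapping estimator for better confidence; the non-overlapping form is shown only because it is the shortest correct definition.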
10.3 (7077) |
14:50 | 15:15 |
A 0.63-mm²/Ch 1.3-mΩ/√Hz-Sensitivity 1-MHz
Bandwidth Active Electrode Electrical
Impedance Tomography System Ting Zhou, Hui Li, Jiajie Huang, Chao Wang, Qianyu Guo, Junyan Liu, Zhiwen Gu, Yang Zhao, Jian Zhao, Mingyi Chen, Yan Liu, Guoxing Wang, Yong Lian, Yongfu Li* Shanghai Jiao Tong University, China Abstract: A 2D active-electrode EIT (AE-EIT) system is presented that uses 1) direct IF down-conversion and 2) a digitally switched SRDP I/Q demodulation technique, together with low-power circuit techniques, to improve the impedance resolution to 1.3mΩ/√Hz at 100kHz and to reduce the readout-circuit variation to 0.44mVpp (a 4.44× reduction), while achieving the smallest area per channel, 0.63mm² (1.38×-6.6× smaller than prior works). |
10.4 (7189) |
15:15 | 15:40 |
A 1.7-6.4 GHz fourth-order RF filter with 1-40% fractional
bandwidth in 22-nm FDSOI Iman Ghotbi, Baktash Behmanesh, and Markus Törmänen Lund University, Sweden Abstract: This paper presents a fourth-order Q-enhanced RF filter featuring gm-boosting, noise-canceling, capacitive cross-coupling, and forward body-biasing techniques to realize a 1.7 to 6.4 GHz operating range and up to 40% adjustable fractional bandwidth. The filter operates by subtracting out-of-phase signals in the passband and in-phase signals in the stopband. Two Q-enhanced LC resonators are used for outphasing. Fabricated in 22 nm FDSOI, the chip achieves 4.6 dB NF, -14 dBm IB-IIP3, and 26 dBm IB-IIP2 at 4 GHz while drawing 22-45 mA from a 1 V supply. The steep fourth-order roll-off yields 17 dBm OOB-IIP3 at a 2×BW frequency offset. |
ID | Time | Title / Authors / Affiliation |
11.1 (7061) (Highlight) |
14:00 | 14:25 |
A 28nm Hybrid 2T1R RRAM Computing-in-Memory Macro
for Energy-efficient AI Edge Inference Wang Ye1,3, Chunmeng Dou1,3, Linfang Wang1,3, Zhidao Zhou1,3, Junjie An1, Weizeng Li1,3, Hanghang Gao1,3, Xiaoxin Xu1,3, Jinshan Yue1, Jianguo Yang1,3, Jing Liu1,3, Dashan Shang1,3, Jinghui Tian2, Qi Liu1,2, Ming Liu1,2 1Institute of Microelectronics of the Chinese Academy of Sciences, China 2Fudan University, China 3University of Chinese Academy of Sciences, China Abstract: This work presents the first 28nm hybrid 2T1R (H2T1R) RRAM computing-in-memory macro for AI edge inference. It features (1) an H2T1R cell array that achieves a >13X enhanced resistance ratio, >80% reduced summation current, >67% smaller word-line voltage, and precise multi-bit weight encoding, and (2) a reference-subtracting current sense amplifier (RS-CSA) that reduces the number of standby reference signals and extends the linear dynamic range of the current mirror. It performs highly accurate multi-bit analogue computation over 32 input channels with a peak energy efficiency of up to 154.04 TOPS/W. |
11.2 (7167) (Highlight) |
14:25 | 14:50 |
A Local Transpose 9T SRAM Compute-In-Memory Macro
with Programmable Single-Slope SAR ADC Xin Zhang*1, Yongjun Jo*1, Jiahao Liu2, Jun Zhou2, Yuanjin Zheng1, and Tony Tae-Hyoung Kim1 (*Equally contributed authors) 1Nanyang Technological University, Singapore 2University of Electronic Science and Technology of China, China Abstract: This work proposes a two-directional transpose SRAM compute-in-memory (CIM) macro for inference and training in convolutional neural networks (CNN). A novel 9T SRAM bit-cell is proposed for local two-way computing without additional shared transpose processing units. The proposed transposable CIM achieves higher processing throughput because every bit-cell can operate simultaneously in one CIM computing cycle. This work also proposes a programmable single-slope (SS) successive approximation (SAR) ADC that improves energy efficiency by exploiting the probability density function of MAC values. The proposed ADC also supports a ReLU-based zero-skip function through the SS operation. The test chip was fabricated in 180nm CMOS technology and achieved an energy efficiency of 6.61TOPS/W with the ADC zero-skip and SS operations. |
11.3 (7203) |
14:50 | 15:15 |
Spike-CIM: A 290TOPS/W Spike-Encoding Sparsity-Adaptive Computing-in-Memory Macro with Differential
Charge-Domain Integrate-and-Fire Jiahao Song1, Xiyuan Tang1, Haoyang Luo1, Kuan Xu2, Yuan Wang1, Zhigang Ji2, Runsheng Wang1, and Ru Huang1 1Peking University, China 2Shanghai Jiao Tong University, China Abstract: This paper proposes a spike-encoding sparsity-adaptive computing-in-memory (CIM) macro (Spike-CIM) that offers excellent energy efficiency and robustness. A differential integrate-and-fire architecture, implemented by charge-domain cells, is proposed to achieve sparsity-adaptive power saving. The fabricated 65nm 32Kb Spike-CIM realizes a normalized energy efficiency of 1218 TOPS/W/Bit. |
11.4 (7245) |
15:15 | 15:27 |
A Hybrid Temperature Compensation method combined with
Digital and Analog Temperature Compensation Techniques for
3D-NAND Flash Memories Dojeon Lee, Junhong Park, Philkyu Kang, Sungmin Jo, Seheon Baek, Chi-Weon Yoon, Dongku Kang Samsung Electronics, Korea Abstract: Temperature-dependent voltage compensation methods are typically divided into digital and analog methods. This paper proposes a hybrid temperature compensation method that combines the advantages of the digital and analog methods to secure temperature linearity and reduce the time overhead of temperature sensing. |
11.5 (7160) |
15:27 | 15:40 |
A Variation-Tolerant Processing-In-Memory Architecture
Using Discharging Current Calibration Daiki Kitagata, Shinji Tanaka, Naoya Fujita and Naoaki Irie Renesas Electronics Corporation, Japan Abstract: This paper presents a variation-tolerant ternary neural arithmetic memory (VT-TNAM) for energy-efficient processing-in-memory (PIM) accelerators. The VT-TNAM macro incorporates the newly proposed discharging current calibration (DCC) architecture, which uses adjustable-current ternary bit cells (ACTBCs) to effectively mitigate local process variation. Furthermore, a hierarchical MAC-operation skipping (HMS) architecture using the proposed small current detector (SCD) is developed to compensate for the energy-efficiency degradation caused by improving MAC accuracy. The reduction of process variation is verified using a fabricated test-element-group (TEG) in a 22nm process, and 20.0 – 59.2 TOPS/W is achieved by introducing the HMS architecture. |
ID | Time | Title / Authors / Affiliation |
12.1 (7051) (Highlight) |
14:00 | 14:25 |
A 103 fJ/b/dB, 10-26 Gbps Receiver with a Dual Feedback
Nested Loop CDR for Wide Bandwidth Jitter Tolerance
Enhancement Yao-Chia Liu1, Wei-Zen Chen1, Yuan-Sheng Lee2, Yu-Hsiang Chen2, Shawn Min2, Ying-Hsi Lin2 1National Yang Ming Chiao Tung University, Taiwan 2Realtek Semiconductor Corp., Taiwan Abstract: A nested-CDR-based receiver with a PI controller is presented. Direct modulation bypasses the loop-latency-limited PI path and modulates the VCO for faster response and improved stability. The measured jitter tolerance curve shows a 0.15UI enhancement at 60MHz, and with the DFE simplified by an edge-based algorithm, the receiver tolerates 32dB of channel loss. Compared with CDR-only prior art, this work improves power efficiency by 2× over the traditional PI architecture and by 4× over the DCO architecture. |
12.2 (7026) (Highlight) |
14:25 | 14:50 |
A 42Gb/s PAM-8 Transmitter with Feed-Forward
Tomlinson-Harashima Precoding in 28nm CMOS Byungjun Kang, Woosong Jung, Hyojun Kim, Sanghee Lee, and Deog-Kyoon Jeong Seoul National University, Korea Abstract: A 42Gb/s PAM-8 transmitter (TX) with feed-forward Tomlinson-Harashima precoding (FF-THP) is presented. The FF-THP architecture produces a uniform output distribution with higher average signal power than FFE. The fabricated chip compensates for 7.7dB of channel loss with PAM-8 signaling, achieving a power efficiency of 1.58pJ/b while occupying 0.0703mm². |
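Tomlinson-Harashima precoding pre-subtracts the channel's post-cursor ISI at the transmitter and folds the result back into the symbol range with a modulo operation, so the receiver only needs the same modulo to recover the symbols. The sketch below is the textbook feedback THP for PAM-8 levels {±1, ±3, ±5, ±7} with modulo span 2M = 16, assuming an ideal, known post-cursor channel `h` with hypothetical tap values; it does not reproduce the paper's feed-forward (FF-THP) structure, and all names are illustrative.

```python
M = 8  # PAM-8 levels are odd integers in [-7, 7]; fold into [-M, M)


def fold(v: float) -> float:
    """Modulo-2M operation mapping v into [-M, M)."""
    return float((v + M) % (2 * M) - M)


def thp_precode(symbols: list[int], h: list[float]) -> list[float]:
    """Textbook feedback THP: x[n] = fold(a[n] - sum_k h[k] * x[n-1-k])."""
    x: list[float] = []
    for n, a in enumerate(symbols):
        isi = sum(hk * x[n - 1 - k] for k, hk in enumerate(h) if n - 1 - k >= 0)
        x.append(fold(a - isi))
    return x


def channel(x: list[float], h: list[float]) -> list[float]:
    """Ideal channel model: main cursor 1 plus post-cursor taps h."""
    return [xn + sum(hk * x[n - 1 - k] for k, hk in enumerate(h) if n - 1 - k >= 0)
            for n, xn in enumerate(x)]


symbols = [7, -7, 5, 3, -1, 7, 7, -5]
h = [0.5, 0.25]  # hypothetical post-cursor taps, not measured channel data
tx = thp_precode(symbols, h)
rx = [fold(y) for y in channel(tx, h)]
print(rx)  # [7.0, -7.0, 5.0, 3.0, -1.0, 7.0, 7.0, -5.0]
```

Because the modulo keeps every transmitted sample inside [-8, 8), the TX swing stays bounded regardless of how much ISI is pre-cancelled, which is the property the abstract contrasts with FFE.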
12.3 (7228) |
14:50 | 15:15 |
An 11.4-Gbps/lane MIPI 32-bit C-PHY and D-PHY combo
transmitter with 3-tap FFE Junhan Bae1, Myeongkyu Song1, Bongkyu Kim1, Junkyu Lee1, Woosung Park1,2, and Jung-Hoon Chun1,3 1Sungkyunkwan University, Korea 2Samsung Electronics, Korea 3SolidVue, Korea Abstract: This paper describes a MIPI C/D-PHY combo transmitter (TX) fabricated in a 110nm CMOS image sensor (CIS) process. The same hardware is shared to support both C-PHY and D-PHY with little extra circuitry. The adopted 32-bit architecture, which enables double data rate (DDR) operation in C/D-PHY, maximizes the data rate, allowing it to exceed the limits of legacy sub-micron process technologies. In addition, the proposed TX uses 3-tap feed-forward equalization (FFE) in both C-PHY and D-PHY modes, effectively eliminating the inter-symbol interference (ISI) induced by band-limited channels. Measurements show that the C-PHY compliance test is comfortably passed at data rates up to 11.4 Gbps (5 Gsps) per lane, and the D-PHY eye diagrams are fully open at data rates up to 6 Gbps per lane. |
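MIPI C-PHY maps 16 data bits onto 7 symbols on each three-wire lane (trio), about 2.28 bits per symbol, which is why a 5-Gsps symbol rate corresponds to roughly 11.4 Gbps per lane. A quick check of that arithmetic; the helper name is hypothetical.

```python
# MIPI C-PHY carries 16 data bits in 7 symbols per three-wire lane (trio)
BITS_PER_SYMBOL = 16 / 7  # ~2.28 bit/symbol


def cphy_lane_rate_bps(symbol_rate_sps: float) -> float:
    """Raw data rate of one C-PHY lane for a given symbol rate (hypothetical helper)."""
    return symbol_rate_sps * BITS_PER_SYMBOL


print(f"{cphy_lane_rate_bps(5e9) / 1e9:.2f} Gbps")  # 11.43 Gbps
```

D-PHY, by contrast, carries one bit per unit interval on each differential lane, so its 6-Gbps figure is directly a bit rate.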
12.4 (7226) |
15:15 | 15:40 |
A 5.0-to-12.5-Gb/s, 1.7-pJ/b, 0.66-µs Lock-time Referenceless Sub-sampling CDR with Beat Detection FLL in 28nm CMOS Woosung Park1, 2, Jahoon Jin2, Minsu Park1, Sangdon Jung1, 2, and Jung-Hoon Chun1, 3 1Sungkyunkwan University, South Korea 2Samsung Electronics, South Korea 3SolidVue, South Korea Abstract: The capture range of the sub-sampling phase detector (SSPD) is wider than that of a conventional PD, relieving the burden of reducing the residual frequency error. In practice, the SSPD-based CDR (SSCDR) in [1] corrects frequency errors without an FLL, saving significant power. The SSCDR also achieves a short lock time with a wide bandwidth; therefore, it is suitable for burst-mode operation, which requires a sub-ns relocking time. To take advantage of these characteristics, this work takes [1] as its baseline and extends the frequency coverage by employing a beat-detection FLL. The proposed FLL locks faster than prior arts through a beat correction process that uses the down-conversion function of the SSPD. As a result, the proposed FLL relieves the trade-off between lock time and frequency coverage. We also propose a bandwidth-control technique and an energy-efficient dual-mode SSPD. |
ID | Time | Title / Authors / Affiliation |
13.1 (7025) (Highlight) |
14:00 | 14:25 |
A 20-MHz 2.3-mW Receiver and a 25-V Transmitter for
Ultrasound Capsule Endoscopy Kyeongwon Jeong1, Jaesuk Choi1, Gichan Yun1, Injun Choi1, Jeehoon Son2, Jae Youn Hwang2, Sohmyung Ha3, and Minkyu Je1 1KAIST, Korea 2DGIST, Korea 3New York University Abu Dhabi, United Arab Emirates Abstract: We propose the first ultrasound capsule endoscopy (USCE) ASIC. An on-chip transmitter (TX) is designed to generate the high-voltage pulses applied to the transducer. In addition, a highly power-efficient ultrasound (US) receiver (RX) IC for USCE is presented. We propose an RX structure with synchronized analog envelope detection (ED) to reduce the required ADC speed. A ping-pong noise-shaping SAR (NS-SAR) ADC with a passive gain is employed for high power efficiency and resolution. |
13.2 (7159) |
14:25 | 14:50 |
An Intra-Body-Power-Transfer System with a PLL-based
Continuous Maximum Resonant Power Tracking Loop at
TX and 1.8V DC Output Voltage at RX Hyungjoo Cho1, Ji-Hoon Suh1, Gichan Yun1, Sohmyung Ha2, and Minkyu Je1 1KAIST, Korea 2New York University Abu Dhabi, United Arab Emirates Abstract: We present an intra-body-power-transfer (IBPT) system that delivers more than 100μW even across a 150cm on-body distance. The proposed IBPT TX employs a PLL-based maximum-resonant-power-tracking (MRPT) loop running in the background to maximize the power delivered to the load (PDL) without any RX-to-TX back telemetry or tuning phase, enabling continuous power delivery. The PDL and power transfer efficiency (PTE) are further improved by inducing parallel resonance at the RX. Fabricated in a 180nm BCD process, the IBPT system achieves 136μW PDL at 1.8V DC output with 8.83% end-to-end power efficiency. |
13.3 (7163) |
14:50 | 15:15 |
A 2m-Range 711uW Body Channel Communication
Transceiver Featuring Dynamically-Sampling Bias-Free
Interface Front End Guanjie Gu1, Changgui Yang1, Zhuhao Li1, Xiangdong Feng1, Ziyi Chang1, Ting-Hsun Wang1, Yunshan Zhang1, Yuxuan Luo1, Hong Zhang1, Ping Wang1, Sijun Du2, Yong Chen3, and Bo Zhao1* 1Zhejiang University, China 2Delft University of Technology, Netherlands 3University of Macau, China * Corresponding Author: Bo Zhao Abstract: State-of-the-art body channel communication (BCC) transceivers achieve low power consumption, but their communication range is still limited to less than 1m. One factor limiting the BCC communication range is the loss at the interface between the human body and the transceiver. The DC bias in previous closed-loop and gate-input techniques reduces the input impedance and voltage gain of the interface front end (IFE), leading to a high interface loss. In this work, we propose a dynamically-sampling bias-free IFE that realizes a 90kΩ input impedance and a 94dB RF-IF conversion gain, resulting in a receiving sensitivity of -104dBm. As a result, the communication range is extended to 2m with 711uW total power consumption. |
13.4 (7169) |
15:15 | 15:40 |
A Low-power Sleep Apnea Monitoring IC with a Duty-Recovered Body Channel Communication Receiver Pangi Park, Donghyeok Cho, SeongHwan Cho KAIST, Korea Abstract: This paper presents an in-home level-4 sleep apnea monitoring IC that measures three basic parameters: airflow, heart rate (HR), and SpO2. A duty-recovered BCC receiver is proposed that allows both the transmitter and receiver sides to be duty-cycled, and the power efficiency of the readouts is improved by regulating the voltage of the interface node between the sensing units and the readouts. With the proposed techniques, the receiver power is reduced by 98.8%, and the overall system power is 93.8% lower than in previous work. |
ID | Time | Title / Authors / Affiliation |
14.1 (7085) |
09:00 | 09:25 |
A Digital LDO in 22nm CMOS with a 4b Self-triggered
Binary Search Windowed Flash ADC Featuring Automatic
Analog Layout Generator Framework Xiaosen Liu1,2, Soner Yaldiz2, Parijat Mukherjee2, Steven Burns2, Harish Krishnamurthy2, Krishnan Ravichandran2, Zakir Ahmed2, Nachiket Desai2, Nicolas Butzen2, James Tschanz2, Vivek De2 1Tsinghua University, School of Integrated Circuits, China 2Intel Corporation, USA Abstract: An analog-layout-generator-based DLDO with a self-triggered binary search windowed flash ADC is proposed in 22nm CMOS to maximize the productivity of implementing analog circuit blocks in scaled CMOS processes, reducing physical design time and effort by up to 60× compared with the conventional manual approach. A self-triggered binary search mechanism with a delay-based architecture is proposed to reduce the exponentially growing kickback noise and energy consumption of a traditional flash ADC to the level of a SAR ADC while maintaining its high speed. The DLDO features a 3.55ps FoM and fully automatic generation. |
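The energy argument above rests on the fact that a binary search resolves an N-bit code with N comparator decisions, whereas a conventional flash ADC fires all 2^N − 1 comparators every sample. A behavioral sketch of that search for the 4-bit case, assuming an ideal comparator and a uniform reference ladder from 0 to `vref`; the windowing and delay-based self-triggering of the actual design are not modeled, and the function name is hypothetical.

```python
def binary_search_quantize(vin: float, vref: float = 1.0, bits: int = 4) -> tuple[int, int]:
    """Return (code, comparisons) for an ideal N-bit binary-search quantizer.

    Thresholds are k * vref / 2**bits for k = 1 .. 2**bits - 1 (a flash ladder);
    only one threshold is compared per step instead of all of them at once.
    """
    lo, hi = 0, 2 ** bits  # invariant: the output code lies in [lo, hi)
    comparisons = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        comparisons += 1
        if vin >= mid * vref / 2 ** bits:
            lo = mid
        else:
            hi = mid
    return lo, comparisons


code, n_cmp = binary_search_quantize(0.3)
print(code, n_cmp)  # 4 4  (a 4-bit flash would fire 15 comparators)
```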
14.2 (7086) |
09:25 | 09:50 |
A Fast-Transient and Wide-Range Output Capacitor-Less
NMOS LDO Regulator with Adaptive-Gain Nested Miller
Compensation and Pre-Emphasis Inverse Biasing Hyunjun Park, Woojoong Jung, Minsu Kim, and Hyung-Min Lee Korea University, Korea Abstract: The proposed capless LDO ensures stability over a wide load range and achieves higher bandwidth for fast transients at larger ILOAD by adopting adaptive-gain nested Miller compensation. Pre-emphasis inverse biasing also improves the slew rate at the gate of the NMOS pass transistor by sourcing an adaptive bias current into a super source follower. The 180nm CMOS LDO achieves a high unity-gain bandwidth of 17.5MHz while supporting a wide ILOAD range from 0.1mA to 300mA with a phase margin above 60°. The LDO shows small undershoot (48mV) and overshoot (59mV), achieving the best FoM of 1.72ps. |
14.3 (7144) |
09:50 | 10:15 |
A Capacitor-less Digital LDO using Ripple-Frequency-Adaptive Time-domain Digital Pre-distortion Technique Angxiao Yan1, Wei Deng1,2, Haikun Jia1, Shiwei Zhang1, Rui Wu3, Zhihua Wang1,2, and Baoyong Chi1 1Tsinghua University, China 2Research Institute of Tsinghua University in Shenzhen, China 3National Key Lab of Microwave Imaging Technology, AIR, CAS, China Abstract: A digital low-dropout regulator (D-LDO) with a time-domain digital pre-distortion (DPD) scheme is introduced in this paper. It adaptively suppresses supply voltage ripple without an analog-assisting loop or a large capacitor. The proposed all-digital ripple cancellation technique is effective against arbitrary ripple waveforms and any ripple frequency from kHz to a quarter of the clock frequency. The measurement results show a -24.5 dB rejection ratio and a 9.5 dB improvement over the conventional D-LDO. This work demonstrates the feasibility of digital-domain ripple cancellation for the first time. |
14.4 (7072) |
10:15 | 10:40 |
A Self-Clocked TDC-Based Unified Clock and Voltage
Regulator with Replica Frequency-Locked Loop and
Hysteresis Switching in 65nm CMOS Xuliang Wang, Wing-Hung Ki, and Philip K. T. Mok The Hong Kong University of Science and Technology, China Abstract: A self-clocked digital low-dropout regulator (DLDO) employing a tunable replica oscillator (TRO) and a beat-frequency (BF) quantizer is proposed to supply and clock microprocessors. A standard D-flip-flop is used as both the time-to-digital converter (TDC) and the sampling-clock (BF-clock) generator. Fast transient response and low static power consumption are achieved simultaneously through the adaptive sampling capability of the BF quantizer. With the proposed hysteresis switching logic (HSL) and a replica frequency-locked loop (FLL), the built-in offset of the BF quantizer is eliminated. The TRO, powered by the DLDO output, mimics half of the critical-path delay of the microprocessor and guarantees error-free operation even during voltage undershoot caused by load transients. In a 50mA/μs load transient test with a 100-pF load capacitor, the proposed HSL improves the voltage undershoot and the steady-state offset by 25% and 84%, respectively. Fabricated in a 65-nm LP process, the prototype occupies an active area of 0.045mm² and achieves a 0.76-ps FOM. |
ID | Time | Title / Authors / Affiliation |
15.1 (7145) (Highlight) |
09:00 | 09:25 |
A 2.47 μJ/sample QR-Decomposition-based Extreme
Learning Machine Engine Supporting Online Class
Incremental Learning for ECG-based User Identification Yi-Ta Chen, Li-Sheng Chang, Yu-Chuan Chuang, An-Yeu Wu National Taiwan University, Taiwan Abstract: To support online class incremental learning (O-CIL) in ECG-based user identification, this work presents a QR-decomposition-based extreme learning machine (QRD-ELM) engine. A diagonally-mapped linear array (DMLA) enables online learning while reducing area by 98.5%. The integrated PE design with a unified COordinate Rotation DIgital Computer (u-CORDIC) further reduces area by 15.3% and power consumption by 22.4%. A model-algorithm-circuit co-design module supports class incremental learning with low energy and area overhead. The QRD-ELM engine, fabricated in 40nm CMOS technology with a 1.33×1.33 mm² die area, achieves a learning energy efficiency of 2.47 μJ/sample, 28.5× better than the state of the art. |
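In an extreme learning machine the output weights β solve the least-squares problem Hβ ≈ t over the hidden-layer outputs H, and a QR decomposition lets that solution be updated one sample at a time. The sketch below shows the generic online form: each new hidden-layer row is folded into an upper-triangular R with Givens rotations (the operation CORDIC units are commonly used to compute in hardware), and β follows from back-substitution on Rβ = z. It assumes a single output, no regularization or forgetting factor, and a hypothetical class name; it is not the paper's DMLA or u-CORDIC design.

```python
import math


class OnlineGivensLS:
    """Online least squares H @ beta ~= t via Givens-rotation QR updates (hypothetical).

    Keeps an upper-triangular R and rotated targets z = Q^T t so that
    R @ beta = z after every update. Single output, no forgetting factor.
    """

    def __init__(self, n: int):
        self.n = n
        self.R = [[0.0] * n for _ in range(n)]
        self.z = [0.0] * n

    def update(self, h_row, target: float) -> None:
        h = [float(v) for v in h_row]
        t = float(target)
        for i in range(self.n):
            a, b = self.R[i][i], h[i]
            if b == 0.0:
                continue                    # nothing to annihilate in this column
            r = math.hypot(a, b)
            c, s = a / r, b / r             # rotation that zeroes h[i]
            for j in range(i, self.n):
                rij, hj = self.R[i][j], h[j]
                self.R[i][j] = c * rij + s * hj
                h[j] = -s * rij + c * hj
            zi = self.z[i]
            self.z[i] = c * zi + s * t
            t = -s * zi + c * t             # leftover t is the residual component

    def solve(self) -> list[float]:
        beta = [0.0] * self.n
        for i in reversed(range(self.n)):   # back-substitution on R @ beta = z
            if self.R[i][i] == 0.0:
                raise ValueError("not enough independent rows yet")
            acc = self.z[i] - sum(self.R[i][j] * beta[j] for j in range(i + 1, self.n))
            beta[i] = acc / self.R[i][i]
        return beta


# Rows stand in for hidden-layer outputs; targets follow t = 2*h0 + 3*h1
ls = OnlineGivensLS(2)
for row, target in [((1, 0), 2), ((0, 1), 3), ((1, 1), 5)]:
    ls.update(row, target)
print(ls.solve())  # approximately [2.0, 3.0]
```

Storing only R and z keeps memory fixed at O(n²) no matter how many samples arrive, which is what makes this formulation attractive for on-chip online learning.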
15.2 (7215) (Highlight) |
09:25 | 09:50 |
A 1.3mW Speech-to-Text Accelerator with Bidirectional
Light Gated Recurrent Units for Edge AI Yu-Hsuan Tsai*1, Yi-Cheng Lin*1, Wen-Ching Chen2, Liang-Yi Lin2, Nian-Shyang Chang2, Chun-Pin Lin2, Shi-Hao Chen3, Chi-Shi Chen2, and Chia-Hsiang Yang1 1National Taiwan University, Taiwan 2Taiwan Semiconductor Research Institute, Taiwan 3Digwise Technology Ltd., Taiwan *Equally-Credited Authors (ECAs) Abstract: This work presents an energy-efficient speech-to-text accelerator. A bidirectional light gated recurrent unit (BLiGRU)-based neural network is adopted to achieve high accuracy. Network compression reduces the network size and associated computational complexity by 29.8× and 73.2×, respectively. Efficient sequence decoding without backtracking is implemented to reduce latency and memory usage. The chip performs speech-to-text conversion in 9.77 ms/frame with 1.3 mW at 1.25 MHz. Compared to state-of-the-art designs, the chip achieves 6.5-to-177× lower normalized energy with the lowest phone error rate (PER), 15.2%, on the TIMIT dataset. |
15.3 (7166) |
09:50 | 10:15 |
A 6 Gbps PAM-3 Transceiver with Time-Varying Offset
Compensation Ju Eon Kim1,2, Dong-Hyun Yoon2, Junyoung Song3, Kwang-Hyun Baek4, Jung-Hwan Choi1, and Tony Tae-Hyoung Kim2 1Samsung Electronics, Korea 2Nanyang Technological University, Singapore 3Incheon National University, Korea 4Chung-Ang University, Korea Abstract: CMOS technology scaling improves performance by reducing supply voltage, parasitic capacitance, and physical area. However, device reliability issues, such as component mismatches and aging effects, become prominent in aggressively scaled technologies. In particular, PAM signal levels are highly susceptible to PVT variations and device mismatches. This paper proposes an offset compensation technique for a PAM-3 transceiver. The proposed compensation algorithm continuously detects faulty patterns and generates the optimal reference voltage for the single-to-differential amplifier to cancel time-varying offset. This work presents a 6Gbps PAM-3 transceiver in 65nm CMOS. The proposed technique improves the eye opening by 38%. |
15.4 (7147) |
10:15 | 10:40 |
A 12.8-Gbps 0.5-pJ/b Encoding-less Inductive Coupling
Interface Using Clocked Hysteresis Comparator for 3D-stacked SRAM in 7-nm FinFET Kota Shiba1, Mitsuji Okada2, Atsutake Kosuge2, Mototsugu Hamada2, and Tadahiro Kuroda2 1The University of Tokyo, Japan 2Research Association for Advanced Systems, Japan Abstract: A 0.5-pJ/b 12.8-Gbps/link inductive-coupling inter-chip wireless communication interface for a 3D-stacked SRAM has been developed in a 7-nm FinFET process. A new clocked hysteresis comparator that eliminates encoding for synchronous communication achieves a 1.49 times higher data rate and 36% lower energy consumption than conventional synchronous communication using Manchester encoding. Inter-chip communication at 0.5 pJ/b and 12.8 Gbps/link was confirmed using test chips. The proposed interface for a 4-hi 3D-stacked SRAM module achieves a 1.7-TB/s/mm² IO area efficiency, a two-orders-of-magnitude improvement over a state-of-the-art interface for 3D-stacked SRAM, with competitive energy efficiency. |
ID | Time | Title / Authors / Affiliation |
16.1 (7111) |
09:00 | 09:25 |
A Compact Square-Geometry Quad-Core 19 GHz Class-F
VCO with Parallel Inductor-sharing Technique achieving
-137.2 dBc/Hz Phase Noise at 10MHz Offset Yaqian Sun1, Wei Deng1,2, Haikun Jia1, Zhihua Wang1,2, and Baoyong Chi1 1Tsinghua University, China 2Research Institute of Tsinghua University in Shenzhen, China Abstract: A square-geometry quad-core oscillator with an inductor-sharing technique is proposed in this paper. It occupies a compact area of 0.09 mm², the smallest among quad-core VCOs operating at similar oscillation frequencies. The unwanted mode is suppressed by the metal trace that connects the drain nodes of adjacent cores. The proposed VCO is fabricated in 65nm CMOS technology. The measured phase noise is -137.2 dBc/Hz at a 10 MHz offset from a 19 GHz carrier, which translates to a FoM of 186.1 dBc/Hz. |
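The VCO figure of merit is conventionally FoM = −PN + 20·log10(f0/Δf) − 10·log10(P_DC/1 mW). The helper below applies that standard definition to the reported −137.2 dBc/Hz at a 10-MHz offset from 19 GHz. The abstract does not state the DC power, so `p_dc_mw` is left as an input, and the 1-mW value in the example is a placeholder chosen only to isolate the frequency-normalization term, not a measured figure.

```python
import math


def vco_fom(pn_dbc_hz: float, f0_hz: float, offset_hz: float, p_dc_mw: float) -> float:
    """Standard VCO FoM in dB: -PN + 20log10(f0/df) - 10log10(P_DC / 1 mW)."""
    return (-pn_dbc_hz
            + 20 * math.log10(f0_hz / offset_hz)
            - 10 * math.log10(p_dc_mw / 1.0))


# Reported PN and offset; the 1 mW power is a placeholder, not the chip's power
print(round(vco_fom(-137.2, 19e9, 10e6, 1.0), 2))  # 202.78
```

Each tenfold increase in DC power lowers the FoM by 10 dB, so the gap between this power-free term and a reported FoM reflects the oscillator's power consumption.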
16.2 (7064) |
09:25 | 09:50 |
A 17-21GHz Current-Folding Frequency Tripler With
>36dBc Harmonic Rejection in 90nm CMOS Chun-Hung Lin and Ching-Yuan Yang National Chung Hsing University, Taiwan Abstract: A frequency tripler (FT) using a current-folding technique to achieve inherently nonlinear operation is presented. A built-in VCO generates the fundamental signal, and the proposed current-folding stage converts the fundamental input into a tripled-frequency output, which is injected into a bandpass stage for harmonic suppression. Fabricated in 90-nm CMOS technology, the FT achieves a measured 36-to-43-dBc harmonic rejection from 17.5 to 21 GHz (18.2% FTR) while consuming only 3.5 mW from a 1.2-V supply. The measured phase noise (PN) of the VCO and the FT is -112.5 and -102.8 dBc/Hz at a 1-MHz offset, respectively. Furthermore, the FoM of the proposed FT is -180.52 and -190.87 dB at 1-MHz and 10-MHz offsets, respectively. |
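Two standard relations tie the tripler figures together. The fractional tuning range (FTR) is the tuning span divided by the center frequency, and an ideal frequency multiplier by N raises phase noise by 20·log10(N), about 9.5 dB for N = 3:

```latex
\mathrm{FTR} = \frac{f_{\max}-f_{\min}}{(f_{\max}+f_{\min})/2}
             = \frac{21-17.5}{19.25} \approx 18.2\%,
\qquad
\mathcal{L}_{\mathrm{out}} \approx \mathcal{L}_{\mathrm{VCO}} + 20\log_{10}3
             \approx -112.5 + 9.5 = -103.0\ \mathrm{dBc/Hz}
```

The ideal-multiplier estimate of about −103.0 dBc/Hz sits close to the measured −102.8 dBc/Hz; this is a consistency check under that idealized assumption, not an analysis taken from the paper.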
16.3 (7191) |
09:50 | 10:15 |
An 18.8-to-20.3-GHz Wide-Ramping-Range Cascaded-PLL-Based FMCW Generator with 44.1-kHz RMS
Frequency Error and -105.6-dBc/Hz Phase Noise in
40-nm CMOS Xiaofei Liao1,2, Feifan Hong1,2, Sijie Pan2, Xiaohu You1,2, and Dixian Zhao1,2 1Southeast University, China 2Purple Mountain Laboratories, China Abstract: A cascaded phase-locked loop (PLL) with wideband low-noise frequency modulation for frequency-modulated continuous-wave (FMCW) radar applications is presented. It uses a wideband millimeter-wave VCO with flat gain sensitivity to ensure wide chirp bandwidth and frequency-modulation linearity. An in-depth analysis of loop-bandwidth optimization in a cascaded PLL for the FMCW synthesizer is detailed. Fabricated in 40-nm CMOS, the proposed cascaded PLL can produce 1.5-GHz triangular and sawtooth chirps from 18.8 to 20.3 GHz, achieving a minimum root-mean-square (rms) frequency error of 44.1 kHz. The measured PN at 1-MHz offset from 19.2 GHz is -105.6 dBc/Hz. |
16.4 (7089) |
10:15 | 10:40 |
A 140GHz 4TX-4RX Phased-Array FMCW-FSK Antenna-Packaged Radar Chipset With 25dBm EIRP and 16GHz BW Shunli Ma1, Tianxiang Wu1, Zhuofan Xu1, Zhonghao Sun1, Xuefeng Li1, Lei Wu1, Biao Hu1, Junyan Ren1, Yong Chen2, and Jiebin Pan3 1Fudan University, China 2University of Macau, China 3East China Institute of Photo-Electron IC, China Abstract: Frequency modulated continuous wave (FMCW) radar sensors are widely used for security checks, car-collision avoidance, vital-sign detection, and tiny-movement sensing [1]-[5]. 4D mm-wave radar needs a large number of phased-array elements for accurate detection. Range resolution is determined by the bandwidth (BW) of the transceiver (TRX). Moreover, it is desirable to integrate sensing and communication functions into the same system. This paper presents a 140GHz phased-array FMCW chipset in 65nm bulk CMOS supporting a 16GHz BW with a custom horn antenna package. Based on the tile structure of the TRX, the system can be scaled up to a large array for 4D phased-array radar. |
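The bandwidth-resolution link stated above is the standard FMCW relation ΔR = c/(2B). A quick calculation for the 16-GHz bandwidth reported here, assuming an ideal, unwindowed linear chirp; the helper name is hypothetical.

```python
C = 299_792_458.0  # speed of light in vacuum, m/s


def range_resolution_m(bandwidth_hz: float) -> float:
    """Ideal FMCW range resolution c / (2 * B) (hypothetical helper)."""
    return C / (2.0 * bandwidth_hz)


print(f"{range_resolution_m(16e9) * 1e3:.2f} mm")  # 9.37 mm
```

The same relation gives about 10 cm for a 1.5-GHz chirp, which shows why wide sweep bandwidth matters for fine range discrimination. Windowing in practice widens the effective resolution cell somewhat.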
ID | Time | Title / Authors / Affiliation |
17.1 (7236) (Highlight) |
09:00 | 09:25 |
A 14V Hybrid Boost Converter With Scalable Conversion
Ratio in 180nm Standard CMOS for an Ultrasound
Imaging System Jiaqi Guo1, Jiamin Li2, Jerald Yoo1,3 1National University of Singapore, Singapore 2Southern University of Science and Technology, China 3The N.1 Institute for Health, Singapore Abstract: To provide the high-voltage supply (>10V) and intermediate voltage domains required by the transducer driving circuits for ultrasound imaging, and to do so in a standard CMOS process for easy processor and IP integration, this work presents a 14V multiple-output boost converter with a hybrid structure and PWM-mode operation. The chip, implemented in a 180nm standard CMOS process, regulates 3.5V, 7V, 10.5V and 14V from a 1.5V input while keeping the switch stress (VGS, VDS) of all transistors below 3.5V in every switching state. It achieves a simulated efficiency of 78%, more than double the 35% achieved in earlier works. |
17.2 (7091) |
09:25 | 09:50 |
A 0.24 mmHg (1σ) Resolution Half-Bridge-to-Digital Converter with RC Delay-Based Pressure Sensing and Energy-Efficient Bit-Level Oversampling Techniques for Implantable Miniature Systems Donguk Seo1, Minsik Cho1, Minhyeok Jeong1, Gicheol Shin1, Inhee Lee2, and Yoonmyung Lee1 1Sungkyunkwan University, Korea 2University of Pittsburgh, USA Abstract: A pressure sensor with a half-Wheatstone-bridge (HB)-to-digital converter is proposed for implantable miniature systems. The HB sensor uses an RC delay comparison, which self-limits current for energy-efficient operation. To overcome the limited sensitivity of the HB, bit-level oversampling is introduced, achieving 0.24 mmHg (1σ) resolution with an 8.58 nJ·mmHg² FoM, which is significantly better than that of prior-art HB-based pressure sensors and comparable to that of Wheatstone-bridge-based pressure sensors.
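The FoM unit (nJ·mmHg²) points to the usual resolution FoM: energy per conversion times the squared 1σ resolution. Assuming that definition, the quoted numbers imply the energy per conversion, and the same sketch shows why plain averaging by itself cannot improve this FoM:

```python
import math

def resolution_fom(energy_nj: float, sigma_mmhg: float) -> float:
    """Assumed resolution FoM: energy per conversion x (1-sigma resolution)^2."""
    return energy_nj * sigma_mmhg ** 2

def energy_from_fom(fom: float, sigma_mmhg: float) -> float:
    """Invert the FoM to get the implied energy per conversion in nJ."""
    return fom / sigma_mmhg ** 2

# Values from the abstract; the FoM definition itself is an assumption.
implied_energy_nj = energy_from_fom(8.58, 0.24)  # about 149 nJ per conversion

# Averaging N independent conversions: sigma shrinks by sqrt(N) but energy grows by N.
n = 16
fom_single = resolution_fom(implied_energy_nj, 0.24)
fom_averaged = resolution_fom(implied_energy_nj * n, 0.24 / math.sqrt(n))
```

Because the two FoM values are equal, an oversampling scheme improves this FoM only if each extra sample costs less energy than a full conversion.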
17.3 (7040) |
09:50 | 10:15 |
A 0.0308mm² 4.15pJ/conv VCO-Based Current Sensing Front-End with 2nd-Order Δ²-ΔΣ Modulation Jee-Ho Park, Ji-Hyoung Cha, Yongjae Park, and Seong-Jin Kim Ulsan National Institute of Science and Technology, Korea Abstract: This paper presents a 2nd-order Δ²-ΔΣ modulator based on a VCO quantizer (VCOQ) with a PWM I-DAC for precise acquisition of the input current in an area- and energy-efficient form factor. The proposed Δ²-modulation substantially attenuates the magnitude of the input signal, enhancing linearity and DR. Moreover, an additional differentiator, followed by the VCOQ, in the negative feedback loop of the 2nd-order ΔΣ modulator increases the noise-shaping order without adding DAC noise. In addition, a PWM I-DAC replaces the multi-bit I-DAC to further reduce noise, achieving a resolution of 1 pA over a 500 Hz bandwidth. The prototype chip, fabricated in a 110 nm CMOS process, occupies 0.0308 mm² and achieves a Walden FoM of 4.15 pJ/conv.
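The front-end builds on a VCO quantizer. As a generic illustration, not the authors' Δ²-ΔΣ loop, the sketch below models a VCO whose frequency tracks the input; counting its edges in each sample period gives a quantizer whose error is first-order noise-shaped, because the leftover fractional phase carries into the next sample. The center frequency `f0` and gain `kvco` (both in cycles per sample) are hypothetical values:

```python
import math
from typing import List

def vco_quantize(x: List[float], f0: float = 0.3, kvco: float = 0.1) -> List[int]:
    """Count VCO edges per sample period for inputs x (illustrative model)."""
    phase = 0.0          # accumulated VCO phase, in cycles
    prev_cycles = 0
    out = []
    for v in x:
        phase += f0 + kvco * v          # frequency integrates into phase
        cycles = math.floor(phase)      # completed VCO cycles so far
        out.append(cycles - prev_cycles)  # edges seen during this sample
        prev_cycles = cycles            # fractional phase is kept, not discarded
    return out
```

Since the running sum of the output equals the floor of the accumulated phase, the total quantization error never exceeds one cycle, which is the first-order shaping property.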
17.4 (7193) |
10:15 | 10:40 |
A 57.2GHz 11.2mW 8-bit General Purpose Superconductor Microprocessor with Dual-Clocking Scheme Ikki Nagaoka1, Ryota Kashima1, Tomoki Nakano1, Masamitsu Tanaka1, Taro Yamashita2, Koji Inoue3, and Akira Fujimaki1 1Nagoya University, Japan 2Tohoku University, Japan 3Kyushu University, Japan Abstract: An 8-bit microprocessor based on superconductor single-flux-quantum (SFQ) logic is demonstrated operating at up to 57.2 GHz with a measured power consumption of 11.2 mW. The microprocessor has ultra-deep, gate-level pipelining with many feedback paths and communications between components. The clock arrival timings at all logic gates are precisely tuned using two different clocking schemes, called “concurrent-flow” and “counter-flow,” to achieve clock frequencies above 50 GHz. The cryogenic operating environment enables a highly delay-aware layout design in which the delays of all waveguide interconnects are controlled with sub-picosecond precision.
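For scale, the headline numbers translate into a clock period and an energy per clock cycle. A quick arithmetic sketch, assuming the 11.2 mW was measured at the 57.2 GHz operating point as the title pairs them:

```python
def cycle_stats(f_clk_hz: float, power_w: float) -> tuple:
    """Return (clock period in ps, energy per clock cycle in pJ)."""
    period_ps = 1e12 / f_clk_hz
    energy_pj = power_w / f_clk_hz * 1e12
    return period_ps, energy_pj

period_ps, energy_pj = cycle_stats(57.2e9, 11.2e-3)  # about 17.5 ps and 0.196 pJ
```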
ID | Time | Title / Authors / Affiliation |
18.1 (7080) (Highlight) |
14:00 | 14:25 |
A 0.56V/0.8V Vision Sensor with Temporal Contrast Pixel and Column-Parallel Local Binary Pattern Extraction for Dynamic Depth Sensing Using Stereo Vision Min-Yang Chiu, Guan-Cheng Chen, Yu-Hsiang Huang, Tzu-Hsiang Hsu, Chung-Chuan Lo, Ren-Shuo Liu, Meng-Fan Chang, Kea-Tiong Tang, Chih-Cheng Hsieh National Tsing Hua University, Taiwan Abstract: A 0.56V/0.8V 126×126 vision sensor with a 6T1C temporal contrast pixel, an exposure compensation scheme, and column-parallel local-binary-pattern (LBP) and region-of-interest (ROI) extraction is prototyped and verified. For motion detection and position tracking, it supports 10-bit raw image, 10-bit frame-difference, and 1.5-bit event-reporting (ER) outputs. For dynamic depth sensing of moving objects with a stereo vision system, it supports an 8-bit LBP feature map and ROI output for efficient disparity calculation.
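An 8-bit LBP code compares each of a pixel's eight neighbors with the pixel itself, and stereo matching on LBP maps can use the Hamming distance between codes. A minimal software sketch of that general idea, not the sensor's column-parallel circuit; the neighbor bit ordering and the single-pixel matching (real matchers aggregate costs over a window) are assumptions, and the function names are hypothetical:

```python
from typing import List

Image = List[List[int]]

# Neighbor offsets, clockwise from top-left (an assumed bit ordering).
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp8(img: Image, r: int, c: int) -> int:
    """8-bit LBP code: bit i is set when neighbor i is >= the center pixel."""
    center = img[r][c]
    code = 0
    for bit, (dr, dc) in enumerate(OFFSETS):
        if img[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code

def lbp_map(img: Image) -> Image:
    """LBP feature map for all interior pixels (borders skipped)."""
    h, w = len(img), len(img[0])
    return [[lbp8(img, r, c) for c in range(1, w - 1)] for r in range(1, h - 1)]

def disparity(left_row: List[int], right_row: List[int], x: int, max_d: int) -> int:
    """Pick the shift d whose right-image code is closest in Hamming distance."""
    best_d, best_cost = 0, 9  # 9 exceeds any 8-bit Hamming distance
    for d in range(min(max_d, x) + 1):
        cost = bin(left_row[x] ^ right_row[x - d]).count("1")
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d
```

For the patch `[[1, 2, 3], [8, 5, 4], [7, 6, 9]]`, the four neighbors at or above the center value 5 set bits 4 to 7, giving code 240.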
18.2 (7150) |
14:25 | 14:50 |
A 118.6fJ/Conversion-Step Two-Step Time-Domain RC-to-Digital Converter With 33nF/10MΩ Range and 53aFrms Resolution Hoyong Seong1, Chongsoo Jung1, Donghyun Youn1, Junghyup Lee2, Sohmyung Ha3, and Minkyu Je1 1KAIST, Korea 2DGIST, Korea 3New York University Abu Dhabi, United Arab Emirates Abstract: This paper presents a 2-step time-domain (TD) RC-to-digital converter (RCDC). To overcome the fundamental tradeoff between resolution and energy efficiency that constrains TD converter designs, a 2-step TD conversion method is proposed. By using a slow reference oscillator (R-OSC) for coarse conversion and a fast duty-cycled gear-up oscillator (G-OSC) for fine conversion, the period of the frequency-divided sensor oscillator output can be measured with both high resolution and high energy efficiency. A duty-cycled phase-locked loop (PLL) consistently maintains the required relationship between the R-OSC and G-OSC outputs without any calibration. Fabricated in a 180nm CMOS process, the proposed 2-step TD RCDC IC achieves 53aFrms resolution and a 33nF/10MΩ input range while consuming 6.75μW.
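The coarse/fine split can be sketched as a generic two-step time-to-digital model. It assumes an ideal fast oscillator whose period is exactly `t_ref_ns / gear_ratio` (the relationship the abstract assigns to the duty-cycled PLL) and that runs only over the residual interval; `two_step_tdc` and its parameter values are hypothetical and illustrative:

```python
import math
from typing import Tuple

def two_step_tdc(t_ns: float, t_ref_ns: float, gear_ratio: int) -> Tuple[float, int, int]:
    """Coarse count of slow reference periods, then fine count over the residual."""
    t_fast_ns = t_ref_ns / gear_ratio           # fast period, assumed locked to the reference
    coarse = math.floor(t_ns / t_ref_ns)        # whole reference periods (slow, cheap)
    residual_ns = t_ns - coarse * t_ref_ns      # time left after the coarse count
    fine = math.floor(residual_ns / t_fast_ns)  # fast oscillator runs only over the residual
    estimate_ns = coarse * t_ref_ns + fine * t_fast_ns
    return estimate_ns, coarse, fine

# Illustrative numbers: 100 ns reference, gear ratio 64 (1.5625 ns fine step).
estimate, coarse, fine = two_step_tdc(1037.0, 100.0, 64)
```

Counting the same 1037 ns directly with the fast clock would take 663 fast cycles instead of 23, which is the energy argument for duty-cycling the fast oscillator.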
18.3 (7224) (Highlight) |
14:50 | 15:15 |
A −50 to 130 °C, 38.69 pJ/conv Fully Integrated SAR Temperature Sensor Based on Direct Temperature-Voltage Comparison Jooeun Kim, Jeongmyeong Kim, Changjoo Park, Minkyu Yang, and Wanyeong Jung KAIST, South Korea Abstract: This paper presents a SAR temperature sensor using a clocked temperature-voltage comparator. The clocked comparator has an input offset that is linearly proportional to temperature, and the SAR detects this offset voltage to measure the temperature. Temperature transduction is spatially and temporally confined to the comparator’s dynamic comparison, so it is robust against various operating conditions. The SAR-based overall structure allows simple design and operation without complex digital filtering or post-processing, as well as low energy consumption. The test chip, fabricated in a 0.18μm CMOS process, shows a 3-sigma error of −2.54/+2.16°C over a wide range of −50 to +130°C with 38.69pJ/conv energy consumption.
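The SAR loop itself is an ordinary binary search for the comparator's offset. A minimal sketch, assuming a hypothetical comparator model whose input offset is exactly proportional to absolute temperature; the slope `k_v_per_k` and DAC full scale `v_fs` are illustrative values, not taken from the paper:

```python
T0_K = 273.15  # 0 degC in kelvin

def comparator_trips(v_dac: float, temp_c: float, k_v_per_k: float) -> bool:
    """Hypothetical model: the comparator flips once the DAC voltage exceeds
    an input offset proportional to absolute temperature, V_os = k * T."""
    return v_dac > k_v_per_k * (temp_c + T0_K)

def sar_temperature(temp_c: float, n_bits: int = 10, v_fs: float = 0.15,
                    k_v_per_k: float = 0.3e-3) -> float:
    """Binary-search the offset with an n-bit DAC, then map the code back to degC."""
    code = 0
    for bit in reversed(range(n_bits)):
        trial = code | (1 << bit)
        v_dac = trial * v_fs / (1 << n_bits)
        if not comparator_trips(v_dac, temp_c, k_v_per_k):
            code = trial  # DAC still at or below the offset: keep this bit
    v_os_est = code * v_fs / (1 << n_bits)
    return v_os_est / k_v_per_k - T0_K
```

With these illustrative values one LSB corresponds to about 0.49 °C; a real sensor's accuracy also depends on offset nonlinearity and process spread, which this ideal model omits.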
18.4 (7020) |
15:15 | 15:27 |
A Digital Temperature Sensor Based on 10b SAR ADC for Non-linear Temperature Dependency Compensation in 3D NAND Flash Memory Kyoung-Jun Roh, Min-Ki Jeon, Jaewoo Park, Myoungbo Kwak, Chi-Weon Yoon, Youngdon Choi and Jung-Hwan Choi Device Solutions, Samsung Electronics, Korea Abstract: In this paper, we propose a digital temperature sensor (DTS) to compensate for the nonlinear shift of VT with temperature in VNAND flash memory. The DTS consists of a voltage generator that derives a CTAT voltage from a bandgap reference voltage and a 10-bit SAR-type ADC. The DTS is designed to operate in synchronization with a NAND command signal. The proposed circuit is implemented in Samsung Electronics' multi-stacked VNAND technology. The total conversion time is 4 μs, including the voltage generator setup time. Measured over 40 samples, the resolution is 0.753 °C/LSB, and the maximum deviation with 1-point calibration for each NAND operation is 12 LSB.
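A minimal sketch of reading the 10-bit code as temperature with one-point calibration. It assumes the code rises linearly with temperature at the reported 0.753 °C/LSB; the abstract gives neither the sign of the slope nor the nonlinear compensation mapping, so both are simplified away, and the calibration point is hypothetical:

```python
LSB_C = 0.753  # degC per LSB, as reported in the abstract

def code_to_temp_c(code: int, cal_code: int, cal_temp_c: float,
                   lsb_c: float = LSB_C) -> float:
    """One-point calibration: pass a fixed-slope line through one measured point."""
    return cal_temp_c + (code - cal_code) * lsb_c

# Hypothetical calibration point: code 566 read at 25 degC.
t = code_to_temp_c(600, cal_code=566, cal_temp_c=25.0)  # 25 + 34 * 0.753
max_dev_c = 12 * LSB_C  # the reported 12 LSB worst-case deviation, in degC
```

At this resolution the 12 LSB worst-case deviation corresponds to about 9 °C.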
18.5 (7102) |
15:27 | 15:40 |
A sub-nW scalable nMOS voltage reference with multiloop regulation achieving 0.0126%/V line sensitivity Chutham Sawigun, Xiaolin Yang, Andrea Lodi, and Carolina Mora Lopez imec, Belgium Abstract: To achieve better line sensitivity (LS) than existing techniques, this paper proposes a regulated voltage reference (VR) that employs multiple regulation loops to improve LS and offers output-voltage scalability in a single-branch topology. The proposed VR uses only nMOS devices and, compared with other state-of-the-art regulated VRs, occupies the smallest area and achieves the lowest LS.
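Line sensitivity is commonly defined as the relative change of the reference voltage per volt of supply change, expressed in %/V. A minimal sketch, assuming normalization by the mean reference voltage (some papers use the nominal value instead) and a hypothetical 0.3 V reference, since the abstract does not state the output voltage:

```python
def line_sensitivity_pct_per_v(vref_lo: float, vref_hi: float,
                               vdd_lo: float, vdd_hi: float) -> float:
    """LS in %/V: relative change of VREF per volt of supply change."""
    vref_mean = (vref_lo + vref_hi) / 2
    return (vref_hi - vref_lo) / vref_mean / (vdd_hi - vdd_lo) * 100

def vref_shift(vref: float, ls_pct_per_v: float, delta_vdd: float) -> float:
    """Approximate VREF change for a given LS and supply step."""
    return vref * ls_pct_per_v / 100 * delta_vdd

# Hypothetical 0.3 V reference with the reported 0.0126 %/V over a 1 V supply step.
dv = vref_shift(0.3, 0.0126, 1.0)  # about 37.8 uV
ls = line_sensitivity_pct_per_v(0.3, 0.3 + dv, 1.0, 2.0)
```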
ID | Time | Title / Authors / Affiliation |
19.1 (7217) |
14:00 | 14:25 |
A Real-Time High-Resolution Variable-Size Imaging Processor for Spaceborne Synthetic Aperture Radar Jia-Zhao Lin1, Po-Ta Chen1, Hung-Yuan Chin1, Pei-Yun Tsai1, and Sz-Yuan Lee2 1National Central University, Taiwan 2National Applied Research Laboratory, Taiwan Abstract: We present a real-time imaging processor for spaceborne high-resolution synthetic aperture radar. To achieve this, a DRAM burst access pattern is developed based on azimuth FFT/IFFT decomposition with bit-reversed frequency-domain data, enabling streaming input/output in the processing kernel. Hybrid datapaths that use 17-bit customized floating-point (CFP) FFT/IFFT operations and 64-bit double-precision arithmetic units for phase calculation are designed to meet the precision requirement. Multi-segment high-order Taylor series expansion is adopted to approximate the complicated migration factors and support configurability. Our implementation shows at least a 2.93× improvement in normalized processing time with excellent precision.
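Bit-reversed frequency-domain data arise from a standard pairing: a decimation-in-frequency (DIF) FFT leaves its output in bit-reversed order, and a decimation-in-time (DIT) IFFT accepts exactly that order and returns natural order. Because frequency-domain filtering is a point-wise product, the spectrum never needs reordering between the two, which is what makes streaming possible. A minimal pure-Python sketch of that pairing, not the processor's datapath; `circular_filter` is a hypothetical name:

```python
import cmath
from typing import List

def fft_dif(x: List[complex]) -> List[complex]:
    """Radix-2 DIF FFT: natural-order input, bit-reversed output."""
    a = [complex(v) for v in x]
    n = len(a)
    span = n // 2
    while span >= 1:
        w_step = cmath.exp(-2j * cmath.pi / (2 * span))
        for start in range(0, n, 2 * span):
            w = 1 + 0j
            for k in range(span):
                u, v = a[start + k], a[start + k + span]
                a[start + k] = u + v
                a[start + k + span] = (u - v) * w
                w *= w_step
        span //= 2
    return a

def ifft_dit(a_br: List[complex]) -> List[complex]:
    """Radix-2 DIT IFFT: bit-reversed input, natural-order output."""
    a = list(a_br)
    n = len(a)
    span = 1
    while span < n:
        w_step = cmath.exp(2j * cmath.pi / (2 * span))
        for start in range(0, n, 2 * span):
            w = 1 + 0j
            for k in range(span):
                u, v = a[start + k], a[start + k + span] * w
                a[start + k] = u + v
                a[start + k + span] = u - v
                w *= w_step
        span *= 2
    return [v / n for v in a]

def circular_filter(x: List[complex], h: List[complex]) -> List[complex]:
    """Both spectra come out bit-reversed, so their product needs no reordering."""
    spectrum = [X * H for X, H in zip(fft_dif(x), fft_dif(h))]
    return ifft_dit(spectrum)
```

Filtering `[1, 2, 3, 4, 0, 0, 0, 0]` with `[1, 1, 0, 0, 0, 0, 0, 0]` returns the circular convolution `[1, 3, 5, 7, 4, 0, 0, 0]` (up to rounding) without any explicit bit-reversal pass.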
19.2 (7239) |
14:25 | 14:50 |
A 409.6 GOPS and 204.8 GFLOPS Mixed-Precision Vector Processor System for General-Purpose Machine Learning Acceleration Jung-Hoon Kim, Sukjin Lee, Seungjae Moon, Sungyeob Yoo, and Joo-Young Kim KAIST, Korea Abstract: This paper presents a mixed-precision vector processor named MVP and its multi-core system for general-purpose ML acceleration. It has three key contributions: 1) MVP supports fixed- and floating-point data types and various AI operations with scalable vector lanes, 2) MVP has a two-level instruction set architecture (ISA), and its microcode generator enables convenient ML model mapping and small code size, and 3) the software stack efficiently allocates a target ML model onto multiple MVPs, generating all the necessary runtime binaries. As a result, the proposed multi-MVP system provides a peak performance of 409.6 GOPS and 204.8 GFLOPS and an energy efficiency of 13.97 GOPS/W and 6.99 GFLOPS/W on a Xilinx Alveo U50 FPGA card, achieving 83.84% average effective utilization across various ML models.
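Dividing peak throughput by efficiency gives the implied power, and the integer and floating-point figures land at almost the same value, which suggests both efficiencies refer to a similar operating point. A quick arithmetic sketch, assuming efficiency = peak throughput / power and effective throughput = utilization × peak (the abstract spells out neither definition):

```python
peak_gops, peak_gflops = 409.6, 204.8        # peak throughput from the abstract
eff_gops_per_w, eff_gflops_per_w = 13.97, 6.99
utilization = 0.8384                         # average effective utilization

power_int_w = peak_gops / eff_gops_per_w     # about 29.3 W
power_fp_w = peak_gflops / eff_gflops_per_w  # about 29.3 W
effective_gops = peak_gops * utilization     # about 343.4 GOPS
```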
19.3 (7248) |
14:50 | 15:15 |
An Efficient Unsupervised Learning-based Monocular Depth Estimation Processor with Partial-Switchable Systolic Array Architecture in Edge Devices Wonhoon Park, Dongseok Im, Hankyul Kwon, and Hoi-Jun Yoo Korea Advanced Institute of Science and Technology, Korea Abstract: In this paper, an unsupervised learning-based monocular depth estimation (MDE) processor is proposed with the following key features: 1) multi-path simultaneous processing (MPSP), which reduces the external memory access of the multi-path sampling block by 16.8%, 2) a partial-switchable systolic array (PSSA) architecture, which maintains high processing-element utilization and achieves an average throughput improvement of 51.5%, and 3) a dynamic network selection learning (DNSL) system, which optimizes the pose network used to obtain supervision during training, increasing the system energy efficiency by 59%.
19.4 (7235) |
15:15 | 15:40 |
F-LIC: FPGA-based Learned Image Compression with a Fine-grained Pipeline Heming Sun1,2,3, Qingyang Yi4, Fangzheng Lin1, Lu Yu2, Jiro Katto1, and Masahiro Fujita4,5 1Waseda University, Japan 2Zhejiang University, China 3JST, PRESTO, Saitama, Japan 4The University of Tokyo, Japan 5AIST, Japan Abstract: This paper presents an FPGA design for learned image compression (LIC). A fine-grained pipelining schedule is proposed to achieve higher DSP efficiency. We also propose cascading DSP schemes and a zero-skipping deconvolution scheme. Compared with the latest FPGA-based LIC, our design achieves higher speed with higher power efficiency.
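A deconvolution (transposed convolution) layer can be computed by inserting zeros between input samples and then convolving, which wastes multiplications on the inserted zeros. A 1-D sketch of the general zero-skipping idea, assuming stride-`s` upsampling; it is not the F-LIC hardware schedule, and the function names are hypothetical:

```python
from typing import List, Tuple

def deconv_zero_insert(x: List[float], k: List[float], s: int) -> Tuple[List[float], int]:
    """Naive transposed convolution: insert s-1 zeros between samples, then convolve."""
    up: List[float] = []
    for i, v in enumerate(x):
        up.append(v)
        if i < len(x) - 1:
            up.extend([0.0] * (s - 1))
    y = [0.0] * (len(up) + len(k) - 1)
    macs = 0
    for i, v in enumerate(up):
        for j, w in enumerate(k):
            y[i + j] += v * w  # many of these multiply an inserted zero
            macs += 1
    return y, macs

def deconv_zero_skip(x: List[float], k: List[float], s: int) -> Tuple[List[float], int]:
    """Same result, scattering only the real inputs, so no MAC touches a zero."""
    y = [0.0] * ((len(x) - 1) * s + len(k))
    macs = 0
    for i, v in enumerate(x):
        for j, w in enumerate(k):
            y[i * s + j] += v * w
            macs += 1
    return y, macs
```

For a 4-sample input, a 3-tap kernel, and stride 2, both versions give the same output, but the naive one spends 21 MACs against 12 for the zero-skipping one.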
ID | Time | Title / Authors / Affiliation |
20.1 (7104) (Highlight) |
14:00 | 14:25 |
A 0.95pJ/b 5.12Gb/s/pin Charge-Recycling IOs with 47% Energy Reduction for Big Data Applications Han Wu1, Jeong Hoan Park2, Miaolin Zhang1, Longyang Lin3, Rucheng Jiang1, Jung-Hwan Choi2, Jerald Yoo1,4 1National University of Singapore, Singapore 2Samsung Electronics, South Korea 3Southern University of Science and Technology, China 4The N.1 Institute for Health, Singapore Abstract: We propose Charge-Recycling IOs (CRIOs) that save up to 32.2% energy for the TSV link (2.56Gb/s) and 47% for the T-Line link (5.12Gb/s) compared with conventional IOs. Implemented in 40nm 1P8M standard CMOS, the proposed CRIOs achieve signal integrity and BER performance comparable to those of conventional IOs.
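The title's figures can be turned into a baseline and a per-pin power. A quick sketch, assuming the 0.95 pJ/b figure is the CRIO energy on the 5.12Gb/s T-Line link, the same link the 47% saving refers to (the title pairs them, but the abstract does not say so explicitly):

```python
crio_pj_per_bit = 0.95  # from the title
reduction = 0.47        # T-Line link saving vs conventional IOs
data_rate_gbps = 5.12

# Energy a conventional IO would need if 0.95 pJ/b is the post-saving figure.
conventional_pj_per_bit = crio_pj_per_bit / (1 - reduction)  # about 1.79 pJ/b
# pJ/b x Gb/s = mW
crio_pin_power_mw = crio_pj_per_bit * data_rate_gbps  # about 4.86 mW per pin
```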
20.2 (7045) |
14:25 | 14:50 |
A 10Gb/s/pin DQS and WCK Built-Out Tester for LPDDR5 DRAM Test Chan-Ho Kye1, Jihee Kim2, Kyungmin Baek2, Kahyun Kim2, Sangjin Pack3, Changwon Jung3, and Deog-Kyoon Jeong2 1EPFL, Switzerland 2Seoul National University, Korea 3SK Hynix, Korea Abstract: We propose a data strobe (DQS) and write clock (WCK) tester that can replace DFT for the high-speed test of LPDDR5 DRAM.
20.3 (7009) |
14:50 | 15:15 |
A 7.5Gb/s/pin 12Gb-LPDDR5x SDRAM with a Pseudo-double-bit ECC and “Spider”-shape Datapath Control Architecture in a 2nd Generation 10nm DRAM Process Feng Lin, Kangling Ji, Enpeng Gao, Zhonglai Liu, Weibing Shang, Hongwen Li Changxin Memory Technologies, Inc., China Abstract: A 12Gb LPDDR5x SDRAM is presented with unique pseudo-double-bit ECC functions. A “Spider”-shape eight-way multiplexer serves as the central traffic control for the high-speed datapaths. Direct dynamic voltage and frequency scaling is proposed to reduce boundary-crossing power consumption by 57%. Data receivers with a 1-tap DFE are proposed, along with an on-die eye monitor for margin evaluation. The chip is manufactured in a 2nd-generation 10nm DRAM process and achieves a 7.5Gb/s/pin data rate at 1.05V.
20.4 (7230) |
15:15 | 15:40 |
A Single-Ended Duobinary-PAM4(PAM7) Transmitter with a 2-Tap Feed-Forward Equalizer Jaenam Kim1, 2*, Sanghyeon Park1, 2*, Jaewoo Park1, Junhan Bae1, and Jung-Hoon Chun1, 3 1Sungkyunkwan University, South Korea 2Samsung Electronics, South Korea 3SolidVue, South Korea *Equally Credited Authors (ECAs) Abstract: A PAM4/duobinary-PAM4 dual-mode transmitter is demonstrated in a 28 nm CMOS technology. The duobinary-PAM4 encoder adds two half-rate PAM4 signals driven by quarter-rate clocks and produces 7-level duobinary-PAM4 signals. The proposed transmitter with a 2-tap feed-forward equalizer consists of 48 source-series terminated (SST) driver segments that are partitioned into six blocks to generate a duobinary-PAM4 signal. At 18 Gb/s, the proposed transmitter achieves 1.11-pJ/b and 1.66-pJ/b energy efficiency in duobinary-PAM4 and PAM4 modes, respectively.
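Duobinary signaling adds each symbol to the previous one, so PAM4 levels {0, 1, 2, 3} become seven levels {0, ..., 6}, matching the 7-level output described above. A minimal sketch with the standard modulo-4 precoder, which lets the receiver decode each symbol as the received level mod 4 without error propagation; the abstract does not say whether this transmitter uses precoding, so that part is an assumption:

```python
from typing import List

def precode(symbols: List[int], m: int = 4) -> List[int]:
    """Modulo-m duobinary precoder: p[n] = (x[n] - p[n-1]) mod m."""
    prev, out = 0, []
    for s in symbols:
        p = (s - prev) % m
        out.append(p)
        prev = p
    return out

def duobinary(pre: List[int]) -> List[int]:
    """Add each symbol to the previous one: two 0..3 values give 7 levels 0..6."""
    prev, out = 0, []
    for p in pre:
        out.append(p + prev)
        prev = p
    return out

def decode(levels: List[int], m: int = 4) -> List[int]:
    """Symbol-by-symbol decision: with precoding, x[n] = y[n] mod m."""
    return [y % m for y in levels]
```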
ID | Time | Title / Authors / Affiliation |
21.1 (7112) (Highlight) |
14:00 | 14:25 |
A 91-dB DR 20-kHz BW 5th-Order Multi-Step Incremental ADC for Sensor Interfaces by Re-Using a MASH 2-1 Modulator Jia-Sheng Huang1,2, Shih-Che Kuo1, Yu-Cheng Huang1, Chia-Wei Kao1,2, Che-Wei Hsu1,3 and Chia-Hung Chen1 1National Yang Ming Chiao Tung University, Taiwan 2Now with Realtek, Taiwan 3Now with Mediatek, Taiwan Abstract: A 3rd-order multi-stage incremental ΔΣ ADC (IADC) is proposed that operates in two steps by re-using the same hardware. The first step is a third-order cascaded IADC with an oversampling ratio (OSR) of 24; the circuit is then reconfigured as a second-order IADC with another OSR of 16 for fine quantization. The noise-shaping order is thus boosted from third to fifth. Prototyped in 0.18 μm technology, the ADC achieves measured DR/SNDR of 91/89 dB and Schreier FoMs of 168.5/166.6 dB for a 10 kHz BW.
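The two-step hardware reuse can be illustrated with a first-order stand-in (the paper uses third- and second-order loops). After a reset and OSR cycles, a first-order incremental ΔΣ leaves its integrator holding the quantization residue exactly, so the same loop can be run again on that residue for fine quantization. A minimal sketch, assuming an ideal integrator and an input in [0, 1); the OSR values 24 and 16 come from the abstract:

```python
from typing import Tuple

def first_order_incremental(x: float, osr: int) -> Tuple[int, float]:
    """Reset-then-run first-order incremental delta-sigma; returns (count, residue)."""
    u = 0.0    # integrator, reset at the start of each conversion
    count = 0
    for _ in range(osr):
        u += x          # integrate the input
        if u >= 1.0:    # 1-bit quantizer decision
            u -= 1.0    # DAC feedback subtracts full scale
            count += 1
    return count, u     # invariant: osr * x == count + u, with 0 <= u < 1

def two_step_incremental(x: float, osr1: int = 24, osr2: int = 16) -> float:
    c1, residue = first_order_incremental(x, osr1)  # coarse step
    c2, _ = first_order_incremental(residue, osr2)  # same loop re-used on the residue
    return (c1 + c2 / osr2) / osr1
```

With these OSRs the two-step error stays below 1/(24·16) of full scale, versus 1/24 for the first step alone; higher-order loops shrink it much further for the same OSR.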
21.2 (7165) |
14:25 | 14:50 |
A 78.6 dB-SNDR 520mVpp-full-scale 620MΩ-Zin 105dB-CMRR VCO-based Sensor Readout Circuit Using FVF-Based Gm-Input Structure Yi Zhong, Lu Jie, and Nan Sun Tsinghua University, China Abstract: This paper presents a flipped-voltage-follower (FVF)-based Gm-input CT-ΔΣ ADC with an input impedance enhancement technique. The prototype ADC achieves 78.6dB SNDR with 10 kHz BW at the input range of 480mVpp while consuming 7.1μW, resulting in the Schreier FoM (FoMs) of 170.1dB. This work also achieves 620MΩ input impedance at the chopping frequency of 45kHz and 105dB CMRR.
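The reported FoM is consistent with the usual SNDR-based Schreier definition, FoMs = SNDR + 10·log10(BW/P). A quick check with the abstract's numbers:

```python
import math

def schreier_fom_db(sndr_db: float, bw_hz: float, power_w: float) -> float:
    """Schreier FoM (SNDR-based): SNDR + 10*log10(BW / P)."""
    return sndr_db + 10 * math.log10(bw_hz / power_w)

fom = schreier_fom_db(78.6, 10e3, 7.1e-6)  # about 170.1 dB
```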
21.3 (7043) |
14:50 | 15:15 |
110.1dB DR 4-ch Audio ADCs and 98dB DR 2-ch Voice-Triggering ADCs in Reconfigurable Architecture with Enhanced Off-Transistor-Based Bias Noise Filter Moo-Yeol Choi, Inhwan Cho, Myungjin Lee, Seunghyun Oh, Jongwoo Lee Samsung Electronics, Korea Abstract: 4-ch audio ADCs and 2-ch voice-trigger system (VTS) ADCs with an enhanced off-transistor-based bias noise filter are proposed. The proposed technique addresses two limitations of the previous off-transistor-based noise filter: voltage drift caused by well-diode leakage and reduced equivalent resistance. The audio ADC achieves a measured DR of 110.1dB and THD+N of -100.1dB. The CT-DSM in this work achieves a Schreier FoM of 185.7dB in audio ADC mode and 170.6dB in VTS ADC mode, and attains the highest DR despite the additional noise of a capacitively coupled gain amplifier.
21.4 (7219) |
15:15 | 15:27 |
A 103.8-dB DR 25ps-to-35ns Resolution Time-to-Digital Converter with Dynamic Ring Oscillator for LiDAR Applications Taewoong Kim1,2, Sanghoon Lee1, and Youngcheol Chae1 1Yonsei University, Korea 2Now with Samsung Electronics, Korea Abstract: This paper proposes a wide-dynamic-range TDC for LiDAR sensors. Its architecture is a ring oscillator (RO)-based folding TDC whose resolution scales with the input range, using a supply voltage dynamically pre-charged on a reservoir capacitor. This dynamic RO changes its time resolution from 25 ps to 35 ns, which significantly increases the dynamic range and yields a maximum measurable time of 3.9 μs, corresponding to a distance of 585 m. Implemented in a small area of 0.0135 mm² in a 28 nm FDSOI process, the prototype TDC achieves a wide dynamic range of 103.8 dB while consuming only 45.6 μW.
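Two of the quoted figures can be cross-checked. A round-trip time of flight of 3.9 μs corresponds to about 585 m, and if dynamic range is taken as the ratio of the maximum measurable time to the finest 25 ps resolution (an assumed definition), it comes out close to the reported 103.8 dB:

```python
import math

C_M_PER_S = 299_792_458.0  # speed of light

def lidar_range_m(tof_s: float) -> float:
    """Target distance for a round-trip time of flight."""
    return C_M_PER_S * tof_s / 2

def dynamic_range_db(t_max_s: float, t_res_s: float) -> float:
    """Ratio of the longest measurable interval to the finest resolution, in dB."""
    return 20 * math.log10(t_max_s / t_res_s)

distance = lidar_range_m(3.9e-6)       # about 584.6 m
dr = dynamic_range_db(3.9e-6, 25e-12)  # about 103.9 dB
```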
21.5 (7209) |
15:27 | 15:40 |
A 0.3V 762nW-Only Binary-Search Phase ADC With Current-Reused RO-based Comparator Sifan Wang1, Kejin Li1, Chi-Hang Chan1, Yan Zhu1, Rui Paulo Martins1,2 1University of Macau, China 2On leave from Universidade de Lisboa, Portugal Abstract: This paper presents a 0.3V 4b binary-search-based phase ADC running at 1MS/s while consuming only 762nW. Unlike existing techniques with large peripheral circuits and power overhead, the proposed phase ADC remains simple and consumes purely dynamic power. The linear combiner, cascoded with the ring-oscillator-based (RO-based) comparator, allows current reuse at ultra-low voltage. Combined with the proposed binary-search logic for phase quantization, it achieves outstanding energy efficiency by reducing the number of comparisons to four in this 4b phase ADC. The phase ADC's timing loop is asynchronous, which maintains a 1MHz sampling rate at such a low voltage.
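A binary search needs only n decisions to locate one of 2^n phase sectors, which is where the four comparisons of the 4b ADC come from; a flash-style search would need 15 comparators. A minimal sketch with an idealized threshold comparison, not the RO-based circuit; the function name is hypothetical:

```python
from typing import Tuple

def binary_search_phase(phase_cycles: float, n_bits: int = 4) -> Tuple[int, int]:
    """Find which of 2**n_bits equal phase sectors holds phase_cycles (in [0, 1))."""
    lo, hi = 0, 1 << n_bits  # candidate sectors [lo, hi)
    comparisons = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        comparisons += 1     # one comparator decision per step
        if phase_cycles >= mid / (1 << n_bits):
            lo = mid
        else:
            hi = mid
    return lo, comparisons
```

For a phase of 0.3 cycles the search narrows 16 sectors to sector 4 in exactly four decisions.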