I am currently pursuing an MSc in Digital Systems Design at the School of Electronics and Computer Science (ECS), University of Southampton. My research focuses on AI accelerator architecture and RTL design, driven by cross-layer optimization from algorithms and systems.

Since Winter 2024, I have been collaborating remotely on AI chip research and tape-out with Dr. Changchun Zhou.

I received my BEng in Microelectronics Science and Engineering from Harbin Institute of Technology in June 2025.

My master project focuses on the design and implementation of neural networks on physical neural networks (PNN), under the supervision of Dr. Sajjad Taravati.

My research spans full-stack optimization from algorithm to hardware for AI chip, including:

  • Algorithm-to-hardware design and implementation of AI chips
  • Hardware-software co-design for intelligent computing accelerators
  • Computer architecture and novel accelerator architectures

Further details can be found in my CV.

I am currently seeking RA opportunities for Fall 2026 and PhD opportunities for Fall 2027.


๐Ÿ“˜

Education

University of Southampton, UK
Sep 2025-Sep 2026 (expected)

MSc Digital Systems Design

School of Electronics and Computer Science (ECS)

Harbin Institute of Technology, China
Aug 2021-Jun 2025

BEng Microelectronics Science and Engineering

School of Astronautics


๐Ÿ“„

Publications

ISCAS paper
ISCAS 2026

A Unified Function Processor with Integer Arithmetic Based on Piecewise Chebyshev Polynomial Approximation

X. Zheng*, Z. Guo*, and C. Zhou#. (#Corresponding Author *Co-first author)

IEEE International Symposium on Circuits and Systems (ISCAS) (Accepted)

Abstract

Nonlinear functions are fundamental components in widely applied AI algorithms. However, their hardware implementation presents a major challenge due to the diversity of function types and the full precision requirement (e.g., FP32), which results in significant area and energy overheads. To overcome these issues, we present a Unified Function Processor (UFP) with integer arithmetic, capable of efficiently computing a wide range of nonlinear functions with high accuracy under integer constraints. First, we propose a dynamic programming segmentation algorithm within a third-degree Chebyshev polynomial framework that optimally partitions each function into eight integer-aligned segments to minimize global quantization error. Second, a unified three-stage pipelined hardware with computation element reuse is proposed. Implemented in TSMC 28-nm HPC technology and working at 1GHz, the proposed UFP achieves up to 93.6% reductions in area compared to the state-of-the-art works, with a 79% lower energy consumption. The architecture flexibly supports all mainstream functions in AI algorithms with configurable precision and range, offering a compact and scalable solution for AI acceleration hardware.

TCAS-I paper (preparation)
TCAS-I (preparation)

Variable Bit-Width Unified Function Processor for AI Acceleration

C. Zhou#, X. Zheng*, Z. Guo, etc.

IEEE Transactions on Circuits and Systems I (TCAS-I) (in preparation)

TCAS-II paper (preparation)
TCAS-II (preparation)

A Fast FPS Parallel Computing Method, Adaptive to Point Cloud Distribution

X. Zheng*, K. Yu*, and C. Zhou#.

IEEE Transactions on Circuits and Systems II (TCAS-II) (in preparation)


๐Ÿš€

Projects

Volans LLM Chip
NPU Subsystem Microarchitecture Co-designer and Developer    |    Supervised by Dr. Changchun Zhou
Feb 2026-Present
  • Based on the limited arithmetic intensity of GEMV, co-designed the microarchitecture of MMU part of the proposed new datapath for GEMM integrated into a RISC-V processor within an SoC;
  • Partition the MMU subsystem into an 8-stage AXI-Stream-based pipeline and implement multiple modules at the RTL level.
Volans LLM Chip architecture
Fornax Diffusion Chip
OCI Module Developer, UFP Module Co-developer    |    Supervised by Dr. Changchun Zhou
8/2025-12/2025
  • Proposed the Quantization-Aware Polynomial Approximation Algorithm (QADP) based on Chebyshev. Built a configurable Python toolchain for design-space exploration and error-resource trade-off analysis.
  • Designed the OCI (output reversely compresses input) module, including module-level instruction registers configuration, micro architecture design, dataflow, and RTL implementation, covering AXI transactions, address generation, data reorganization and scheduling. Completed module-level simulation and functional verification, and finally integrated it to SoC.
show show
Voxel-based Point Cloud Processing for Hardware Acceleration
Team Leader    |    Supervised by Dr. Changchun Zhou
09/2024-Present
  • Designed a point cloud sampling and neighbor-search algorithm based on voxel partitioning, which reduced explicit distance computations by leveraging voxel index mapping.
Intelligent Control and Application of Landscape Lighting in Complex Environments
Team Member of Controller Part
10/2024-06/2025
  • Developed a dynamic audio-responsive lighting control system based on feature extraction and adaptive control algorithms base on Modbus protocol, ADC acquisition, and RS485 communication on STM32 platform. The work has been deployed in the 2025 Harbin Ice and Snow World.
show PCB

๐Ÿ› ๏ธ

Skills

  • Programming: Verilog, SystemVerilog, C, Python
  • Digital ASIC Implementation: Basic Full RTL-to-GDSII flow; Synopsys DC/DFT, Cadence Xcelium/Innovus/Virtuoso; AMS 0.35ยตm/TSMC 180nm PDK; SDC constraints; post-layout simulation
  • FPGA & Simulation: Quartus, ModelSim, MATLAB
  • PCB / Embedded: Altium Designer
  • Language: English(IELTS 6.5), Chinese(Native)