Awards and Honors
- Winner at Robert S. Hilbert Memorial Optical Design Competition, Jul 2022
- Best Paper Award, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), Oct 2021
- DAC Young Fellow, 58th IEEE/ACM Design Automation Conference (DAC), Oct 2021
- Cockrell School Graduate Student Fellowship, UT Austin, Jun 2021
- First Place, ACM Student Research Competition Grand Finals, May 2021
- Best Poster Award, NSF Workshop on Machine Learning Hardware, Dec 2020
- First Place, ACM/SIGDA Student Research Competition, Nov 2020
- 7th Place, IWLS Programming Contest: Machine Learning + Logic Synthesis, Aug 2020
- DAC Young Fellow, 57th IEEE/ACM Design Automation Conference (DAC), Jul 2020
- Best Paper Award Finalist (1 out of 6), 57th IEEE/ACM Design Automation Conference (DAC), Jul 2020
- Best Paper Award, 25th ACM/IEEE Asian and South Pacific Design Automation Conference (ASP-DAC), Jan 2020
- 4th Place, 2019 DAC System Design Contest on Low Power Object Detection, May 2019
- First Prize Scholarship, Fudan University, 2017 - 2018
- Top 5, 2018 FUTURELAB AI Contest (CV Group), 2018
- Top 11%, 2017 IEEEXtreme Global Programming Competition (out of 3,350 teams worldwide), 2017
Work Experience
- NVIDIA, USA
- Research Summer Intern, VLSI team, May 2022 - Aug 2022, Austin, TX
- Efficient neural network compression and hardware acceleration.
- Meta Platforms, USA
- Research Summer Intern, Meta Reality Labs, FAST AI team, May 2021 - Dec 2021, Austin, TX
- Hardware-aware model design for efficient on-device vision inference.
Research Experience
- University of Texas at Austin, USA
- Research Assistant, ECE department, University of Texas at Austin, Jan 2019 - Present, Austin, TX
- Efficient Neural Arch. Design, NN Structured Pruning: Designed hardware-efficient frequency-domain photonic neural network architecture; achieved 3-4× area reduction over previous ONN architectures by using block-circulant matrices and structured pruning; further reduced area and power by 2× and 10× through joint optimization and fine-grained structured pruning
- NN Quantization and Robustness: Developed differentiable quantization-aware training scheme in the unitary manifold to enable robust optical neural networks with low-precision voltage controls; achieved better accuracy and robustness with limited control resolution and device-level variations
- Photonics CNN and RNN Design: Collaborated on designing high-throughput and low-power photonic CNN architectures; helped design and fabricate high-speed silicon-photonic RNN
- NN On-Chip/On-Device Learning: Developed customized CUDA extension for ONN simulation acceleration; proposed efficient on-chip learning algorithm for optical neural networks with stochastic zeroth-order optimization; achieved 3-4× higher learning efficiency, 10× better scalability, and better robustness than previous methods
- NN Efficient Training Framework: Collaborated on developing efficient training framework for reversible neural architectures via constrained optimization; our dynamic-programming-based scheduling achieved a 5-20% speedup with comparable memory efficiency when training reversible NNs
- Photonics Neural Chip Tape-out: Worked on photonic neural chip tape-out for novel ONN architectures using Advanced Micro Foundry (AMF); collaborated on the full-stack schematic design, layout, validation, tape-out, and measurement of photonic neural chips using PyTorch, Lumerical toolkits, and Synopsys OptoDesigner
- GPU-Accelerated VLSI Detailed Placement: Collaborated on developing GPU-accelerated concurrent VLSI detailed placement with CUDA; implemented and optimized global swap and parallel auction algorithm for batch-based independent-set matching; achieved >10× speedup without quality degradation
- GPU-Accelerated VLSI Global Placement: Collaborated on high-performance VLSI analytical global placement acceleration with CUDA on GPUs; optimized wirelength and density computation operators with CUDA; developed parallel congestion map estimation for routability optimization; achieved 40× speedup in global placement
- VLSI Global Placement Algorithm: Developed multi-electrostatics-based robust VLSI placement framework DREAMPlace 3.0 with PyTorch/C++/CUDA; proposed multi-electrostatic system for optimization under fence region constraints; developed divergence-aware optimizer for robust nonlinear global placement; achieved >13% HPWL improvement and >11% top-5 overflow reduction compared with ISPD2015 contest winners
- Efficient NN Learning and Power Optimization: Proposed efficient ONN on-chip learning framework with two-level sparse optimizer and efficient power-aware optimization; achieved high convergence stability, ~10× higher training efficiency, and ~10× power reduction compared with prior methods
- University of Texas at Austin, USA
- Research Assistant, ECE department, University of Texas at Austin, Sep 2018 - Jan 2019, Austin, TX
- FPGA Emulation of RISC-V Core: Emulated the RISC-V Rocket Core on a Zynq FPGA and established communication between the core and the host
- Fault Injection: Customized FIRRTL transformations and built infrastructure for fault injection and system state snapshots
- Fudan University, Shanghai, China
- Research Assistant, EE department, Fudan University, Aug 2017 - Jul 2018, Shanghai, China
- Medical Imaging Dataset: Modified the infant brain atlas provided by UNC and created complete tissue probability maps
- MRI Reconstruction: Developed two-stage reconstruction framework for infant thin-section MR image reconstruction using GANs and CNNs; proposed a new method that fuses multi-planar MR images to improve reconstruction performance, improving PSNR, SSIM, and NMI by 26.2%, 93.4%, and 25.3%, respectively, compared with bicubic interpolation
- Ultrasonic Image Processing: Collaborated on super-resolution reconstruction of ultrasonic imaging using U-Net and GANs; improved the full width at half maximum (FWHM) of point targets by 3.23%
Talks and Presentations
- Invited Talk, SPIE Photonics West, San Francisco, Feb 1, 2023
- Light-AI Interaction: The Convergence of Photonic AI and Cross-layer Circuit-Architecture-Algorithm Co-design
- LSIP Tech Talk, Hewlett Packard Labs, Dec 16, 2022
- Light-AI Interaction: Bridging Photonics and Artificial Intelligence via Cross-Layer Circuit-Architecture-Algorithm Co-Design
- Invited Talk, NVIDIA AI Research, Oct 12, 2022
- NeurOLight: A Physics-Agnostic Neural Operator Enabling Parametric Photonic Device Simulation
- ACCESS and CEDA Joint Seminar, Hong Kong, Jul 29, 2022
- Light-AI Interaction: The Convergence of Photonic Deep Learning and Cross-Layer Design Automation
- Invited Talk, Cornell Univ., Jan 19, 2022
- L2ight: Enabling On-Chip Learning for Optical Neural Networks via Efficient in-situ Subspace Optimization