Awards and Honors
- Winner at Robert S. Hilbert Memorial Optical Design Competition, Jul 2022
- Best Paper Award, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), Oct 2021
- DAC Young Fellow, 58th IEEE/ACM Design Automation Conference (DAC), Oct 2021
- Cockrell School Graduate Student Fellowship, UT Austin, Jun 2021
- First Place, ACM Student Research Competition Grand Finals, May 2021
- Best Poster Award, NSF Workshop on Machine Learning Hardware, Dec 2020
- First Place, ACM/SIGDA Student Research Competition, Nov 2020
- 7th Place, IWLS Programming Contest: Machine Learning + Logic Synthesis, Aug 2020
- DAC Young Fellow, 57th IEEE/ACM Design Automation Conference (DAC), Jul 2020
- Best Paper Award Finalist (1 out of 6), 57th IEEE/ACM Design Automation Conference (DAC), Jul 2020
- Best Paper Award, 25th ACM/IEEE Asian and South Pacific Design Automation Conference (ASP-DAC), Jan 2020
- 4th Place, 2019 DAC System Design Contest on Low Power Object Detection, May 2019
- First Prize Scholarship, Fudan University, 2017 - 2018
- Top 5, 2018 FUTURELAB AI Contest (CV Group), 2018
- Top 11%, 2017 IEEEXtreme Global Programming Competition (out of 3,350 teams worldwide), 2017
Work Experience
- NVIDIA, USA
- Research Summer Intern, VLSI team, May 2022 - Aug 2022, Austin, TX
- Efficient neural network compression and hardware acceleration.
- Meta Platforms, USA
- Research Summer Intern, Meta Reality Labs, FAST AI team, May 2021 - Dec 2021, Austin, TX
- Hardware-aware model design for efficient on-device vision inference.
Research Experience
- University of Texas at Austin, USA
- Research Assistant, ECE department, University of Texas at Austin, Jan 2019 - Present, Austin, TX
- Efficient Neural Arch. Design, NN Structured Pruning: Designed hardware-efficient frequency-domain photonic neural network architecture; achieved 3-4× area reduction over previous ONN architectures by using block-circulant matrices and structured pruning; further reduced area and power by 2× and 10× through joint optimization and fine-grained structured pruning
- NN Quantization and Robustness: Developed differentiable quantization-aware training scheme in the unitary manifold to enable robust optical neural networks with low-precision voltage controls; achieved better accuracy and robustness with limited control resolution and device-level variations
- Photonics CNN and RNN Design: Collaborated on designing high-throughput and low-power photonic CNN architectures; helped design and fabricate high-speed silicon-photonic RNN
- NN On-Chip/On-Device Learning: Developed customized CUDA extension for ONN simulation acceleration; proposed efficient on-chip learning algorithm for optical neural networks with stochastic zeroth-order optimization; achieved 3-4× higher learning efficiency, 10× better scalability, and better robustness than previous methods
- NN Efficient Training Framework: Collaborated on developing efficient training framework for reversible neural architectures via constrained optimization; our dynamic-programming-based scheduling achieved a 5-20% speedup with comparable memory efficiency when training reversible NNs
- Photonics Neural Chip Tape-out: Worked on photonic neural chip tape-out for novel ONN architectures using Advanced Micro Foundry (AMF); collaborated on the full-stack schematic design, layout, validation, tape-out, and measurement of photonic neural chips using PyTorch, Lumerical toolkits, and Synopsys OptoDesigner
- GPU-Accelerated VLSI Detailed Placement: Collaborated on developing GPU-accelerated concurrent VLSI detailed placement with CUDA; implemented and optimized global swap and parallel auction algorithm for batch-based independent-set matching; achieved >10× speedup without quality degradation
- GPU-Accelerated VLSI Global Placement: Collaborated on high-performance VLSI analytical global placement acceleration with CUDA on GPUs; optimized wirelength and density computation operators with CUDA; developed parallel congestion map estimation for routability optimization; achieved 40× speedup in global placement
- VLSI Global Placement Algorithm: Developed multi-electrostatics-based robust VLSI placement framework DREAMPlace 3.0 with PyTorch/C++/CUDA; proposed multi-electrostatic system for optimization under fence region constraints; developed divergence-aware optimizer for robust nonlinear global placement; achieved >13% HPWL improvement and >11% top-5 overflow reduction compared with ISPD2015 contest winners
- Efficient NN Learning and Power Optimization: Proposed efficient ONN on-chip learning framework with two-level sparse optimizer and efficient power-aware optimization; achieved high convergence stability, ~10× higher training efficiency, and ~10× power reduction compared with prior methods
- University of Texas at Austin, USA
- Research Assistant, ECE department, University of Texas at Austin, Sep 2018 - Jan 2019, Austin, TX
- FPGA Emulation of RISC-V Core: Emulated the RISC-V Rocket Core on a Zynq FPGA and established communication between the core and the host
- Fault Injection: Customized FIRRTL transformations and built infrastructure for fault injection and system state snapshots
- Fudan University, Shanghai, China
- Research Assistant, EE department, Fudan University, Aug 2017 - Jul 2018, Shanghai, China
- Medical Imaging Dataset: Modified the infant brain atlas provided by UNC and created complete tissue probability maps
- MRI Reconstruction: Developed two-stage reconstruction framework for infant thin-section MR image reconstruction using GANs and CNNs; proposed a new method that fuses multi-planar MR images to improve reconstruction performance, improving PSNR, SSIM, and NMI by 26.2%, 93.4%, and 25.3%, respectively, compared with bicubic interpolation
- Ultrasonic Image Processing: Collaborated on super-resolution reconstruction of ultrasonic imaging using U-Net and GANs; improved the full width at half maximum (FWHM) of point targets by 3.23%
Talks and Presentations
- Invited Talk, SPIE Photonics West, San Francisco, Feb 1, 2023
- Light-AI Interaction: The Convergence of Photonic AI and Cross-layer Circuit-Architecture-Algorithm Co-design
- LSIP Tech Talk, Hewlett Packard Labs, Dec 16, 2022
- Light-AI Interaction: Bridging Photonics and Artificial Intelligence via Cross-Layer Circuit-Architecture-Algorithm Co-Design
- Invited Talk, NVIDIA AI Research, Oct 12, 2022
- NeurOLight: A Physics-Agnostic Neural Operator Enabling Parametric Photonic Device Simulation
- ACCESS and CEDA Joint Seminar, Hong Kong, Jul 29, 2022
- Light-AI Interaction: The Convergence of Photonic Deep Learning and Cross-Layer Design Automation
- Invited Talk, Cornell Univ., Jan 19, 2022
- L2ight: Enabling On-Chip Learning for Optical Neural Networks via Efficient in-situ Subspace Optimization