Y. Wang’s Blog

Yen-Shi Wang

––– continual learning as AI does

Experience in deep learning inference, compilers, and performance optimization. I enjoy learning new LLM operators, doing performance tuning for both CPU and GPU, and learning how compilers work in general.
Here is my Curriculum Vitae.

Experience

Senior Deep Learning SWE at Nvidia

Sep. 2024 - Present

As a core developer of TensorRT for RTX since project initiation, I oversee development, testing and validation, infrastructure setup, and release. Designed the runtime fusion optimization and trimmed the library size from 1 GB to under 200 MB. Made a just-in-time compilation workflow possible and brought optimized deep learning inference closer to users of NVIDIA RTX GPUs.

Senior Deep Learning SWE at Nvidia

Jul. 2022 - Sep. 2024

Implemented a multi-device TensorRT runtime in C++ using NCCL to support inference across multiple nodes. Wrote the TensorRT v9.1.0 NeMo demo from scratch in Python, which supports FP8 and FP16 inference on a GPT-3 style LLM network. Added FP8 inference support for NeMo, profiled the demo runtime, and improved generation decoding latency by 30%.

Deep Learning SWE at Nvidia

Mar. 2021 - Jun. 2022

Served as point of contact for the TensorRT Consistency Checker and implemented deconvolution node validation from scratch. Improved all safety layer tests, performed code inspection, and went through verification to conform to the ISO 26262 safety standard. Debugged C++ code with GDB and fixed software bugs caused by use-after-free, random failures, and type errors.

Performance SWE Intern at Nvidia

May 2020 - Aug. 2020

Optimized a multithreaded C++ server for MLPerf Inference v0.7 BERT, used Nsight Systems to analyze CPU/GPU performance bugs, and utilized CUDA streams and graphs to boost throughput. (Details cannot be disclosed before the results are public.) Actively involved in code review and group discussion in a remote working environment.

C++ Developer at Skymizer

Apr. 2019 - Jul. 2019

I participated in the ONNC project and implemented 21 optimization passes (code) for trained classification models. After completing the essential optimizations and becoming familiar with NVDLA, I initiated a quantization framework for ONNC to translate 32-bit FP models into 8-bit INT and fill in all parameters of the converters described here.

Software Engineer at BravoAI

Mar. 2018 - Sep. 2018

My responsibility was to develop an optical character recognition (OCR) system for an insurance company to automatically transform medical certificates from paper into digital format. With neural networks written in PyTorch, the entire system was deployed as four Docker containers running a Flask web service, operating at a speed of 0.5 images/sec.

Quantitative Research Intern at WorldQuant

Jul. 2017 - Mar. 2018

As a quant, I created models from financial data that predict stock market movements and submitted them to the Websim simulation platform. I ended up with the highest Websim score among six interns and continued to work as a consultant. I submitted over 200 models and ranked 3rd among all Taiwanese consultants at the end.

Projects

Twitter Data Analysis, CMU Cloud Computing

Fall 2019

In a team of three, we needed to apply an ETL process to 1 TB of raw data, store the results in a database, and deploy a service that responds to specific queries. I configured both MySQL and HBase for the different tasks and deployed 8 web instances and 1 load balancer on AWS EC2. We then migrated the service to AWS ECS/RDS/Lightsail.

Mail Service, NTU Network Administration

Spring 2017

It's a simple yet complete web service, covering the frontend, storage, and user account management. I set up an LDAP database on both the Postfix (SMTP) and Dovecot (IMAP) servers to provide user authentication information. We also used Ansible to deploy the entire service to remote instances.

Publications

[1] Solving Exist-Random Quantified Stochastic Boolean Satisfiability via Clause Selection. IJCAI, 2018.
*N.-Z. Lee, *Yen-Shi Wang, and J.-H. R. Jiang. * Equal contribution

[2] Solving Stochastic Boolean Satisfiability under Random-Exist Quantification. IJCAI, 2017.
N.-Z. Lee, Yen-Shi Wang, and J.-H. R. Jiang.