Shitian Zhao (赵世天)

I am currently a research intern on the Alpha-VLLM team at Shanghai AI Lab during my gap year, mentored by Peng Gao. I received my bachelor's degree from East China Normal University in the summer of 2024, supervised by Professor Yan Wang. Previously, I was an intern at CCVL@Johns Hopkins University, supervised by Bloomberg Distinguished Professor Alan Yuille and Zhuowan Li.

Email: zhaosh5t5an@gmail.com

WeChat: zstt0135

Email  /  CV  /  Scholar  /  Twitter  /  LinkedIn  /  Github  /  HuggingFace  /  Zhihu(知乎)


Application: I am currently applying to PhD programs in Artificial Intelligence for Fall 2025. If you are also applying or would like to discuss research, please do not hesitate to contact me. I am more than happy to connect and share insights.

Research

My Long-Term Research Interests include:

  • Multi-modality Generative Models: Modeling multi-modal content generatively under various paradigms, e.g., autoregressive models and diffusion models.
    Lumina-mGPT, SPHINX-X
  • Inference-Time Scaling Law: Allocating more computation at inference time to raise the capability of an AI system.
    Causal-CoG, Likelihood Composition
  • Compositionality: Decomposing images or unstructured text to aid reasoning and planning.

News

  • [Sept. 2024] Likelihood Composition is accepted to EMNLP 2024 (Findings).
  • [May 2024] Gave a talk on Causal-CoG at CCVL@JHU's group meeting.
  • [April 2024] Causal-CoG is accepted to CVPR 2024 as a poster (Highlight, top 2.8%).
  • [Dec. 2023] Joined Shanghai AI Lab as a research intern.

Last updated: 2024/9/26.


Publications

PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions
Weifeng Lin*, Xinyu Wei*, Renrui Zhang*, Le Zhuo, Shitian Zhao, Siyuan Huang, Junlin Xie, Yu Qiao, Peng Gao, Hongsheng Li
Preprint, 2024
arXiv / code

Causal-CoG: A Causal-Effect Look at Context Generation for Boosting Multi-modal Language Models
Shitian Zhao, Zhuowan Li, Yadong Lu, Alan Yuille, Yan Wang
CVPR (Poster Highlight, top 2.8%), 2024
arXiv / code

Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining
Dongyang Liu*, Shitian Zhao*, Le Zhuo*, Weifeng Lin*, Hongsheng Li, Yu Qiao, Peng Gao*
Preprint, 2024
arXiv / code

SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models
Peng Gao*, Renrui Zhang*, Chris Liu*, Longtian Qiu*, Siyuan Huang*, Weifeng Lin*, Shitian Zhao, Shijie Geng, Ziyi Lin, Peng Jin, Kaipeng Zhang, Wenqi Shao, Chao Xu, Conghui He, Junjun He, Hao Shao, Pan Lu, Hongsheng Li, Yu Qiao
ICML, 2024
arXiv / code

Unleashing the Potentials of Likelihood Composition for Multi-modal Language Models
Shitian Zhao, Renrui Zhang, Xu Luo, Yan Wang, Shanghang Zhang, Peng Gao
EMNLP Findings, 2024
arXiv / code

SPHINX and ANUBIS: Alleviating Deterministic and Generative Hallucinations of Multi-modal Large Language Models
Shitian Zhao, Han Xiao, Le Zhuo, Xu Luo, Hongsheng Li, Yu Qiao, Xiangyu Yue, Peng Gao
Submitted, 2024

Boosting Open-Domain Continual Learning via Leveraging Intra-domain Category-aware Prototype
Yadong Lu, Shitian Zhao, Boxiang Yun, Dongsheng Jiang, Yin Li, Qingli Li, Yan Wang
Preprint, 2024
arXiv


Services

Reviewer for NeurIPS 2024.


Honors & Awards

Outstanding Graduate Thesis at the University Level

© Shitian Zhao

Website theme stolen from Jon Barron.