Shitian Zhao (赵世天)

I am currently a research intern at Shanghai AI Lab (Alpha-VLLM team) during my gap year, mentored by Peng Gao. I received my bachelor's degree from East China Normal University in the summer of 2024, where I was supervised by Professor Yan Wang. I was also an intern at CCVL@Johns Hopkins University, supervised by Bloomberg Distinguished Professor Alan Yuille and Zhuowan Li.

Email: zhaosh5t5an@gmail.com

WeChat: zstt0135

CV  /  G-Scholar  /  S-Scholar  /  Twitter  /  LinkedIn  /  Github  /  HuggingFace  /  Zhihu(知乎)

"A central problem in machine learning involves modeling complex data-sets using highly flexible families of probability distributions in which learning, sampling, inference, and evaluation are still analytically or computationally tractable."

--- "Deep Unsupervised Learning using Nonequilibrium Thermodynamics"

Application: I am currently applying to PhD programs in Artificial Intelligence for Fall 2025. If you are also applying or are interested in talking about research, please do not hesitate to contact me! I am more than happy to connect and share insights.

Research

My long-term research interests include:

  • Multi-modality Generative Models: Modeling multi-modal content generatively under various paradigms, e.g., autoregressive models and diffusion models.
    Lumina-mGPT, SPHINX-X
  • Inference-Time Scaling Law: Spending more computation at inference time yields a more capable AI system.
    Causal-CoG, Likelihood Composition
  • Compositionality: Decomposing images or unstructured text to aid reasoning and planning.

News

  • [Sept. 2024] Likelihood Composition was accepted to EMNLP 2024 Findings.
  • [May 2024] Gave a talk on Causal-CoG at the CCVL@JHU group meeting.
  • [April 2024] Causal-CoG was accepted to CVPR 2024 as a Poster (Highlight, top 2.8%).
  • [Dec. 2023] Joined Shanghai AI Lab as a research intern.

Last updated: 2024/9/26.


Publications

Causal-CoG: A Causal-Effect Look at Context Generation for Boosting Multi-modal Language Models
Shitian Zhao, Zhuowan Li, Yadong Lu, Alan Yuille, Yan Wang
CVPR (Poster Highlight, top 2.8%), 2024
arXiv / code

Unleashing the Potentials of Likelihood Composition for Multi-modal Language Models
Shitian Zhao, Renrui Zhang, Xu Luo, Yan Wang, Shanghang Zhang, Peng Gao
EMNLP Findings, 2024
arXiv / code

Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining
Dongyang Liu*, Shitian Zhao*, Le Zhuo*, Weifeng Lin*, Hongsheng Li, Yu Qiao, Peng Gao*
Preprint
arXiv / code

PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions
Weifeng Lin*, Xinyu Wei*, Renrui Zhang*, Le Zhuo, Shitian Zhao, Siyuan Huang, Junlin Xie, Yu Qiao, Peng Gao, Hongsheng Li
ICLR, 2025
arXiv / code

SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models
Dongyang Liu*, Longtian Qiu*, Siyuan Huang*, Weifeng Lin*, Shitian Zhao, Shijie Geng, Ziyi Lin, Peng Jin, Kaipeng Zhang, Wenqi Shao, Chao Xu, Conghui He, Junjun He, Hao Shao, Pan Lu, Hongsheng Li, Yu Qiao, Peng Gao*
ICML, 2024
arXiv / code

IMAGINE-E: Image Generation Intelligence Evaluation of State-of-the-art Text-to-Image Models
Jiayi Lei*, Renrui Zhang*, Xiangfei Hu, Weifeng Lin, Zhen Li, Wenjian Sun, Ruoyi Du, Le Zhuo, Zhongyu Li, Xinyue Li, Shitian Zhao, Ziyu Guo, Yiting Lu, Peng Gao, Hongsheng Li
Preprint, 2024
arXiv

Boosting Open-Domain Continual Learning via Leveraging Intra-domain Category-aware Prototype
Yadong Lu, Shitian Zhao, Boxiang Yun, Dongsheng Jiang, Yin Li, Qingli Li, Yan Wang
Preprint, 2024
arXiv


Professional Service

  • Reviewer for NeurIPS 2024 and CVPR 2025.

Talks & Presentations


Honors & Awards

  • Outstanding Graduate Thesis @ ECNU
  • Excellent Student Scholarship (first year of my undergraduate studies)
  • Third Prize (Provincial Level), China Undergraduate Mathematical Contest in Modeling

Misc

  • I used to be an actor at the Yang Zhi Shui Chinese Drama Club, ECNU. Our drama, "Online Tragedy" (《线上悲剧》), won the Best Online Creative Award at the 18th Shanghai College Students Drama Festival. I played the male lead, Andrei (安德烈), a young Russian revolutionary.
  • During the summer of 2021, I joined the ecology research team from ECNU on multiple field investigations to Zhoushan Island in Zhejiang Province, primarily studying the distribution of local anteater populations. This research was later published as a paper.
© Shitian Zhao

Website theme stolen from Jon Barron.