Application: I am currently applying to PhD programs in Artificial Intelligence for Fall 2025. If you are also applying or are interested in discussing research, please do not hesitate to contact me! I am more than happy to connect and share insights.
|
Research
My long-term research interests include:
- Multi-modality Generative Models: Modeling multi-modal content generatively under various paradigms, e.g., autoregressive models and diffusion models.
Lumina-mGPT, SPHINX-X
- Inference-Time Scaling Law: More computation at inference time results in a higher level of intelligence in the AI system.
Causal-CoG, Likelihood Composition
- Compositionality: Decomposing images or unstructured text to help reasoning and planning.
|
News
- [Sept. 2024] Likelihood Composition was accepted to Findings of EMNLP 2024.
- [May 2024] Gave a talk on Causal-CoG at CCVL@JHU's group meeting.
- [April 2024] Causal-CoG was accepted to CVPR 2024 as a Poster (Highlight, top 2.8%).
- [Dec. 2023] Joined Shanghai AI Lab as a research intern.
Last updated: 2024/9/26.
|
|
PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions
Weifeng Lin*,
Xinyu Wei*,
Renrui Zhang*,
Le Zhuo,
Shitian Zhao,
Siyuan Huang,
Junlin Xie,
Yu Qiao,
Peng Gao,
Hongsheng Li
Preprint, 2024
arXiv /
code
|
|
Causal-CoG: A Causal-Effect Look at Context Generation for Boosting Multi-modal Language Models
Shitian Zhao,
Zhuowan Li,
Yadong Lu,
Alan Yuille,
Yan Wang
CVPR (Poster Highlight, top 2.8%), 2024
arXiv /
code
|
|
Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining
Dongyang Liu*,
Shitian Zhao*,
Le Zhuo*,
Weifeng Lin*,
Hongsheng Li,
Yu Qiao,
Peng Gao*
Preprint, 2024
arXiv /
code
|
|
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models
Peng Gao*,
Renrui Zhang*,
Chris Liu*,
Longtian Qiu*,
Siyuan Huang*,
Weifeng Lin*,
Shitian Zhao,
Shijie Geng,
Ziyi Lin,
Peng Jin,
Kaipeng Zhang,
Wenqi Shao,
Chao Xu,
Conghui He,
Junjun He,
Hao Shao,
Pan Lu,
Hongsheng Li,
Yu Qiao
ICML, 2024
arXiv /
code
|
|
Unleashing the Potentials of Likelihood Composition for Multi-modal Language Models
Shitian Zhao,
Renrui Zhang,
Xu Luo,
Yan Wang,
Shanghang Zhang,
Peng Gao
EMNLP Findings, 2024
arXiv /
code
|
|
SPHINX and ANUBIS: Alleviating Deterministic and Generative Hallucinations of Multi-modal Large Language Models
Shitian Zhao,
Han Xiao,
Le Zhuo,
Xu Luo,
Hongsheng Li,
Yu Qiao,
Xiangyu Yue,
Peng Gao
Submitted, 2024
|
|
Boosting Open-Domain Continual Learning via Leveraging Intra-domain Category-aware Prototype
Yadong Lu,
Shitian Zhao,
Boxiang Yun,
Dongsheng Jiang,
Yin Li,
Qingli Li,
Yan Wang
Preprint, 2024
arXiv
|
Services
Reviewer for NeurIPS 2024.
|
Honors & Awards
Outstanding Graduate Thesis at the University Level
|
|