From Diffusion Model to Autoregressive: Thoughts on the Future of the World Model

Organizers

Mingming Sun, Yaqing Wang

Speaker

Xinyu Xiao

Time

Friday, December 13, 2024 3:00 PM - 4:00 PM

Venue

Online

Zoom 787 662 9899 (BIMSA)

Abstract

From images to videos, diffusion models are demonstrating their value in video generation. Their strong stochasticity and realism allow them to capture subtle dynamic changes, making the generated videos more authentic. Meanwhile, autoregressive models have quickly become a research hotspot in video generation because of their advantages in sequence generation, and they show great potential for generating smoother and more coherent videos. Furthermore, with growing computational power and optimized model architectures, autoregressive models continue to improve in generation efficiency and quality. The speaker will review current advances in image and video generation, combining his own recent research in video generation with classical works in the area. Additionally, given the development of visual generation and understanding, the prospects for world models are intriguing, and the speaker will discuss research prospects and directions for world models based on current progress. This presentation will be conducted in Chinese.

Speaker Intro

Xinyu Xiao received his Bachelor's degree from Beihang University (Beijing University of Aeronautics and Astronautics) and his Doctorate from the Institute of Automation, Chinese Academy of Sciences. He currently works on artificial intelligence research and development in industry, with a primary focus on visual understanding and generation. His work covers visual description, retrieval, weather forecasting, visual generation, recognition, detection, visual question answering, reinforcement learning, contrastive learning, interpretable learning, and spatiotemporal data mining. To date, he has published more than 20 papers.