Beijing Institute of Mathematical Sciences and Applications

Advances in Artificial Intelligence
The Transformer Model and its Cutting-edge Research
Organizer
Ming Ming Sun
Speaker
Hai Hua Xie
Time
Thursday, September 26, 2024 3:00 PM - 5:00 PM
Venue
A3-1-301
Online
Zoom 361 038 6975 (BIMSA)
Abstract
This talk offers a comprehensive overview of the Transformer architecture, which powers many contemporary large language models. It examines the fundamental principles and computational processes behind the Transformer's prominence in AI, and discusses key weaknesses and potential improvements from a mathematical standpoint. It also highlights innovations such as Aaren, WideFFN, Infini-attention, ALiBi, RoFormer, and Reformer, showing how these advances aim to address current limitations and enhance model performance.
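As background for the techniques cited below, here is a minimal NumPy sketch of the scaled dot-product attention at the core of the Transformer [1]. The function name and toy shapes are illustrative, not code from the talk:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V, as in [1]."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # (n_q, n_k) similarity logits
    scores -= scores.max(axis=-1, keepdims=True)    # stabilize the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                              # weighted sum of value vectors

# Toy usage: 4 query positions, 6 key/value positions, head dimension 8.
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=s) for s in [(4, 8), (6, 8), (6, 8)])
out = scaled_dot_product_attention(Q, K, V)         # shape (4, 8)
```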

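And a hedged sketch of ALiBi's linear attention bias [6], under simplified assumptions: a single head with one hand-picked slope, whereas Press et al. use a geometric sequence of slopes across heads. The name alibi_scores is illustrative:

```python
import numpy as np

def alibi_scores(scores, slope=0.25):
    """Add ALiBi's distance penalty to causal attention logits [6].

    scores: (n, n) raw Q K^T / sqrt(d_k) logits for one head.
    slope:  head-specific slope m; the bias is -m * (i - j) for key j <= query i.
    """
    n = scores.shape[0]
    i, j = np.indices((n, n))
    bias = -slope * (i - j).astype(float)  # 0 on the diagonal, more negative with distance
    bias[j > i] = -np.inf                  # causal mask: no attention to future keys
    return scores + bias                   # row-wise softmax follows as usual
```

Because the penalty depends only on query-key distance, ALiBi needs no positional embeddings and can extrapolate to sequences longer than those seen during training, which is the paper's headline result.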
References:
1) Vaswani, Ashish, et al. "Attention is all you need." Advances in Neural Information Processing Systems 30 (2017): 5998-6008.
2) Radford, Alec, et al. "Language models are unsupervised multitask learners." OpenAI blog 1.8 (2019): 9.
3) Brown, Tom B., et al. "Language models are few-shot learners." arXiv preprint arXiv:2005.14165 (2020).
4) Yin, Zi, and Yuanyuan Shen. "On the dimensionality of word embedding." Advances in neural information processing systems 31 (2018).
5) Su, Jianlin, et al. "RoFormer: Enhanced transformer with rotary position embedding." Neurocomputing 568 (2024): 127063.
6) Press, Ofir, Noah A. Smith, and Mike Lewis. "Train short, test long: Attention with linear biases enables input length extrapolation." arXiv preprint arXiv:2108.12409 (2021).
7) Liu, Nelson F., et al. "Lost in the middle: How language models use long contexts." Transactions of the Association for Computational Linguistics 12 (2024): 157-173.
8) Munkhdalai, Tsendsuren, Manaal Faruqui, and Siddharth Gopal. "Leave no context behind: Efficient infinite context transformers with infini-attention." arXiv preprint arXiv:2404.07143 (2024).
9) Kitaev, Nikita, Łukasz Kaiser, and Anselm Levskaya. "Reformer: The efficient transformer." arXiv preprint arXiv:2001.04451 (2020).
10) Pires, Telmo Pessoa, et al. "One wide feedforward is all you need." arXiv preprint arXiv:2309.01826 (2023).
11) Feng, Leo, et al. "Attention as an RNN." arXiv preprint arXiv:2405.13956 (2024).
Speaker Intro
Dr. Haihua Xie received his Ph.D. in Computer Science from Iowa State University in 2015. Before joining BIMSA in October 2021, he worked at the State Key Laboratory of Digital Publishing Technology, Peking University, from 2015 to 2021. His research interests include natural language processing and knowledge services. He has published more than 20 papers and holds 7 invention patents. In 2018, Dr. Xie was selected for the 13th batch of Beijing's overseas high-level talent program and was honored as a "Beijing Distinguished Expert".