Beijing Institute of Mathematical Sciences and Applications Beijing Institute of Mathematical Sciences and Applications

  • About
    • President
    • Governance
    • Partner Institutions
    • Visit
  • People
    • Management
    • Faculty
    • Postdocs
    • Visiting Scholars
    • Administration
    • Academic Support
  • Research
    • Research Groups
    • Courses
    • Seminars
  • Join Us
    • Faculty
    • Postdocs
    • Students
  • Events
    • Conferences
    • Workshops
    • Forum
  • Life @ BIMSA
    • Accommodation
    • Transportation
    • Facilities
    • Tour
  • News
    • News
    • Announcement
    • Downloads
About
President
Governance
Partner Institutions
Visit
People
Management
Faculty
Postdocs
Visiting Scholars
Administration
Academic Support
Research
Research Groups
Courses
Seminars
Join Us
Faculty
Postdocs
Students
Events
Conferences
Workshops
Forum
Life @ BIMSA
Accommodation
Transportation
Facilities
Tour
News
News
Announcement
Downloads
Qiuzhen College, Tsinghua University
Yau Mathematical Sciences Center, Tsinghua University (YMSC)
Tsinghua Sanya International  Mathematics Forum (TSIMF)
Shanghai Institute for Mathematics and  Interdisciplinary Sciences (SIMIS)
BIMSA > BIMSA Digital Economy Lab Seminar Textual Factors: A Scalable, Interpretable, and Data-driven Approach to Analyzing Unstructured
Textual Factors: A Scalable, Interpretable, and Data-driven Approach to Analyzing Unstructured
Organizers
Ruize Gao , Liyan Han , Zhen Li , Fei Long , Ke Tang
Speaker
Chen Liang
Time
Friday, June 13, 2025 3:00 PM - 4:00 PM
Venue
A3-2a-302
Online
Zoom 637 734 0280 (BIMSA)
Abstract
This paper introduce a general approach for analyzing large-scale text-based data, combining the strengths of neural network language processing and generative statistical modeling to create a factor structure of unstructured data for downstream regressions typically used in social sciences. This paper generate textual factors by (i) representing texts using vector word embedding, (ii) clustering the vectors using Locality-Sensitive Hashing to generate supports of topics, and (iii) identifying relatively interpretable spanning clusters (i.e., textual factors) through topic modeling. The data-driven approach captures complex linguistic structures while ensuring computational scalability and economic interpretability, plausibly attaining certain advantages over and complementing other unstructured data analytics used by researchers, including emergent large language models. This paper conduct initial validation tests of the framework and discuss three types of its applications: (i) enhancing prediction and inference with texts, (ii) interpreting (non-text-based) models, and (iii) constructing new text-based metrics and explanatory variables. This paper illustrate each of these applications using examples in finance and economics such as macroeconomic forecasting from news articles, interpreting multifactor asset pricing models from corporate filings, and measuring theme-based technology breakthroughs from patents. Finally, This paper provide a flexible statistical package of textual factors for online distribution to facilitate future research and applications.
Speaker Intro
Chen Liang is a PhD student at BIMSA and RUC.His research interests focus on digital economy, liquidity risk, and portfolio investment.
Beijing Institute of Mathematical Sciences and Applications
CONTACT

No. 544, Hefangkou Village Huaibei Town, Huairou District Beijing 101408

北京市怀柔区 河防口村544号
北京雁栖湖应用数学研究院 101408

Tel. 010-60661855 Tel. 010-60661855
Email. administration@bimsa.cn

Copyright © Beijing Institute of Mathematical Sciences and Applications

京ICP备2022029550号-1

京公网安备11011602001060 京公网安备11011602001060