BIMSA Digital Economy Lab Seminar
Textual Factors: A Scalable, Interpretable, and Data-driven Approach to Analyzing Unstructured
Speaker
Chen Liang
Time
Friday, June 13, 2025 3:00 PM - 4:00 PM
Venue
A3-2a-302
Online
Zoom 637 734 0280 (BIMSA)
Abstract
This paper introduces a general approach for analyzing large-scale text-based data. It combines the strengths of neural-network language processing and generative statistical modeling to create a factor structure of unstructured data for the downstream regressions typically used in the social sciences. Textual factors are generated by (i) representing texts using vector word embeddings, (ii) clustering the vectors using Locality-Sensitive Hashing to generate supports of topics, and (iii) identifying relatively interpretable spanning clusters (i.e., textual factors) through topic modeling. The data-driven approach captures complex linguistic structures while ensuring computational scalability and economic interpretability. It plausibly attains certain advantages over, and complements, other unstructured data analytics used by researchers, including emerging large language models. The paper conducts initial validation tests of the framework and discusses three types of applications: (i) enhancing prediction and inference with texts, (ii) interpreting (non-text-based) models, and (iii) constructing new text-based metrics and explanatory variables. Each application is illustrated with examples from finance and economics. These include macroeconomic forecasting from news articles, interpreting multifactor asset pricing models from corporate filings, and measuring theme-based technology breakthroughs from patents. Finally, the paper provides a flexible statistical package of textual factors, intended for online distribution, to facilitate future research and applications.
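Steps (i) and (ii) of the pipeline can be illustrated with a small sketch. Word vectors are hashed so that nearby vectors tend to land in the same bucket, and each bucket then serves as a candidate topic support. This uses random-hyperplane (sign random projection) LSH, one common LSH family for cosine similarity. The abstract does not state which LSH variant, embedding model, or parameters the authors use, so these are assumptions. The helper names (`random_hyperplanes`, `lsh_signature`, `lsh_buckets`) and the toy vectors are hypothetical and do not come from the authors' package. Step (iii), topic modeling within buckets, is not shown.

```python
import random
from collections import defaultdict

# Hypothetical helper names; this is NOT the API of the authors' package.

def random_hyperplanes(dim, n_planes, seed=0):
    """Draw n_planes random Gaussian normal vectors in R^dim (one per hyperplane)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_planes)]

def lsh_signature(vec, planes):
    """Sign-random-projection hash: one bit per hyperplane, 1 if vec lies on its non-negative side."""
    return tuple(
        int(sum(p_i * v_i for p_i, v_i in zip(plane, vec)) >= 0.0) for plane in planes
    )

def lsh_buckets(embeddings, n_planes=4, seed=0):
    """Group words whose embedding vectors share an LSH signature (candidate topic supports)."""
    dim = len(next(iter(embeddings.values())))
    planes = random_hyperplanes(dim, n_planes, seed)
    buckets = defaultdict(list)
    for word, vec in embeddings.items():
        buckets[lsh_signature(vec, planes)].append(word)
    return dict(buckets)

if __name__ == "__main__":
    # Hypothetical 3-dimensional toy embeddings; real word embeddings
    # typically have hundreds of dimensions.
    toy_embeddings = {
        "inflation": [0.9, 0.1, 0.0],
        "interest": [0.8, 0.2, 0.1],
        "patent": [0.0, 0.9, 0.3],
        "invention": [0.1, 0.8, 0.4],
        "dividend": [0.7, 0.0, 0.2],
    }
    for signature, words in lsh_buckets(toy_embeddings, n_planes=3, seed=42).items():
        print(signature, words)
```

Vectors with high cosine similarity are likely, though not guaranteed, to share a signature. Using more hyperplanes gives smaller, purer buckets at the cost of splitting related words. In practice, several hash tables are often combined to control this trade-off.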
Speaker Intro
Chen Liang is a PhD student at BIMSA and RUC. His research interests include the digital economy, liquidity risk, and portfolio investment.