BIMSA Digital Economy Lab Seminar
Textual Factors: A Scalable, Interpretable, and Data-driven Approach to Analyzing Unstructured
Speaker
Chen Liang
Time
June 13, 2025, 15:00 to 16:00
Venue
A3-2a-302
Online
Zoom 637 734 0280 (BIMSA)
Abstract
This paper introduces a general approach for analyzing large-scale text-based data, combining the strengths of neural-network language processing and generative statistical modeling to create a factor structure of unstructured data for the downstream regressions typically used in the social sciences. It generates textual factors by (i) representing texts using vector word embeddings, (ii) clustering the vectors using Locality-Sensitive Hashing to generate supports of topics, and (iii) identifying relatively interpretable spanning clusters (i.e., textual factors) through topic modeling. The data-driven approach captures complex linguistic structures while ensuring computational scalability and economic interpretability, plausibly attaining certain advantages over, and complementing, other unstructured data analytics used by researchers, including emergent large language models. The paper conducts initial validation tests of the framework and discusses three types of applications: (i) enhancing prediction and inference with texts, (ii) interpreting (non-text-based) models, and (iii) constructing new text-based metrics and explanatory variables. It illustrates each application with examples from finance and economics, such as macroeconomic forecasting from news articles, interpreting multifactor asset pricing models from corporate filings, and measuring theme-based technology breakthroughs from patents. Finally, the paper provides a flexible statistical package of textual factors for online distribution to facilitate future research and applications.
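To make steps (ii) and (iii) of the pipeline more concrete, here is a toy sketch under stated assumptions. It uses random-hyperplane (sign random projection) hashing, one standard LSH family for cosine similarity, to bucket word vectors, and then computes each document's share of tokens in each bucket as a crude stand-in for the topic-modeling step. The paper's actual hashing scheme, embedding model, and topic model may differ; all function names, the toy 3-dimensional vectors, and the parameters `n_planes` and `seed` are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter, defaultdict


def random_hyperplanes(dim, n_planes, seed=0):
    """Draw n_planes random Gaussian normal vectors of length dim."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_planes)]


def lsh_signature(vec, planes):
    """One bit per hyperplane: 1 if vec lies on the non-negative side of it."""
    return tuple(
        1 if sum(p_i * v_i for p_i, v_i in zip(plane, vec)) >= 0 else 0
        for plane in planes
    )


def lsh_clusters(embeddings, n_planes=4, seed=0):
    """Group words whose embedding vectors share the same LSH signature."""
    dim = len(next(iter(embeddings.values())))
    planes = random_hyperplanes(dim, n_planes, seed)
    buckets = defaultdict(list)
    for word, vec in embeddings.items():
        buckets[lsh_signature(vec, planes)].append(word)
    return dict(buckets)


def doc_cluster_shares(tokens, clusters):
    """Share of a document's in-vocabulary tokens that fall in each cluster.

    Hypothetical simplified stand-in for the topic-modeling step,
    not the method described in the paper.
    """
    word_to_key = {w: key for key, words in clusters.items() for w in words}
    counts = Counter(word_to_key[t] for t in tokens if t in word_to_key)
    total = sum(counts.values())
    return {key: n / total for key, n in counts.items()} if total else {}


if __name__ == "__main__":
    # Hypothetical 3-dimensional "embeddings" for illustration only.
    embeddings = {
        "inflation": [1.0, 0.2, 0.0],
        "prices": [2.0, 0.4, 0.0],      # same direction as "inflation"
        "patent": [0.0, 0.1, 1.0],
        "invention": [0.0, 0.2, 2.0],   # same direction as "patent"
    }
    clusters = lsh_clusters(embeddings, n_planes=4, seed=0)
    for key, words in clusters.items():
        print(key, words)
    doc = ["inflation", "prices", "rose", "patent"]
    print(doc_cluster_shares(doc, clusters))
```

Because the hash depends only on which side of each hyperplane a vector falls, words whose vectors point in the same direction always share a signature, so "prices" lands with "inflation". Adding hyperplanes yields finer, smaller buckets. The abstract describes a topic-modeling step, rather than these raw token shares, for turning clusters into interpretable textual factors.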
About the Speaker
Chen Liang is a PhD student at BIMSA and RUC. His research interests focus on the digital economy, liquidity risk, and portfolio investment.