Understanding and Augmenting the Reasoning Capacities of Large Language Models


Principal Investigator

Stanford University

Oracle Fellowship Recipients

Ben Prystawski, Michael Li

Oracle Principal Investigators

Jason Peck, Research Director
Swetasudha Panda, Senior Member of Technical Staff


This research seeks to develop a fundamental understanding of reasoning in state-of-the-art language models and to apply that understanding to improve how we use language models in open-ended systems. In particular, we will explore how different forms of statistical structure influence data efficiency (how much training data is required to achieve a given reasoning capability) and what role model architecture plays in the emergence of reasoning capabilities. Our proposed work could lead to two outcomes: more data-efficient models that match the reasoning capabilities of current models with significantly less training data, and practical methods for predicting when reasoning will be effective.

Reasoning capabilities emerge in large-scale models trained on large-scale datasets, but training these models is extremely resource intensive. Motivated by our prior research showing that local statistical structure can improve data efficiency and reasoning, we will develop and test new data augmentation techniques that segment text into local neighborhoods based on different measures of locality. In addition to yielding insights into how to train models with greater data efficiency, this research has a second practical application: if we can systematically identify which locality measures correlate best with downstream reasoning performance, we can potentially predict the problem domains where reasoning can be applied most effectively. Beyond local statistical structure, we will explore how other forms of statistical structure, such as hierarchical structure, affect data efficiency and reasoning.

We will also study the role of model architecture in reasoning and data efficiency, with a particular focus on state-space models (SSMs), a promising alternative to transformers that has exceeded their performance on many natural language tasks. The reasoning capabilities and data efficiency of SSMs have not yet been comprehensively characterized. By comparing SSMs and transformers across a range of reasoning benchmarks, we aim to shed light on the architectural factors that contribute to effective reasoning.