
Longxiu Tian


Welcome! I'm an Assistant Professor of Marketing at the UNC Kenan-Flagler Business School, where I teach courses on Customer Relationship Management (CRM) and Customer Journeys at the undergraduate and MBA levels. I received my Ph.D. in Marketing and Scientific Computing from the University of Michigan.

My research focuses on using statistical models, particularly Bayesian econometrics, machine learning, and causal inference, to understand how consumers respond to marketing activities and how firms should execute lifecycle marketing strategies at scale.

Interested in probabilistic machine learning, Bayesian computation, and marketing? I'm also the co-organizer of an interschool virtual reading group on these topics for junior faculty and Ph.D. students. Learn more here.




Publications and Working Papers

Optimizing Price Menus for Duration Discounts: A Subscription Selectivity Field Experiment

Longxiu Tian and Fred M. Feinberg

Marketing Science (2020)


Subscription services typically offer duration discounts, rewarding longer commitments with lower per-period costs. The “menu” of contract plan prices must be balanced to encourage potential customers to select into subscription overall and to nudge those that do to more profitable contracts. Because subscription menu prices change infrequently, they are difficult to optimize using historical pricing data alone.

We propose that firms solve this problem via an experiment and a model that jointly accounts for whether to opt in and, conditionally, which plan to choose. To that end, we conduct a randomized online pricing experiment that orthogonalizes the “elevation” and “steepness” of price menus for a major dating pay site. Users’ opt-in and plan choice decisions are analyzed using a novel model for correlated binary selection and multinomial conditional choice, estimated via Hamiltonian Monte Carlo. Benchmark comparisons suggest that common models of consumer choice may systematically misestimate price substitution patterns, and that a key consideration is the distinctiveness of the opt-out (e.g., non-subscription) option relative to others available.

Our model confirms several anticipated pricing effects (e.g., on the margin, raising prices discourages both opt-in overall and choice of any higher-priced plans), but also some that alternative models fail to capture, most notably that across-the-board pricing increases have a far lower negative impact than standard random-utility models would imply. Joint optimization of the price menu’s component prices suggests the firm has set them too low overall, particularly for its longest-duration plan, and adjusting the price menu accordingly should lead to roughly 10% greater revenue.


Privacy Preserving Data Fusion

Longxiu Tian, Dana Turjeman and Samuel Levy


Data fusion combines multiple datasets to make inferences that are more accurate, generalizable, and useful than those made with any single dataset alone. However, data fusion poses a privacy hazard due to the risk of revealing user identities. We propose a privacy preserving data fusion (PPDF) methodology intended to preserve user-level anonymity while allowing for a robust and expressive data fusion process. PPDF is based on variational autoencoders and normalizing flows, together enabling a highly expressive, nonparametric, Bayesian, generative modeling framework, estimated in adherence to differential privacy – the state-of-the-art theory for privacy preservation. PPDF does not require the same users to appear across datasets when learning the joint data generating process and explicitly accounts for missingness in each dataset to correct for sample selection. Moreover, PPDF is model-agnostic: it allows for downstream inferences to be made on the fused data without the analyst needing to specify a discriminative model or likelihood a priori.


We undertake a series of simulations to showcase the quality of our proposed methodology. Then, we fuse a large-scale customer satisfaction survey to the customer relationship management (CRM) database from a leading U.S. telecom carrier. The resulting fusion yields the joint distribution between survey satisfaction outcomes and CRM engagement metrics at the customer level, including the likelihood of leaving the company’s services. Highlighting the importance of correcting selection bias, we illustrate the divergence between the observed survey responses and the imputed distribution over the customer base. Managerially, we find a negative, nonlinear relationship between satisfaction and future account termination across the telecom carrier’s customers, which can aid in segmentation, targeting, and proactive churn management.


Learning Preference Heterogeneity from Aggregate-Response Online Experiments

Mengyao Huang and Longxiu Tian


Online field experiments, or A/B tests, that rely on traffic from web visitors, search engines, or social media platforms typically yield only aggregate-response test results (e.g., total impressions and clicks), due to data privacy and sharing restrictions. This has limited analysts to measuring average effects in such settings, despite a rich literature in marketing on the importance of accounting for the customer base's preference distribution. To solve this problem, we propose a hierarchical Bayesian (HB) aggregate logit model to infer within-test heterogeneity distributions across customer preferences by leveraging between-test variations across aggregate-response A/B tests. We illustrate this method using a large-scale dataset of news headline tests. To quantify the design space of tests, we decompose textual headlines via Transformer networks and provide interpretability via transfer learning from prelabeled corpora. Additionally, we exemplify the value of quantifying heterogeneity by disentangling the impact of clickbait headlines on clickthrough rate (CTR). Posterior inference across seven different operationalizations of clickbait consistently exhibits a pattern of reduced mean and increased variance of the impact of clickbait. Within our empirical context, this suggests that so-called clickbait headlines were likely an artifact of providing journalistic "coverage" across the readership's heterogeneous preferences, rather than simply an attempt to drive CTR via sensationalist headlines.


Deep Kernel Learning for Default Prediction in Consumer Credit

Longxiu Tian, Tian Zhao, Linda Salisbury and Fred M. Feinberg


Credit scores play a vital role in reducing the risk of lending, insuring, and renting to consumers. Credit-based businesses and institutions typically rely on a portfolio of these scores, each informing a specific measure of creditworthiness, in support of decision processes such as vetting prospective customers and setting attractive risk premiums. Centrally problematic to the availability of credit scores is data missingness, which arises from incomplete or inadmissible credit file information, otherwise referred to as thin files. A quarter of U.S. consumers are impacted by thin files, manifesting as unscored gaps in the time series of a consumer’s score histories. These gaps reduce the scores’ ability to aid in targeting profitable borrowers and identifying cross-selling opportunities for financial products and services.


This paper addresses a prevalent form of unscored gaps, whereby standard credit scoring models do not emit a valid score at the consumer-month level (i.e., the highest possible granularity). To address this problem, we develop a dynamic latent factor model based on Gaussian Process regressions (GPR) to impute credible intervals for gaps in individual score histories within a portfolio of dynamically and contemporaneously interrelated scores. To tackle missingness within the high-dimensional attribute feature spaces that led to unscoreable thin files, we augment GPR with Deep Kernel Learning, which enables scalable and automatic discovery of missingness patterns that are not presumed a priori to be random. We apply this model to novel data from a leading credit bureau on scores for a segment of the U.S. population, along with attributes derived from the credit files used by credit bureaus to generate the scores.


Ann Arbor, MI


Stephen M. Ross School of Business
Ph.D. in Marketing and Scientific Computing

Dissertation: Bayesian Nonparametrics for Marketing Response Models (Chair: Fred M. Feinberg)


Cambridge, MA


Sloan School of Management
Master in Finance

Evanston, IL


Weinberg College of Arts and Sciences

B.A. in Economics, M.S. in Information Systems
