Prediction of demand elasticity for different markets based on average airfare change

Yiyue Xu, Anthony Yang, Ke Xu, Wencheng Zhang, Yuwen Meng, Hao Liang 

Problem Description

The ability to accurately predict changes in demand in response to fare changes is of paramount importance in the travel industry, because it allows airlines to optimize their pricing strategies and revenue management systems. Historically, demand estimation considered only passenger volume and average fare for different combinations of cabin and number of connections. In this project, we broadened the scope of these predictions by combining diverse data sets to build more robust and comprehensive demand elasticity models. Understanding the key drivers of demand fluctuations, and whether they are universal or specific to individual markets, has been a central part of this research.

We aimed to construct a model that predicts how changes in average fares affect demand in different markets. Our work examined these relationships at the market, cluster, and global levels, enabling us to tailor our recommendations to the specific characteristics and requirements of different markets and clusters. By combining several data sets, including historical booking data and exogenous data sources, we developed a nuanced understanding of the factors that drive demand. The resulting model not only predicts demand elasticity but also provides actionable insights that airlines can use to plan market entry and their response to new competitors.

Data Description 

Internal data 

The internal dataset provided by Sabre Corporation for this project comprises a rich assortment of information across several key fields, which are critical to understanding demand elasticity in various markets. Each row in the dataset represents a unique booking, with specific details about the booking and the market in which it falls. The dataset represents bookings for 18 different markets and spans a considerable time frame, with the departure date ranging from ‘2015-01-02’ to ‘2023-12-15’. However, it is important to note that there is a gap in the data for the entire year of 2020 due to COVID-19, where the booking date information is missing. This gap poses a challenge for the analysis, and the strategy to account for this missing data is a part of our methodology. 

External Data 

The external data for this project consists of a variety of metrics that provide additional context to complement the internal booking data from Sabre Corporation. It includes both air travel volumes and key economic indicators gathered from official sources, including the CPI (Consumer Price Index), the PMI (Purchasing Managers’ Index), and exchange rates, all of which could affect the affordability of travel and international travel demand. We also incorporated additional external data points on geographic [1] and economic [2, 3] factors, as well as tourism [4] data, to better cluster the markets. Integrating these additional sources gives a more nuanced understanding of the similarities among markets and helps categorize the markets into distinct clusters.

Exploratory Data Analysis  

By visualizing and summarizing the data, we can gain insights, identify trends, and detect potential issues that may inform our modeling and prediction efforts. Image 1 presents a global view of our nine selected routes, comprising three domestic US and six international round-trip routes. Image 2 reveals that the markets most significantly impacted by the pandemic are domestic flights within the US. Image 3 displays airline market shares for the nine round-trip routes, aggregating airlines with less than 20% market share into “Others”. For most of these routes, one or two airlines dominate the market.

Image 1. The view of nine selected routes.


Image 2. Markets impacted by the pandemic.


Image 3. Airline market shares 


Image 4. Passenger volume.

The passenger volume shows significant variation between the pre-COVID period and the time during and after the pandemic (Image 4). This discrepancy is partially due to the unavailability of booking data for 2020 and the inherent changes in travel patterns caused by the COVID-19 pandemic.  

Preprocessing 

Data Cleaning 

The data cleaning process performed three core tasks: converting data types, removing negative values, and eliminating outliers, which can substantially distort statistical metrics and lead to misleading conclusions. Outliers are defined as the top 10% of ‘AvgFare’ values within each cabin for each market. Defining them per market and cabin accounts for the inherent variability between markets and cabin types. We aim to strike a balance between retaining as much data as possible for robust analysis and curbing the influence of extreme values that could distort our findings.
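The outlier rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the field names `market` and `cabin` are hypothetical (only `AvgFare` appears in the dataset description), and the top 10% is taken by rank within each group, which is one of several reasonable ways to define the cut.

```python
from collections import defaultdict

def drop_top_fares(rows, frac=0.10):
    """Drop the highest `frac` share of AvgFare values within each (market, cabin) group."""
    groups = defaultdict(list)
    for idx, row in enumerate(rows):
        # "market" and "cabin" are hypothetical field names used for illustration.
        groups[(row["market"], row["cabin"])].append(idx)

    keep = set()
    for indices in groups.values():
        ranked = sorted(indices, key=lambda i: rows[i]["AvgFare"])
        n_drop = int(len(ranked) * frac)  # rounds down, so very small groups lose nothing
        keep.update(ranked[: len(ranked) - n_drop])

    # Preserve the original row order.
    return [row for idx, row in enumerate(rows) if idx in keep]
```

Because the threshold is computed per group, a business-cabin fare is never compared against economy fares, and an expensive long-haul market does not push cheaper domestic fares out of the data.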

Feature Engineering 

Feature engineering creates new features from existing data to improve model performance. By extracting more information from the data, we give the models additional signals, enhancing their capacity to learn and make predictions. We generated a set of new features from our cleaned dataset by transforming and categorizing existing features in ways that highlight the factors most relevant to our analysis. This enables our models to capture more complex patterns and relationships within the data. The engineered features were chosen to represent a broad range of factors, from timing and logistics to market competition, so that the models can reflect the intricacies of the airline industry.

Data Aggregation 

In this project, data aggregation plays a pivotal role in aiding us to examine and model our data at multiple levels of granularity. Various versions of the dataset were created according to a set of predefined feature selection strategies and aggregation methods. 

Data Grouping Process 

We define a grouping as a combination of a feature selection strategy and an aggregation method. Hence, with three feature selection strategies and four aggregation methods, we form twelve distinct groupings. Each grouping provides a unique view of the data, capturing different levels of detail and different combinations of factors. 
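As a small sketch of how the twelve groupings arise, the snippet below forms every pairing of a feature selection strategy with an aggregation method. The labels are placeholders, since the strategies and methods are not named in this section.

```python
from itertools import product

# Placeholder labels: the actual strategies and methods are not named in this section.
FEATURE_STRATEGIES = ["strategy_1", "strategy_2", "strategy_3"]
AGGREGATION_METHODS = ["method_1", "method_2", "method_3", "method_4"]

def build_groupings(strategies, methods):
    """Return every (feature selection strategy, aggregation method) pair."""
    return list(product(strategies, methods))

groupings = build_groupings(FEATURE_STRATEGIES, AGGREGATION_METHODS)
print(len(groupings))  # 3 strategies x 4 methods
```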

This systematic and flexible approach to data grouping offers a way to generate a multifaceted understanding of our data. We can scrutinize how different factors influence airline fares under various grouping methods, leading us to more accurate and robust predictive models. By analyzing results across different groupings, we can identify the most informative level of data aggregation and discover key drivers of airline fares, offering valuable insights for strategic decision-making. 

Data Scope Adjustment: Pre-Covid and Full Data Options 

To enhance the flexibility and applicability of our model, we introduce a data scope adjustment feature that allows users to tailor the data inputs according to their specific needs or analytical objectives. This feature enables users to choose between the full dataset and the pre-Covid subset for their modeling process. By offering these data scope adjustment options, we ensure that our model can cater to a wide range of analytical goals and research questions, from understanding long-term industry trends to investigating the impacts of significant global events like the Covid-19 pandemic. 
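A scope switch of this kind could look like the sketch below. The cutoff date and the `departure_date` field name are assumptions made for illustration; the exact boundary of the pre-Covid subset is not stated here.

```python
from datetime import date

# Assumed cutoff: the exact boundary of the pre-Covid subset is not given in this report.
PRE_COVID_CUTOFF = date(2020, 1, 1)

def select_scope(rows, scope="full"):
    """Return all rows for scope='full', or only rows departing before the cutoff for 'precovid'."""
    if scope == "full":
        return list(rows)
    if scope == "precovid":
        # "departure_date" is a hypothetical field name.
        return [r for r in rows if r["departure_date"] < PRE_COVID_CUTOFF]
    raise ValueError(f"unknown scope: {scope!r}")
```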

Cross Validation 

The nature of our project requires us to carefully consider the unique characteristics of our dataset when implementing model validation techniques. Cross-validation, a widely used method for assessing the robustness and generalizability of models, is a crucial part of our analysis. However, we cannot use it in its traditional form because of the temporal structure of our data. Time plays a vital role here, and the order of the data points cannot be shuffled, as shuffling may lead to data leakage, that is, using future data to predict the past.

To circumvent this, we use Time Series Cross-Validation, a specialized form of cross-validation designed for time-dependent data. This technique respects the temporal order of observations, which is crucial for our airline booking data. Its primary advantage is that it avoids look-ahead leakage, which helps detect overfitting and gives a more realistic measure of model generalizability.

In the Time Series Cross-Validation process, the data is divided into a number of folds. Unlike traditional cross-validation, the folds are not created by random sampling but by moving a time window across the dataset. For each iteration, the model is trained on the data within the time window and validated on the data following this window. 

This approach aligns with the inherent diversity and complexity of our data, reflecting the variable patterns and trends present across various granularity levels of data. By tuning and validating our models in this way, we are better equipped to understand the data’s underlying structure and dynamics and can provide more accurate and insightful forecasts. 
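The fold construction described in this section can be sketched as follows. This is an illustrative expanding-window variant, assuming the rows are already sorted chronologically; the function name and parameters are hypothetical, and a fixed-length sliding window would be an equally valid reading of the description.

```python
def time_series_splits(n_samples, n_folds, min_train):
    """Yield (train_indices, validation_indices) for chronologically ordered data.

    The training window always ends before the validation block starts,
    so no future observation is used to predict the past.
    """
    fold_size = (n_samples - min_train) // n_folds
    if fold_size < 1:
        raise ValueError("not enough samples for the requested number of folds")
    for k in range(n_folds):
        train_end = min_train + k * fold_size
        val_end = train_end + fold_size
        yield list(range(train_end)), list(range(train_end, val_end))

splits = list(time_series_splits(n_samples=10, n_folds=3, min_train=4))
```

Each successive fold trains on a longer history and validates on the block immediately after it, which mirrors how the model would be used to forecast in practice.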

 

Model Summary 

  1. Linear Regression: Linear regression is a basic predictive analytics technique. It is used to explain the relationship between one dependent variable (the variable being predicted) and one or more independent variables (the variables used to make the prediction). The output of a linear regression model is the line (or, with several predictors, the hyperplane) that best fits the provided data. Linear regression is a good baseline model as it is simple, interpretable, and fast to train. 
  2. Poisson Regression: Poisson regression is a type of regression analysis used to model count data and contingency tables. It assumes the response variable has a Poisson distribution and uses a log link function. 
  3. Random Forest: Random Forest is an ensemble learning method that constructs a multitude of decision trees at training time and outputs the mode of the classes predicted by the individual trees (classification) or their mean prediction (regression). This model is known for its flexibility, ease of use, and robustness to overfitting. The “randomness” in its name comes from two aspects: it randomly samples data points to build each tree, and it randomly samples features for each split during the tree building process. Random Forest can capture non-linear relationships and provides a good balance between simplicity and predictive power. Depending on the implementation, it can also handle missing values and categorical features well. 
  4. XGBoost: eXtreme Gradient Boosting, or XGBoost, is a powerful machine learning algorithm based on the gradient boosting framework, which combines multiple weak prediction models, typically decision trees, into a stronger model. XGBoost is designed to be highly efficient, flexible, and portable. It provides parallel tree boosting and has a variety of regularization parameters that help prevent overfitting. XGBoost can potentially provide more predictive power than Random Forest. 
  5. Neural Network: Neural networks are a class of machine learning models inspired by biological neurons and the connections between them. They are composed of layers of interconnected nodes, or “neurons”, with each layer using the output of the previous layer as its input. This composition allows neural networks to model complex, non-linear relationships. Neural networks have been used successfully in a variety of applications, from image and speech recognition to natural language processing and airline ticket price prediction. Neural networks, especially deep learning models, are the most flexible of the five and can potentially capture complex patterns and interactions in the data, but they can be more computationally intensive to train and require more data. 

 

Performance Analysis 

As mentioned in the model pipeline section, all five model types are trained on 13 different group aggregations of the data. We then train the models at every granularity level (full/pre-Covid scope; cluster/market/global level) and calculate three performance statistics, RMSE, MAPE, and SMAPE, to analyze the models’ performance. 

Formulas for the Performance Metrics 

The main evaluation criterion used here is SMAPE. The reason is that SMAPE mitigates the asymmetry and large-value bias that come with MAPE, and it remains relatively interpretable when its value is small. 
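For reference, the three metrics are commonly defined as below, where y_i is the actual value, \hat{y}_i is the predicted value, and n is the number of observations. These are the standard textbook forms and are given here as an assumption: SMAPE in particular has several variants that differ in scaling. The version shown is expressed as a fraction rather than a percentage, in line with the SMAPE values reported in this section.

```latex
\begin{align}
\mathrm{RMSE}  &= \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2} \\
\mathrm{MAPE}  &= \frac{1}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right| \\
\mathrm{SMAPE} &= \frac{1}{n}\sum_{i=1}^{n}\frac{\left|y_i - \hat{y}_i\right|}{\left(\left|y_i\right| + \left|\hat{y}_i\right|\right)/2}
\end{align}
```

MAPE divides by the actual value alone, so errors on small actual values are inflated and over- and under-predictions are penalized unevenly; SMAPE divides by the average magnitude of the actual and predicted values, which bounds each term.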

 

Overall Performance Analysis 

All model results from all three pipelines were recorded for comparison and analysis. Different grouping methods present different datasets to the models; consequently, model results are comparable within, but not across, grouping methods. To limit comparisons across differently grouped datasets, we averaged SMAPE for every combination of model and grouping method and took the combination with the lowest average SMAPE as the best performer.  
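The averaging step can be sketched as below. The tuple layout and the labels are hypothetical, and the scores are made-up numbers for illustration only, not results from the study.

```python
from collections import defaultdict
from statistics import mean

def best_combination(results):
    """Find the (model, grouping) pair with the lowest mean SMAPE.

    `results` is an iterable of (model, grouping, smape) tuples,
    one per market or cluster on which the combination was evaluated.
    """
    by_combo = defaultdict(list)
    for model, grouping, smape in results:
        by_combo[(model, grouping)].append(smape)
    averages = {combo: mean(values) for combo, values in by_combo.items()}
    best = min(averages, key=averages.get)
    return best, averages

# Made-up scores for illustration only.
example = [
    ("xgboost", "no_grouping", 0.40), ("xgboost", "no_grouping", 0.50),
    ("poisson", "no_grouping", 0.60), ("poisson", "no_grouping", 0.70),
    ("xgboost", "grouping_a", 0.55), ("xgboost", "grouping_a", 0.65),
]
best, averages = best_combination(example)
```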

The three pipelines not only break the problem down into all possible granularities but also shed light on the best starting point for this specific problem. Taken as a whole, the pipeline results are surprisingly consistent: using no grouping at all gives steady and far superior performance compared with every other grouping method. The best performers in all three pipelines have SMAPE averages around 0.45: the cluster-level Neural Network is the best-performing model overall (SMAPE 0.43), the individual market-level XGBoost is second (SMAPE 0.46), and the global-level Neural Network follows (SMAPE 0.48). Differences between models vary by market and cluster; they can be as large as about 0.2 (for example, SIN_SFO Random Forest 0.44 vs. Poisson 0.26) or too small to notice.  

Image 5. Features Used in No Grouping


Image 6. Clustering Model Performance

When considering the efficiency, adaptability, and practicality of moving the pipeline to production, the tradeoff between model accuracy and generalizability is one of the main factors. A one-model-fits-all setup has the highest generalizability and is the easiest to implement, yet its lower accuracy could negate some of those benefits. On the other hand, if every market were to have its own best-performing model and grouping combination, the actual implementation would be extremely costly and almost impossible. Our experiment reflects this tradeoff: if each individual market has its own model and grouping combination, SMAPE can be as low as 0.2; if all markets share a single model, the average SMAPE more than doubles. Consequently, our experiment points to clustering markets as a way to deploy models in a flexible and less costly manner. In fact, clustering markets balances the tradeoff without hurting model accuracy in our results. Regardless of the final position on this tradeoff, the pipeline provides great flexibility in choosing the ideal granularity and generalization mechanism, laying the groundwork for future research and production work.  

One of the major findings from the experiment is the fact that grouping the data based on different time granularity might not be as beneficial as one would expect. Although the time-series nature of the problem naturally leads to data preprocessing steps such as grouping based on various time-dimension granularities, the experiment shows that it is best to leave the dataset as it is. One possible interpretation stems from the fact that hyperparameter tuning for all models was completed based on the dataset without any grouping (mainly for computational costs, as mentioned in the model pipeline section). When the same hyperparameters are used to fit new data structures, the model might fail to learn new underlying patterns at all, therefore leading to less promising results for all grouping variations. Nevertheless, the idea of restructuring the data based on time-dimension granularities is still a valid and relevant approach to consider in future research and studies. With enough computational resources to tune different models on different data, one could potentially gain new discoveries regarding best practices in terms of time-dimension granularity.  

 

PED Analysis  

Price Elasticity of Demand Formula
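The formula itself did not survive in this copy of the report. The standard definition, which matches the interpretation used in this section (percentage change in passengers per 1% change in price), is given below as an assumption. Here Q_0 and Q_1 are the demand at the original and new price, and P_0 and P_1 are the original and new price.

```latex
\mathrm{PED} = \frac{\%\,\Delta Q}{\%\,\Delta P}
             = \frac{\left(Q_1 - Q_0\right)/Q_0}{\left(P_1 - P_0\right)/P_0}
```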

To understand the underlying trend in PED, each market/cluster is analyzed by holding fixed the analysis scope (pre-Covid), the number of stops (0), and tktAl (high demand) while varying the cabin (economy vs. business), the original price (the average price for each market), and the new price (a 20% increase over the respective original price). For example, the average price of economy tickets from ATL to BOS is $200, so the PED for the economy cabin was calculated from an original price of $200 and a new price of $240. With 2023 as the focus of the study, the monthly PED trend is plotted for all 12 booking months and their corresponding departure dates spanning 12 months. To be exact, the PED plot for a January 2023 booking date was created from the 12 possible departure months ranging from January 2023 to December 2023. A PED of 0.1 for a booking date and departure date of January 2023 means that, if the price increases by 1%, businesses could expect a 0.1% increase in the number of passengers who book in January 2023 a flight that also departs in January 2023. The same approach and logic apply to the cluster-level analysis. This approach shows clearly how PED for a given booking date changes with how far the departure date lies in the future, and it keeps the price change constant so that PED ranges are directly comparable across markets/clusters.  
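The calculation can be illustrated with a short sketch. The fares follow the ATL to BOS example above; the passenger counts are hypothetical stand-ins for the model's predicted demand at each price, and the simple percentage-change form of PED is assumed.

```python
def price_elasticity(q_old, q_new, p_old, p_new):
    """Percentage change in demand divided by percentage change in price."""
    pct_demand = (q_new - q_old) / q_old
    pct_price = (p_new - p_old) / p_old
    return pct_demand / pct_price

# Fares from the ATL to BOS example: $200 raised by 20% to $240.
# Passenger counts are hypothetical, standing in for model predictions.
ped = price_elasticity(q_old=1000, q_new=1020, p_old=200.0, p_new=240.0)
print(round(ped, 3))
```

With these inputs, a 2% rise in predicted passengers against a 20% rise in price gives a PED close to 0.1, the value used in the interpretation above.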

Regardless of market-level granularity, most PED trends for the business cabin are positive, and most PED trends for the economy cabin are mixed. The difference between the cabins is expected, as most business travelers are less responsive to price changes, while most leisure travelers weigh additional factors such as high season and vacation windows. However, the drastic differences in PED ranges are still worth noting. For example, for SFO_SIN, the PED range for the economy cabin (Fig. 1a) is around 0.07, while that for the business cabin (Fig. 1b) is as large as 6. This pattern can also be seen in Cluster 1, Cluster 2, JFK_CDG, and other markets.  

Figure 1a. SFO_SIN Economy Cabin PED Trend

Figure 1b. SFO_SIN Business Cabin PED Trend


 

Cluster Level PED 

All four clusters show interesting trends in PED variation. Clusters 1 and 2 are analyzed in detail because they include diverse markets and are representative of cluster-level differences. Detailed cluster information is shown below. 

Cluster 0: LHR_SYD, SYD_LHR

Cluster 1: ATL_BOS, BOS_ATL, DEN_DFW, DFW_DEN, LAX_SEA, SEA_LAX, CDG_JFK, JFK_CDG

Cluster 2: FRA_SFO, SIN_SFO, MEX_IAH, GRU_MIA

Cluster 3: IAH_MEX, MIA_GRU, SFO_FRA, SFO_SIN

Cluster Information

An interesting fact about Cluster 1, beyond its PED range for the business cabin being much larger than that for the economy cabin, is that its PED is almost always positive regardless of cabin (Fig. 2). A consistently positive PED indicates that travelers in Cluster 1 are insensitive to price changes, which is understandable for these eight popular markets. 

Figure 2a. Cluster 1 Economy Cabin PED Trend

Figure 2b. Cluster 1 Business Cabin PED Trend 


Cluster 2 contains international flights arriving in the United States. Travelers are least responsive to price changes later in the year, regardless of the actual departure month (Fig. 3). A change in travelers’ mindset as the holiday season approaches might be one of the major reasons for this noticeable pattern. However, the mixed PED for the economy cabin shows that markets in Cluster 2 are not as consistently popular as markets in Cluster 1. The absence of a peak demand season further suggests an absence of leisure travel, in line with a common characteristic of the destination cities of the four markets in Cluster 2. 

Figure 3a. Cluster 2 Economy Cabin PED Trend


Figure 3b. Cluster 2 Business Cabin PED Trend

 

The advantage of clustering markets, as shown in this case, is that it could preserve the common characteristics across different markets and summarize essential travelers’ behavior patterns on a cluster level. In more complex cases, for example if more markets are present, clustering would require more market-level information in order to fully capture all common characteristics (i.e. both local and global underlying patterns across markets) thereby producing clusters that are representative as well as holistic.  

 

Market Level PED 

At the market level, two markets (SFO_SIN, JFK_CDG) show clear PED trends for both the economy and business cabins, while the rest show clear trends for only one of the two cabins.  

An interesting finding is that travelers’ booking behavior (i.e. responsiveness to price changes) for a specific departure month can change drastically throughout the year. This trend is especially noticeable for well-known tourism cities such as Sydney. Demand for flights to Sydney peaks in October, with travelers being least responsive to price changes when they book in April, July, and September. This booking behavior reflects the fact that the tourism high season in Sydney starts in October (Fig. 4). A similar pattern can be seen in SEA_LAX, where travelers are most insensitive to price changes for flights in July and August when they book in June (Fig. 5). 

Figure 4. LHR_SYD Economy Cabin PED Trend


Figure 5. SEA_LAX Economy Cabin PED Trend

Mexico City can also be considered a popular tourism destination, with the dry season being the best time to visit. Travelers are least responsive to price changes for flights in January and February, the peak of the dry season. Their booking behavior in this case, however, differs from that of LHR_SYD: travelers are less responsive regardless of booking time (Fig. 6). Although both Sydney and Mexico City are renowned tourist destinations, the consistently positive PED for IAH_MEX, in contrast to the more varied PED for LHR_SYD, can be attributed to the close proximity between the United States and Mexico, which facilitates frequent business interactions, labor transportation, and cultural exchanges. 

Figure 6. IAH_MEX Economy Cabin PED Trend

These findings about travelers’ responsiveness to price changes at the individual market level indicate the need for a more market-specific, case-by-case approach. For known popular seasons in different markets, travelers’ behavior could be inconsistent (i.e. less responsive at certain times of the year) or surprisingly consistent. These small differences and subtle trends need to be coupled with more market research for businesses to accurately dissect and predict changes in traveler behavior.  

 

Conclusion and Next Steps  

Our experiment dissects the problem of predicting price elasticity of demand by exploring all possible combinations of research scope granularity, market level granularity, and time-dimension granularity. With our exploratory study as a starting point, the research in price elasticity of demand in the travel industry would bring immense value to businesses. By quantifying the responsiveness of demand to price changes, businesses can better understand traveler behavior, optimize pricing strategies, as well as reevaluate market segmentation.  

The results of our study show that while all market-level granularities reach similar model performance, the balance of accuracy and computational cost at the cluster level makes it a promising starting point for future research. Our research suggests that time-dimension granularity might not be a vital factor in predicting price elasticity of demand, yet limitations in both data and computational resources, rather than actual irrelevance, might be the underlying cause of this pattern.  

One of the most important findings from analyzing PED trends is that the cabin has a significant impact on travelers’ responsiveness to price changes. Business cabin travelers are almost always insensitive to price changes, while economy cabin travelers react differently depending on factors such as holiday season, popular tourism season, and other market-specific trends. Another important discovery from our prediction of price elasticity is that for known popular seasons (tourism or holiday season), travelers may react differently to price changes throughout the year. Travelers could be less responsive to price changes in certain months, and this responsiveness follows different patterns in different markets. Businesses could benefit from researching these traveler behavior patterns further and setting their pricing strategies accordingly.  

Building on the insights gained from this research, several next steps could further enhance our understanding and application of price elasticity prediction in the airline industry. First and foremost, understanding market-level differences and similarities in order to better cluster markets could be of vital importance to the problem. Our study focuses on market characteristics for the most part, yet factors requiring more domain knowledge, such as competition levels, could also contribute to constructing more holistic clusters. Furthermore, finer time-dimension granularity designs that conform to domain knowledge and common practice could be explored in further detail. While our study considers four of the most common time granularity combinations, other combinations might be worth considering. Similarly, the external data included in our study is limited to traffic and economic indicators. Other data, such as common promotion periods, could be included to capture more factors that affect traveler behavior. Moreover, the variety of data structures originating from different grouping mechanisms calls for more research on best practices for comparing model performance across different data structures. As an alternative to taking averages of evaluation metrics, we propose the Kruskal-Wallis test, which is suitable for evaluating ranks, along with Dunn’s post-hoc test with Bonferroni correction.  
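To make the proposed rank-based comparison concrete, the sketch below computes the Kruskal-Wallis H statistic for several groups of scores, for example the SMAPE values obtained under different grouping methods. It is a from-scratch illustration using only the standard library, not the procedure used in this study. It stops at the test statistic: the p-value would come from a chi-square distribution with (number of groups minus 1) degrees of freedom, and Dunn's post-hoc test is not shown. In practice an established routine such as `scipy.stats.kruskal` would normally be preferred.

```python
from collections import Counter

def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic, using average ranks for ties and the tie correction."""
    counts = Counter(value for group in groups for value in group)
    n_total = sum(len(group) for group in groups)

    # Assign each distinct value the average of the 1-based ranks it occupies.
    rank_of = {}
    position = 0
    for value in sorted(counts):
        ties = counts[value]
        rank_of[value] = position + (ties + 1) / 2
        position += ties

    rank_term = sum(
        sum(rank_of[value] for value in group) ** 2 / len(group) for group in groups
    )
    h = 12.0 / (n_total * (n_total + 1)) * rank_term - 3 * (n_total + 1)

    tie_term = sum(t ** 3 - t for t in counts.values())
    correction = 1 - tie_term / (n_total ** 3 - n_total)
    if correction == 0:
        raise ValueError("all observations are identical; H is undefined")
    return h / correction

# Toy values for illustration only.
h_stat = kruskal_wallis_h([1, 2, 3], [4, 5, 6], [7, 8, 9])
```

Because the statistic depends only on ranks, it does not assume that SMAPE values from differently structured datasets share a common scale or distribution, which is the difficulty with comparing raw averages.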

In conclusion, our study on predicting price elasticity of demand in the travel industry highlights its value for businesses. By quantifying demand responsiveness and optimizing pricing strategies, businesses can better understand traveler behavior and market segmentation. Our findings suggest that modeling markets at cluster-level granularity offers a favorable balance of accuracy and computational cost. Further research should explore market-level differences, finer time-dimension granularity, and the inclusion of additional external factors to enhance prediction accuracy and decision-making. Additionally, developing best practices for comparing model performance across different data structures is recommended.