Tag Archives: data analytics

Unexpected Data Bias in Smartphone Trace Data

This study, conducted jointly with Professor Amanda Stathopoulos’ group, explores how device representation bias in smartphone tracking data shifted after Apple’s 2021 privacy updates on user location tracking. It demonstrates that privacy regulations can significantly and unexpectedly affect the quality of these data, which are crucial for decision making across governmental, corporate, and academic institutions worldwide. The research also corrects misconceptions about representation bias previously speculated in the literature: we find that device representation is negatively correlated with income, contrary to popular concerns about the under-representation of low-income populations in location-based service (LBS) data. Overall, the findings equip users of location-based device data with a better understanding of potential pitfalls, enabling them to anticipate the changes caused by the evolving regulatory landscape and to devise appropriate coping strategies.

Download the preprint here and read the abstract below:


Abstract: As smartphones become ubiquitous, practitioners look to the data generated by location-tracking services enabled on these devices as a comprehensive yet low-cost means of studying people’s daily activities. It is now widely accepted that smartphone data traces can serve as a powerful analytical tool for research and policymaking. As the use of these data grows, though, so too do concerns regarding the privacy regulations surrounding location tracking of private citizens. Here, we examine how Apple’s tightened privacy measures, designed to restrict location tracking on its devices, affect the quality of passively generated trace data. Using a large sample of such data collected in the Chicago metro area, we discover a significant drop in iOS data availability after the privacy measures took effect. The results also reveal a surprising puzzle: the reduced tracking is not uniform and contradicts customary concerns about the under-representation of low-income populations. Instead, we find that device representation level is negatively correlated with both income and population density. These findings reframe the debate over the increasing reliance on smartphone data, highlighting the need to understand evolving issues in tracking, coverage, and representation, which are essential for the validity of research and planning.
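
To make the representation measure concrete, here is a minimal sketch of how one might compute it and check its relationship with income. It assumes representation means observed devices per resident in a census tract, which may differ from the definition used in the paper. The tract records, field names, and numbers are invented for illustration and are not study data.

```python
from math import sqrt

# Hypothetical tract-level records (illustrative values only, not study data).
tracts = [
    {"tract": "A", "devices": 120, "population": 4000, "median_income": 38000},
    {"tract": "B", "devices": 150, "population": 6000, "median_income": 52000},
    {"tract": "C", "devices": 90, "population": 5000, "median_income": 71000},
    {"tract": "D", "devices": 60, "population": 4800, "median_income": 95000},
]


def representation(record):
    """Observed devices per resident (an assumed definition)."""
    return record["devices"] / record["population"]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


rep = [representation(t) for t in tracts]
income = [t["median_income"] for t in tracts]
r_value = pearson(rep, income)
print(f"correlation between representation and income: {r_value:.2f}")
```

In this toy sample the higher-income tracts have fewer observed devices per resident, so the coefficient comes out negative, which is the direction of the relationship the abstract reports.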

The sustainability appeal of URT

Few would deny that public transit has an important role to play in any sensible solution to transportation’s sustainability problem. Yet the consensus often dissolves at the question of how. A case in point is urban rail transit (URT), which has expanded rapidly in recent decades. The ongoing debate about URT has been fueled by inconclusive, sometimes contradictory, empirical evidence reported in the literature. Has URT consistently reduced driving and/or auto ownership, as its sustainability appeal would require? We set out to address this question head-on in this study.

You may read the abstract below, and download a preprint here.


Abstract: Urban rail transit (URT) has expanded rapidly since the dawn of the century. While the high cost of building and operating URT systems is increasingly justified by their presumed contribution to sustainability (by stimulating transit-oriented development, promoting the use of public transportation, and alleviating traffic congestion), the validity of these claims remains the subject of heated debate. Here we examine the impact of URT on auto ownership, traffic congestion, and bus usage and service, by applying fixed-effects panel regression to time series data sets compiled for major urban areas in China and the US. We find that URT development is strongly and negatively correlated with auto ownership in both countries. Measured by elasticity, the absolute size of this URT effect in China is three times that in the US. Relative to other factors such as income and unemployment rate, however, the effect is much larger in the US than in China. Importantly, the benefit transpires only after a URT system reaches the tipping point that unleashes the network effect. Where this condition is met, we estimate about 14,012 and 31,844 metric tons of greenhouse gas emissions can be eliminated each year in China and the US, respectively, for each additional million URT vehicle kilometers traveled. We also uncover convincing evidence of cannibalization by URT of bus market share in both countries. However, rather than undermining bus services, developing URT strongly stimulates their growth and adaptation. Finally, no conclusive evidence is found that confirms a significant association between URT and traffic congestion. While traffic conditions may respond positively to URT development in some cases, the relief is likely short-lived.
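
For readers unfamiliar with fixed-effects panel regression, the sketch below shows the core idea with a single regressor. Subtracting each city's own mean from its observations removes any time-invariant city characteristic before the slope is estimated, and with both variables in logs the slope reads as an elasticity. The panel data, the variable names, and the one-regressor specification are illustrative assumptions; the paper's models also include factors such as income and unemployment rate.

```python
from collections import defaultdict
from math import log

# Hypothetical panel: (city, year, URT km, autos per 1,000 people).
# Values are invented for illustration and are not study data.
panel = [
    ("X", 2010, 100.0, 200.0), ("X", 2011, 150.0, 190.0), ("X", 2012, 220.0, 181.0),
    ("Y", 2010, 40.0, 320.0), ("Y", 2011, 55.0, 310.0), ("Y", 2012, 80.0, 298.0),
]


def within_slope(rows):
    """Within (fixed-effects) estimator of the log-log slope for one regressor."""
    by_city = defaultdict(list)
    for city, _year, urt, autos in rows:
        by_city[city].append((log(urt), log(autos)))

    numerator = denominator = 0.0
    for observations in by_city.values():
        # Demean within each city to sweep out its fixed effect.
        mean_x = sum(x for x, _ in observations) / len(observations)
        mean_y = sum(y for _, y in observations) / len(observations)
        for x, y in observations:
            numerator += (x - mean_x) * (y - mean_y)
            denominator += (x - mean_x) ** 2
    return numerator / denominator


elasticity = within_slope(panel)
print(f"estimated elasticity of auto ownership w.r.t. URT: {elasticity:.3f}")
```

A useful property to check is that scaling one city's auto ownership by a constant factor leaves the estimate unchanged, because that factor becomes a city-level shift in logs and is absorbed by the demeaning step.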

Fall and rise of taxi travel during COVID

This is our second COVID-19-related study, completed in 2020 and published in Transportation Research Part A in 2021. You may read the other one here, which is about optimally adapting transit design and operations in a pandemic.

We examined four weeks of taxi trajectory data covering the onset of COVID-19, the shutdown, and the phased reopening in the city of Shenzhen. Our analysis revealed how the pandemic and the travel restriction policies affected both the supply and the demand sides of the city’s taxi market. One of the more interesting findings is that the city’s stimulus policy, designed to boost taxi supply and help taxi drivers, might have led to oversupply by inducing taxi drivers to spend more time on the road than the prevailing market conditions would justify. We uncovered direct evidence in the data to support this finding through a clustering analysis.

A preprint is available here.


Abstract: This paper traces the plunge and rebound of the taxi market in Shenzhen, China through the COVID-19 lockdown. A four-week taxi GPS trajectory data set is collected in the first quarter of 2020, which covers the period of lockdown and phased reopening in the city. We conduct a spatiotemporal analysis of taxi demand using the data, and then select taxis that continued to operate through the analysis period to examine whether and how they adjusted operational strategies. We find, among other things: (i) taxi demand in Shenzhen shrank more than 85% in the lockdown phase and barely recovered from that bottom even after the city began to reopen; (ii) the recovery of taxi travel fell far behind that of overall vehicle travel in the city; (iii) most taxis significantly cut back work hours in response to the lockdown, and many adjusted their work schedules to focus on serving peak-time demand after it was lifted; (iv) taxi drivers demonstrate distinct behavioral adaptations to the pandemic that can be identified by a clustering analysis; and (v) while the level of taxi service dropped precipitously at the beginning, it quickly rebounded to exceed the pre-pandemic level, thanks to the government’s incentive policy. These empirical findings suggest that (i) incentives aimed at boosting supply should more precisely target where the boost is most needed; (ii) taxi market conditions should be closely monitored to support and adjust policies; and (iii) when demand is severely depressed by lockdown orders or when the market is oversupplied, taxi drivers should be encouraged and aided to use more centralized dispatching modes.
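
The abstract does not say which clustering method was used, so the sketch below uses plain k-means purely as a stand-in to show how drivers with different adaptations could be separated. The two features (average daily hours on duty and the share of those hours in peak periods), the driver values, and the starting centroids are all hypothetical.

```python
from math import dist

# Hypothetical per-driver features:
# (average daily hours on duty, share of hours worked in peak periods).
# A real analysis would standardize features that sit on different scales.
drivers = [
    (11.0, 0.30), (10.5, 0.32), (11.5, 0.28),  # long, evenly spread shifts
    (5.0, 0.70), (4.5, 0.75), (5.5, 0.68),     # short, peak-focused shifts
]


def nearest(point, centroids):
    """Index of the centroid closest to the point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))


def kmeans(points, centroids, iterations=20):
    """Basic k-means with caller-supplied starting centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Move each centroid to the mean of its members; keep it if empty.
        centroids = [
            tuple(sum(col) / len(members) for col in zip(*members))
            if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    labels = [nearest(p, centroids) for p in points]
    return labels, centroids


labels, centroids = kmeans(drivers, centroids=[drivers[0], drivers[3]])
print(labels)
```

With these toy inputs the first three drivers fall in one group and the last three in another, mirroring the kind of contrast the paper describes between drivers who kept long hours and those who concentrated on peak-time demand.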

How can the taxi industry survive the tide of ridesourcing?

This paper makes two empirical findings and one prediction. First, it reveals the intensity and scope of the impact of ridesourcing on the conventional taxi industry. Second, it uncovers evidence that taxis may be competitive in densely populated areas. The second finding leads to a follow-up study you can read here.

I predict that the taxi industry is here to stay for the foreseeable future. Here is what I wrote in the conclusion:

“Beyond e-hailing, economy of scale and aggressive pricing, ridesourcing does not seem to have other means at present to drive its expansion in the market. E-hailing is no longer the secret weapon that once glorifies the cause of TNCs – it can be easily picked up by a taxi dispatcher that owns and operates its own fleet. Aggressive pricing, on the other hand, has proven at best a double-edged sword, as Uber’s recent bitter defeat in China has vividly demonstrated. The scale of TNCs, which gives outside visitors a brand to stick to, is indeed an important competitive advantage. Even this lead is not that difficult to catch up, however, if a mobile platform, presumably operated by a third party, can unify taxi dispatchers around the world. Such a platform can easily work within cities’ existing regulatory structure, rather than against it, because it utilizes a dedicated and existing fleet. It can also improve the experience of street-hailing, a decisive advantage it holds against ridesourcing, by offering customers the amenities considered only available to e-hailing users, such as paying the fare on-line and rating drivers, all in real-time. An obvious solution may be allowing customers, as they board the taxi hailed off street, to open up an electronic transaction session similar to those seen on e-hailing platforms, by e.g. scanning a QR code attached to the taxis or the driver’s smart phone.” … therefore, “The revolution of ridesourcing is unlikely to eliminate the necessity of a dedicated service fleet, and for years to come we will continue to live in a world with both ridesourcing and (upgraded) taxis.”

The journal Transportation Research Part C selected this paper for its Best Paper Award in 2018. You may download a preprint here.


Abstract: This paper aims to examine the impact of ridesourcing on the taxi industry and explore where, when and how taxis can compete more effectively. To this end, a large taxi GPS trajectory data set collected in Shenzhen, China is mined, and more than 2,700 taxis (about 18% of all taxis registered in the city) are tracked over a period of three years, from January 2013 to November 2015, when both e-hailing and ridesourcing were rapidly spreading in the city. The long sequence of GPS data points is first broken into separate “trips”, each corresponding to a unique passenger state, an origin/destination zone, and a starting/ending time. By examining the trip statistics, we found that: (1) the taxi industry in Shenzhen has experienced a significant loss in its ridership that can be indisputably credited to the competition from ridesourcing. Yet, the evidence is also strong that the shock was relatively short and that the loss began to stabilize in the second half of 2015; (2) taxis are found to compete more effectively with ridesourcing in peak periods (6-10 AM, 5-8 PM) and in areas with high population density; (3) e-hailing helps lift the capacity utilization rate of taxis. Yet, the gains are generally modest except for the off-peak period, and excessive competition can lead to severely under-utilized capacities; and (4) ridesourcing worsens congestion for taxis in the city, but the impact is relatively mild. We conclude that a dedicated service fleet with exclusive street-hailing access will continue to co-exist with ridesourcing and that regulations are needed to ensure this market operates properly.
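
The trip-extraction step described in the abstract can be sketched as follows: a taxi's time-ordered GPS records are split wherever the passenger state changes, and each run becomes one trip with an origin and destination zone and a start and end time. The record layout, zone identifiers, and function name are hypothetical, and the paper's actual preprocessing may involve more steps, such as noise filtering.

```python
from itertools import groupby

# Hypothetical GPS records for one taxi, already sorted by time:
# (timestamp in seconds, zone id, occupied flag).
points = [
    (0, "Z1", False), (30, "Z1", False), (60, "Z2", True),
    (90, "Z3", True), (120, "Z4", True), (150, "Z4", False),
    (180, "Z5", False),
]


def split_into_trips(records):
    """Group consecutive records that share the same passenger state."""
    trips = []
    for occupied, run in groupby(records, key=lambda r: r[2]):
        run = list(run)
        first, last = run[0], run[-1]
        trips.append({
            "occupied": occupied,
            "origin": first[1],
            "destination": last[1],
            "start": first[0],
            "end": last[0],
        })
    return trips


trips = split_into_trips(points)
for trip in trips:
    print(trip)
```

Grouping on consecutive runs rather than on the flag alone matters here, because the same taxi alternates between vacant and occupied states many times and each alternation should yield its own trip.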