How are transportation, big data, and crime tied together?

Uber is an “on-demand car service that allows everyone to have a private driver experience.”  Customers can request a pickup online or through their smartphones, and the service is now available in nine US cities and Toronto.  Because the company is so integrated with communications technology, it has lots of fun data to play with, and it publishes a blog of some of the more interesting tidbits, much like OkCupid used to.  In the article that I looked at today, “Uberdata: How prostitution and alcohol make Uber better,” Uber’s data team wrote about how examining big data to improve the efficiency of their transportation service led them to a better understanding of crime rates in San Francisco.

Uber’s team was interested in learning exactly when and where people are most likely to want rides, so that they could place their vehicles nearby and reduce the time between ordering a ride and pickup.  They used data available from Zillow to subdivide San Francisco (as an example city) into many different neighborhoods with complex borders, then asked how many trips they tended to make in each neighborhood.  Their existing trip data was a good start, but they wanted to improve their predictions, so they looked at census data too, to see where people were likely to reside (the Social Explorer database is a good place to see census data mapped in cool ways, if you are interested).
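
The neighborhood counting step can be sketched in a few lines of Python.  This is only an illustration, not Uber's actual method: the article does not say how trips were assigned to neighborhoods, so the ray-casting test below is my own choice, the function names are hypothetical, and the two polygons and four pickup points are invented stand-ins for real Zillow boundaries and trip records.

```python
from collections import Counter


def point_in_polygon(x, y, polygon):
    """Ray-casting test: count how many polygon edges a ray going right from (x, y) crosses."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point can be crossed.
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge meets that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def trips_per_neighborhood(pickups, neighborhoods):
    """Count pickups falling inside each neighborhood; pickups outside every polygon are skipped."""
    counts = Counter()
    for x, y in pickups:
        for name, polygon in neighborhoods.items():
            if point_in_polygon(x, y, polygon):
                counts[name] += 1
                break
    return counts


# Invented example: a square neighborhood and an L-shaped one wrapped around it.
neighborhoods = {
    "A": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "B": [(2, 0), (4, 0), (4, 4), (0, 4), (0, 2), (2, 2)],
}
pickups = [(1, 1), (3, 1), (1, 3), (5, 5)]
print(trips_per_neighborhood(pickups, neighborhoods))
```

Ray casting works on borders of any shape, which matters here because the neighborhoods are described as having complex borders rather than neat rectangles.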

However, the Uber data team realized that the census data would only show where people lived, not where they went to work or school or out to socialize.  They knew that they could start adding together data about where schools and businesses and bars were located to fill in that picture, but they thought there might be an easier way: a proxy measure.  So they looked at San Francisco’s crime statistics (available at San Francisco Crimespotting) as a possible proxy for where people actually spend their time.  The article shows maps of both the neighborhoods where Uber trips happen and the neighborhoods’ crime rates, and the two match pretty well.  Reasoning that more crime in a neighborhood indicates more people there, the Uber team was able to identify the neighborhoods with higher populations.
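
One rough way to put a number on "the two match pretty well" is a correlation coefficient across neighborhoods.  The sketch below computes Pearson's r by hand; the article does not say which measure Uber used, so this is an assumption on my part, and the per-neighborhood counts are invented for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return covariance / (spread_x * spread_y)


# Invented counts for five neighborhoods, listed in the same order in both lists.
crime_counts = [120, 340, 80, 510, 260]
trip_counts = [150, 420, 90, 600, 310]

print(round(pearson(crime_counts, trip_counts), 3))
```

A value near 1 would mean that neighborhoods with more reported crime also tend to have more trips, which is what a useful proxy needs.  It says nothing about one causing the other.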

But the Uber team wanted to dig deeper: they wanted to know if specific crimes were tied to rides.  By comparing the data for lots of different crimes, they learned that “prostitution, alcohol, theft, and burglary” are high in areas with high numbers of Uber rides, while murder, arson, and vehicle theft showed no statistically significant relationship to the number of Uber rides.  Thus Uber can combine data from the census, crime statistics, and their own business to generate greater revenue, because they know where it is most effective to send their cars.
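
To make "statistically significant" concrete, here is one simple way to check whether a crime category is tied to ride counts: a permutation test on the correlation.  The article does not name the test Uber used, so this is a substitute technique chosen because it needs nothing beyond the standard library.  The function names, the trial count, and all of the numbers are invented for illustration.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


def permutation_p_value(xs, ys, trials=2000, seed=0):
    """Estimate how often shuffled data correlates at least as strongly as the real data."""
    rng = random.Random(seed)  # fixed seed so the estimate is repeatable
    observed = abs(pearson(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        # Small tolerance so floating-point noise does not hide an exact tie.
        if abs(pearson(xs, shuffled)) >= observed - 1e-12:
            hits += 1
    # Add one to both counts so the estimate is never exactly zero.
    return (hits + 1) / (trials + 1)


# Invented counts for eight neighborhoods, in the same order in every list.
rides = [150, 420, 90, 600, 310, 200, 480, 120]
crimes = {
    "theft": [30, 85, 20, 118, 60, 41, 97, 25],
    "arson": [3, 1, 2, 2, 4, 1, 3, 2],
}

for category, counts in crimes.items():
    print(category, round(pearson(rides, counts), 2), permutation_p_value(rides, counts))
```

A small p-value (conventionally below 0.05) suggests the link between that crime category and rides is unlikely to be chance.  A large one means the data cannot distinguish the link from noise.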

Additionally, while doing this research, the Uber data team found one interesting bit of information that did not directly tie in to their profit margin.  They discovered that prostitution, specifically, rises sharply on Wednesday nights in San Francisco (and Oakland).  Looking into the reasons for this, they discovered that “Social Security and welfare checks arrive on the second, third, and fourth Wednesdays of each month,” and guessed that the influx of cash into the city likely contributed to the rise in prostitution rates (especially after they went back to the data and determined that significantly more prostitution happened on the 2nd Wednesday of the month than on the 1st).  So business researchers trying to improve their company’s efficiency learned about a likely indirect cause of street crime as a side effect of doing their data analysis.
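
The first-versus-second-Wednesday comparison comes down to labeling each incident date with which Wednesday of the month it falls on and then counting.  Here is a minimal sketch of that bookkeeping.  The function names are hypothetical and the incident dates are invented; the real analysis would use the city's crime records.

```python
from collections import Counter
from datetime import date

WEDNESDAY = 2  # date.weekday() numbers Monday as 0


def wednesday_of_month(day):
    """Return 1 for the first Wednesday of its month, 2 for the second, and so on; None for other weekdays."""
    if day.weekday() != WEDNESDAY:
        return None
    return (day.day - 1) // 7 + 1


def incidents_by_wednesday(incident_dates):
    """Count incidents by which Wednesday of the month they fell on, ignoring other weekdays."""
    counts = Counter()
    for day in incident_dates:
        ordinal = wednesday_of_month(day)
        if ordinal is not None:
            counts[ordinal] += 1
    return counts


# Invented incident dates from January 2024.
incidents = [
    date(2024, 1, 3),   # first Wednesday
    date(2024, 1, 10),  # second Wednesday
    date(2024, 1, 10),
    date(2024, 1, 11),  # a Thursday, so it is ignored
    date(2024, 1, 17),  # third Wednesday
]
print(incidents_by_wednesday(incidents))
```

Over a long period, a fair comparison would also divide each count by how many Wednesdays of that kind occurred, since every month has a first Wednesday but only some have a fifth.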
