
# Did Uber Decrease DUIs? Depends on how you ask

Posted on Jul 29, 2014

### Intro

A few weeks ago I came across a blog post in which Uber reported that it decreased DUIs in Seattle by about 10%. That’s pretty awesome if there’s good evidence for it. The blog post doesn’t give much detail on methods, but they used a regression discontinuity design (RDD) to compare DUI rates before and after Uber came to town. They also used a difference-in-differences (DinD) approach (with San Francisco as the comparison city) to back up their findings.

Intrigued, I found a post from Austin Clemens, who got hold of the Uber data and did a replication of his own. His post has more detail on Uber’s methods, and his replication focused on the question of bandwidth: what happens when the window we’re looking at (in terms of time) is large? What about when it’s small? Importantly, he shows that bandwidth size matters. In other words, the story could easily have been that Uber has no effect on DUIs if they had used a different bandwidth in their models.

I took a look at the data (thanks, Austin!) and did a replication, too. If I were Uber, I’d be really interested in looking at whether or not Uber is associated with a decrease in DUIs during the times that DUIs are more likely: the weekends. Uber’s model doesn’t allow us to ask that question, but we can easily do so with the replication data!

### Getting started

After loading the data, I ran the replication. My coefficient on Uber was -0.677, close to Uber’s reported coefficient of -0.707. The coefficient is the estimated change in DUIs when Uber came to town, so a negative value means a decrease.

You’ll notice factor variables for days of the week. These are dummy variables (“dayofweek.fMon”, etc.) that represent the impact of each day of the week on DUIs, compared to a common baseline. Uber noted on their blog that the baseline (comparison) category is Sunday, which is interesting because, if we look at a plot of incidents through the week, Sunday has the most. Thus all comparisons to Sunday may inflate Uber’s influence.
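To make the baseline idea concrete, here is a minimal sketch on made-up data. The `toy` data frame, its day means, and the column names `day_sun` and `day_mon` are assumptions for illustration, not the Seattle data or Uber’s model. R treats the first factor level as the reference category, and `relevel()` changes which day every other coefficient is compared to.

```r
# Synthetic data for illustration only; not the Uber/Seattle data.
set.seed(1)
days <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
toy <- data.frame(dayofweek = rep(days, times = 20), stringsAsFactors = FALSE)

# Hypothetical daily means, with Sunday set highest to mimic the pattern described.
day_means <- c(Mon = 3, Tue = 3, Wed = 3, Thu = 4, Fri = 6, Sat = 7, Sun = 8)
toy$incidents <- rpois(nrow(toy), lambda = day_means[as.character(toy$dayofweek)])

# Sunday as the reference level: each day coefficient reads "this day minus Sunday".
toy$day_sun <- relevel(factor(toy$dayofweek, levels = days), ref = "Sun")
fit_sun <- lm(incidents ~ day_sun, data = toy)

# Monday as the reference level: each day coefficient reads "this day minus Monday".
toy$day_mon <- relevel(factor(toy$dayofweek, levels = days), ref = "Mon")
fit_mon <- lm(incidents ~ day_mon, data = toy)

coef(fit_sun)
coef(fit_mon)
```

Both fits produce the same fitted values; what changes is the meaning of the intercept and of each day coefficient.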

Here’s the bar plot of DUIs over the week. Note the peak on Sunday:

uber$dayn = factor(uber$dayofweek, levels=c("Mon","Tue","Wed","Thu","Fri","Sat","Sun"), ordered=TRUE)
plot(uber$dayn, uber$incidents, main="Mean DUIs Over the Course of the Week")

Let’s see what happens when we change how we think about time. Uber’s model allows us to ask: how did DUIs on Monday compare to Sunday? How about Tuesday compared to Sunday? Wednesday compared to Sunday? And so on. Then, the coefficient on Uber tells us the conditional impact of Uber on DUIs in Seattle, after taking those daily comparisons into account (as well as a time factor and marijuana legalization).

Instead of comparing every day to Sunday and putting each of those comparisons in the model, let’s make time periods that matter: weekends (Fri, Sat, Sun), long weekends (weekends plus Thursday), and workweeks (Mon-Thu):

uber$weekend = ifelse(uber$dayofweek=="Fri"|uber$dayofweek=="Sat"|uber$dayofweek=="Sun",1,0)
uber$thursplweekend = ifelse(uber$dayofweek=="Thu"|uber$dayofweek=="Fri"|uber$dayofweek=="Sat"|uber$dayofweek=="Sun",1,0)
uber$sunthruwed = ifelse(uber$dayofweek=="Mon"|uber$dayofweek=="Tue"|uber$dayofweek=="Wed"|uber$dayofweek=="Thu",1,0)  # workweek (Mon-Thu), despite the name
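The same kind of indicator can be written more compactly with `%in%`, which also makes a mistyped day label easier to spot. A small sketch on a toy data frame (the `toy` data frame and the `workweek` column name are assumptions for illustration):

```r
# Toy data frame standing in for the replication data (illustration only).
toy <- data.frame(
  dayofweek = c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"),
  stringsAsFactors = FALSE
)

# One indicator per time period of interest.
toy$weekend        <- as.integer(toy$dayofweek %in% c("Fri", "Sat", "Sun"))
toy$thursplweekend <- as.integer(toy$dayofweek %in% c("Thu", "Fri", "Sat", "Sun"))
toy$workweek       <- as.integer(toy$dayofweek %in% c("Mon", "Tue", "Wed", "Thu"))

# Sanity check: every day should fall in exactly one of weekend / workweek.
table(weekend = toy$weekend, workweek = toy$workweek)
```

If a label such as "Thurs" slipped in where the data uses "Thu", the cross-tabulation would show a day that belongs to neither group.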

### The strategy

We made time periods that matter because they allow us to answer a question that arguably matters:

Is Uber associated with fewer DUIs during time periods when we’d expect more DUIs?

To do this we subset the data by the presence of Uber, run our model on each subset, and compare the coefficients. Instead of using factored days, we use the time periods of interest to us. Unlike RDD, this decision requires no assumptions about bandwidth or windows (but it does make other assumptions about omitted variables).
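As a rough sketch of that strategy, the snippet below fits the same simple model on a pre-Uber and a post-Uber subset and puts the two weekend coefficients side by side. Everything here is an assumption for illustration: the `toy` data are simulated with no Uber effect built in, the `uber` indicator column is hypothetical, and the one-predictor `lm()` stands in for whatever fuller specification the real analysis uses.

```r
# Synthetic stand-in for the replication data; all numbers are made up.
set.seed(42)
n <- 700
toy <- data.frame(
  uber    = rep(c(0, 1), each = n / 2),                 # hypothetical: 1 once Uber operates
  weekend = rep(c(0, 0, 0, 0, 1, 1, 1), times = n / 7)  # Mon-Thu = 0, Fri-Sun = 1
)
toy$incidents <- rpois(n, lambda = 3 + 2 * toy$weekend)

# Fit the same model separately on the pre-Uber and post-Uber subsets.
fit_pre  <- lm(incidents ~ weekend, data = subset(toy, uber == 0))
fit_post <- lm(incidents ~ weekend, data = subset(toy, uber == 1))

# Weekend coefficients side by side, plus their difference (post minus pre).
betas <- c(pre  = unname(coef(fit_pre)["weekend"]),
           post = unname(coef(fit_post)["weekend"]))
betas
betas["post"] - betas["pre"]
```

Swapping `weekend` for a long-weekend or workweek indicator gives the other comparisons.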

Here’s what we get when we plot the coefficient for each time period and subset:

### What does the plot above tell us?

DUIs are higher during Weekends and Long Weekends (compared to non-(long) weekends). This shouldn’t surprise us that much. But this is the case both when Uber is in town and when it is not! In fact, DUIs appear to increase during both Weekends and Long Weekends when Uber comes to town! The only decrease in DUIs after Uber’s entry appears to be during the workweek.

To see if these differences “mattered” statistically (vs. just appearing to matter), I re-sampled via bootstrapping, took the difference of the bootstrapped betas, and calculated the 95% confidence interval for those differences. This is a case where Uber is already in the doghouse for the positively signed coefficients during the weekends (per the bar graph), so they’d probably hope that the differences between the subsets do not matter.
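The bootstrap step can be sketched roughly as follows. This is not the exact procedure behind the numbers in this post: it assumes simple row resampling within each subset, a percentile interval, simulated `toy` data with no Uber effect built in, and a hypothetical helper named `boot_beta`.

```r
# Synthetic stand-in data; all numbers are made up for illustration.
set.seed(7)
n <- 700
toy <- data.frame(
  uber    = rep(c(0, 1), each = n / 2),                 # hypothetical Uber indicator
  weekend = rep(c(0, 0, 0, 0, 1, 1, 1), times = n / 7)
)
toy$incidents <- rpois(n, lambda = 3 + 2 * toy$weekend)

# Weekend coefficient from one bootstrap resample (rows drawn with replacement).
boot_beta <- function(d) {
  resampled <- d[sample(nrow(d), replace = TRUE), ]
  unname(coef(lm(incidents ~ weekend, data = resampled))["weekend"])
}

pre  <- subset(toy, uber == 0)
post <- subset(toy, uber == 1)

# Difference of bootstrapped betas (post minus pre), repeated B times.
B <- 1000
diffs <- replicate(B, boot_beta(post) - boot_beta(pre))

# Percentile 95% interval for the difference; check whether it excludes 0.
ci <- quantile(diffs, probs = c(0.025, 0.975))
ci
```

Daily counts are likely correlated over time, so a block bootstrap may be a better fit for the real data than the plain row resampling used here.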

### What’s the takeaway?

Turns out the differences do matter (notice how none of the estimates cross 0 on the plot above!). In other words, Uber has made a difference: that difference appears to be an increase during the weekends (bad!), but the decrease during the workweek (very good!) shouldn’t be ignored.

If you think that causality is as simple as someone telling you they’ve identified a statistically significant effect (which it is not), Uber has some explaining to do.

There are likely all kinds of endogeneity lurking in our model that should be controlled for. But this is an interesting starting point, or at least a discussion point.

My hunch is that the increase during the weekends and decrease during the week has something to do with Uber availability. During the week, demand is relatively lower and the wait for an Uber is quite short, so it’s easy to hop in an Uber and leave the car at the bar. During the weekend, demand is high, prices are higher, and the wait is longer. Maybe Uber increases DUIs because drunks see the expected wait time, get frustrated, and drive away, whereas the wait for a cab was always a surprise.

I am sure Uber has data on cancelled rides; I’d love to see it to test this hypothesis.

(See my github for code)