Living in Clifton a few years back, I noticed a family of rats had formed a small community in the courtyard of my apartment building, near a dumpster. I loved Clifton, but I wasn’t sure the neighborhood was right for the rats (they just didn’t seem interested in what was playing at The Esquire), so when I got inside I shot an email to my property manager. The next afternoon, the rats were gone. The city of Cincinnati never knew about these rats (much less the family farm that I’m sure took them in). I didn’t file a report with city services, and I’m sure my property manager didn’t, either. So, in a sense – and I don’t mean to get philosophical here, but it’s about to happen so watch out – it’s like they didn’t exist.
For the past couple of months I’ve been exploring whether the city of Cincinnati’s 311 data can be used to predict when and where the next infestation of rats, mice, or roaches will occur in the city’s 52 neighborhoods. (Cincinnati also has a severe bed bug problem, but I’m saving that issue for another day.) As you can imagine, it’s been tough slogging – partly, I think, because of stories like the one I described above (i.e. the case of the rats that never were), but mostly because it’s difficult to account for all of the factors that attract rats to one location but keep them far away from another. At the same time, I’ve made a number of interesting discoveries that could prove useful in preventing outbreaks in Cincinnati.
For instance, the most reliable predictor of a future infestation-related 311 request is a previous infestation-related 311 request received within the same neighborhood sometime in the previous seven days. Given what we all know about rats – that the only things that multiply faster are the breweries in OTR – this finding doesn’t come as much of a surprise, but it is nonetheless useful from a modeling perspective, and it also suggests that the city should not treat infestations in isolation. If rats, mice, or roaches are reported in a house in Oakley, then the surrounding block should at the very least receive information about prevention.
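To make that seven-day signal concrete, here is a minimal sketch of how such a feature could be computed. It is not the model itself: the function name, the (neighborhood, date) record layout, and the sample dates are all assumptions made up for illustration, and real 311 records would need cleaning first.

```python
from collections import defaultdict
from datetime import date

def prior_request_flags(requests, window_days=7):
    """For each (neighborhood, day) request, flag whether the same
    neighborhood had an earlier request within the previous `window_days` days.
    Same-day requests are not counted as "previous" (an assumption)."""
    history = defaultdict(list)  # neighborhood -> dates already seen
    flags = []
    for neighborhood, day in sorted(requests, key=lambda r: r[1]):
        recent = any(
            0 < (day - earlier).days <= window_days
            for earlier in history[neighborhood]
        )
        flags.append((neighborhood, day, recent))
        history[neighborhood].append(day)
    return flags

# Hypothetical sample records, not real 311 data.
sample_requests = [
    ("Oakley", date(2015, 6, 1)),
    ("Oakley", date(2015, 6, 5)),
    ("Clifton", date(2015, 6, 6)),
    ("Oakley", date(2015, 6, 20)),
]

for neighborhood, day, recent in prior_request_flags(sample_requests):
    print(neighborhood, day.isoformat(), recent)
```

Only the second Oakley request is flagged here, since it is the only one with an earlier request in the same neighborhood inside the window. A flag like this could then be fed to whatever model is being fit, alongside other neighborhood-level variables.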
Pairing the city’s 311 data with data from the American Community Survey has also allowed me to examine whether factors such as population density, per capita income, poverty rates, and access to resources such as quality schools and health care help explain why different neighborhoods in Cincinnati respond differently to the various triggers of infestation. They do, which makes my model both more reliable and more useful. For instance, I’ve got pretty strong evidence that the city should not simply respond to requests on a “first reported, first served” basis; instead, precedence should be given to whichever request carries the highest risk of causing more infestations, be it for reasons of geography, affluence, or prior infestations at that same address. This approach would not only direct the city’s resources toward the people who need them most, but would also allow the city to use those resources more efficiently, since treating one outbreak in a neighborhood would be less costly than treating several.
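The difference between “first reported, first served” and risk-based ordering can be sketched in a few lines. This is only an illustration under stated assumptions: the request IDs and risk scores below are invented, and the code says nothing about how a real risk score would be estimated.

```python
import heapq

def service_order(requests):
    """Return request ids ordered by descending risk score.
    Ties are broken by arrival order, so equal-risk requests stay first come, first served."""
    # heapq is a min-heap, so negate the risk to pop the highest risk first.
    heap = [
        (-risk, arrival, request_id)
        for arrival, (request_id, risk) in enumerate(requests)
    ]
    heapq.heapify(heap)
    order = []
    while heap:
        _, _, request_id = heapq.heappop(heap)
        order.append(request_id)
    return order

# Hypothetical pending requests in arrival order: (id, assumed risk score).
pending = [("A", 0.2), ("B", 0.9), ("C", 0.5), ("D", 0.9)]

print(service_order(pending))  # ['B', 'D', 'C', 'A']
```

Under a first-reported rule, request A would be handled first; under this ordering it is handled last, because the two highest-risk requests jump the queue.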
Finally, I should point out that I’m not the first to approach this problem using 311 data. Data analysts in Chicago’s Department of Innovation and Technology (in partnership with Carnegie Mellon’s Event and Pattern Detection Laboratory) took a similar approach to modeling Chicago’s rodent infestations, and in 2013 – the year the city put DoIT’s results into use – requests for rodent control services dropped by 15%. The work I’ve done with Cincinnati’s 311 data suggests that a similar drop wouldn’t be an unreasonable goal here.
Kevin Haynes has a Master’s in Applied Economics from the University of Cincinnati. He works as a Data Analyst for a national education non-profit called TNTP and can be reached at email@example.com.
Watch and listen to Jeff present open source software to a local audience including members of his local government. Be like Jeff.
On one story, [Raquel Rutledge, investigative reporter for the Milwaukee Journal Sentinel] had a stack of 1,800 PDFs and wanted to tally one field of data from each record to determine the public burden imposed by a tax subsidy. Counting manually was a waste of time, but no one in government said they had the total.
“I just wanted this one number, this one field, but I wanted it for all 1,800,” Rutledge said. “Well, this 23-year-old dude comes out, writes a little code, and boom-boom-boom-bam: In a couple of hours we have a number that no one had ever seen before.
“It was a $29 million program. A $29 million burden on the rest of the state. I requested it a million different ways from county people and no one had ever counted it. That is such a huge public service. Hiring that kid was just one of the best things you could do.”