Harnessing Crowd-Sourced Business Reviews To Inform Water Policy
Goal: Develop an interactive data analytics platform to integrate publicly available Yelp, Google, and other business directories with a Southern California water utility’s commercial billing data.
As water scarcity and resilience become a near-permanent challenge in parts of the US, water utilities increasingly need to understand their customers’ water use patterns and to draw on other data sources to complement what they already know. Little is known about the drivers of water use by restaurants, hotels and other non-residential clients, although they account for a large and growing share of urban water use. Publicly available business directories from Yelp and Google can give utilities more information about the businesses they sell water to: In what sectors do these businesses operate? How popular are they? What are their price range and customer exposure?
Answering these questions helps an urban water utility by revealing how business clients and their customers respond to water pricing changes or conservation efforts. It can also help the utility benchmark these businesses against each other to see what is achievable and to design smarter, better-informed demand management strategies. However, integrating water billing data with Google or Yelp directories requires a robust address matching algorithm that can reliably parse the business name and address strings from each data source. Matching Yelp or Google data to commercial water customers is only a first step. The integrated data must then be summarized through a flexible and engaging analytics platform that distills useful information from the business directories.
Tools that might be useful for this project include:
- Interactive visualization tools like Shiny, Dash, Bokeh, etc.
- String matching packages like stringdist and jellyfish in R and Python, respectively
- Object-oriented programming and good sense!
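
To make the address matching step more concrete, here is a minimal sketch of how a billing record might be matched to a directory listing. It is an illustration under stated assumptions, not the project's method: it uses Python's standard-library difflib in place of jellyfish or stringdist so that it runs without extra installs, and the record fields ("name", "address"), the abbreviation list, the equal weighting of name and address, and the 0.8 threshold are all hypothetical choices that would need tuning on real data.

```python
import re
from difflib import SequenceMatcher

# Hypothetical abbreviation map; real address data would need a fuller list.
ABBREVIATIONS = {
    "street": "st", "avenue": "ave", "boulevard": "blvd",
    "road": "rd", "drive": "dr", "suite": "ste",
}


def normalize(text):
    """Lowercase, replace punctuation with spaces, abbreviate street words."""
    cleaned = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    tokens = [ABBREVIATIONS.get(tok, tok) for tok in cleaned.split()]
    return " ".join(tokens)


def similarity(a, b):
    """Similarity in [0, 1] between two strings after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def best_match(billing_record, directory_records, threshold=0.8):
    """Return (directory_record, score) for the closest listing, or None.

    The score is an unweighted average of name and address similarity;
    both the weighting and the threshold are assumptions.
    """
    best, best_score = None, 0.0
    for candidate in directory_records:
        score = 0.5 * similarity(billing_record["name"], candidate["name"]) \
            + 0.5 * similarity(billing_record["address"], candidate["address"])
        if score > best_score:
            best, best_score = candidate, score
    if best is not None and best_score >= threshold:
        return best, best_score
    return None


# Made-up example records, for illustration only.
billing = {"name": "JOE'S PIZZA INC", "address": "123 Main Street, Suite 4"}
directory = [
    {"name": "Joe's Pizza", "address": "123 Main St Ste 4"},
    {"name": "Main Street Hotel", "address": "500 Main St"},
]

if __name__ == "__main__":
    print(best_match(billing, directory))
```

Comparing every billing record against every listing scales poorly, so a real pipeline would likely first narrow the candidates (for example by ZIP code) before scoring them.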