Notes from Code For America’s Day of Civic Hacking
I spent this past Saturday, Sep 21, at the GitHub office in San Francisco, attending the “Day of Civic Hacking” hosted by Code For America. We started the day by listening to lightning talks from the following speakers, describing different aspects of housing in SF:
- Donna Hilliard, Code Tenderloin
- Jay Cheng, SF Chamber of Commerce
- Fernando Marti, Council of Community Housing Organizations
- Karen Chapple, Professor and Chair of City and Regional Planning, UC Berkeley
- Marion Wellington, TechEquity Collaborative
After hearing these talks, we were invited to learn more about DataSF’s Open Data, a project that provides easy access to public, government data. We were encouraged to explore the datasets and brainstorm different visualizations or questions which could be answered by the data.
The speakers had a wide variety of perspectives on the problem, representing non-profits, businesses, and academia. Despite their differences, there were some major points that they all agreed on. Some of these points were surprising to me, and others were not.
Things I already believed:
- SF rents are rising at a rate disproportionate to the median salary, forcing existing residents out of the city.
- SF is not building enough housing to meet the demand of people wanting to live here.
- The housing SF does build is largely aimed at higher income residents.
- New apartments go up in only a small part of SF, geographically. This is largely due to zoning laws in large swaths of the city restricting new property to single-family usage.
- This is probably not a problem that tech can solve on its own; it needs top-down leadership to implement comprehensive solutions.
What I was surprised to learn:
- About 50,000 units of housing are approved each year, yet only 5,000 are actually built. The cost of construction is high, including material and labor. Regulations also add to the cost. Unless these costs come down, rents will actually have to go up for most of these projects to be profitable.
- Lots of what is getting built and purchased has relatively low occupancy (investment/second homes).
- The partitioning of SF into 11 districts has made it easier to reject housing. The supervisor of each district has veto power over any housing project proposed in their district, and they hold their residents above all else. This is an unfortunate incentive, as the people who would benefit from these housing projects do not live in these districts yet.
- There is a chicken-and-egg problem when it comes to transit and housing. We want to build transit that connects existing housing, and we want to build housing near existing transit. This makes it difficult for areas not already close to transit to “catch up”.
After listening to the speakers talk, I felt hopeful. Despite their warnings that the housing problem mostly calls for centralized, top-down solutions, they had many suggestions for how grassroots efforts, including technical projects, could also be of use.
- Data can help us understand the efficacy of different policies. Tech can be used to visualize this data to help inform politicians and voters.
- Tech can connect people to solve coordination problems. Karen Chapple brought up “Accessory Dwelling Units”, which can be both profitable for existing residents and a useful way to build more housing, but sometimes require residents to share their properties. (I’m not sure I totally understood this.)
- There are a lot of people in SF who do care about housing. Tech can help them make personal choices that align with their values.
My first idea was to build “Yelp for Landlords”. The idea would be to use open data to find landlords who had no-fault evicted tenants, and make it easy for conscientious renters to avoid rewarding bad behavior by choosing to rent elsewhere. However, it turns out that while there is eviction data available, it is anonymized to the block level to protect both the landlords and tenants.
Another participant I met there, Derrick Low, suggested looking through the 311 Cases. 311 is the primary customer service center for San Francisco. People call in to report on a variety of issues including abandoned bicycles, potholes, and broken streetlights. One category that can be reported is “Encampments”, which are defined as “tents, structures, or makeshift tarp shelters occupying the public right of way”. As I scanned over the data in this API, one thing that struck me was that users would often upload images of these encampments in their reports.
One of the speakers, Donna Hilliard, had asked us not to forget about the human side of housing by focusing on the data. I thought one way to help us understand the housing crisis would be to share in a broad way what these (often) homeless encampments look like. Using the API provided for 311 Cases, I requested all 311 reports, of which there are close to 4 million since the start of data collection on July 1, 2008. I filtered down to only those pertaining to encampments, and further filtered to only those which contained some “media”, leaving closer to 100,000 reports.
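The two filtering steps above could look roughly like the sketch below. It is not the script I actually ran: it assumes each report has already been downloaded as a dictionary, and the field names `service_name` and `media_url` are assumptions about the 311 schema that should be checked against the dataset documentation.

```python
def encampment_reports_with_media(reports):
    """Keep reports categorized as encampments that carry a non-empty media link.

    The keys "service_name" and "media_url" are assumed field names,
    not confirmed against the real 311 Cases schema.
    """
    selected = []
    for report in reports:
        # Normalize the category so "Encampments" and "encampments " both match.
        category = (report.get("service_name") or "").strip().lower()
        media = report.get("media_url")
        if category == "encampments" and media:
            selected.append(report)
    return selected


# Tiny made-up sample standing in for the downloaded reports.
sample = [
    {"service_name": "Encampments", "media_url": "https://example.org/a.jpg"},
    {"service_name": "Encampments"},  # no media attached
    {"service_name": "Streetlights", "media_url": "https://example.org/b.jpg"},
]
filtered = encampment_reports_with_media(sample)
print(len(filtered))
```

With millions of rows it would probably be better to push both filters into the API query itself rather than downloading everything first, if the API supports that.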
How does one share 100,000 pictures in a way that’s not overwhelming? I thought a visual search engine would be a good choice. The difficult part of building a search engine for images is finding keywords associated with each image. I’ve had experience using the Google Cloud Vision API for another side project before, so I threw together a quick script to send 1,000 labeling requests (the monthly free limit) to their endpoint.
Now that I had labels associated with each picture, it was pretty straightforward to build an inverted index. An inverted index is a map from each label to the set of pictures that were tagged with that label. A simple search engine can then take a query and check it against the index, returning exactly the pictures which match that query.
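As a minimal sketch of that idea (not the code behind the hosted site), the index can be a dictionary from lowercased label to a set of image identifiers. The function names and the sample image IDs and labels here are made up for illustration, and the query is treated as a single label.

```python
from collections import defaultdict


def build_index(labels_by_image):
    """Map each lowercased label to the set of image IDs tagged with it."""
    index = defaultdict(set)
    for image_id, labels in labels_by_image.items():
        for label in labels:
            index[label.strip().lower()].add(image_id)
    return dict(index)


def search(index, query):
    """Return the images whose labels include the query, treated as one label."""
    # Copy the set so callers cannot mutate the index by accident.
    return set(index.get(query.strip().lower(), set()))


# Hypothetical labels, in the shape a labeling service might return them.
labels_by_image = {
    "img1": ["Tent", "Sidewalk"],
    "img2": ["Tent"],
    "img3": ["Bicycle"],
}
index = build_index(labels_by_image)
print(search(index, "tent"))
```

A query for a label that no image carries, such as "tarp" here, simply returns an empty set.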
I’ve built this simple search engine and am hosting it at http://markmliu.com/encampments. One caveat is that this index is incredibly sparse since each image only has about ten labels. Contrast this with a web document, which usually contains thousands of words. As a result, most queries do not have a hit. One simple fix for this could be to find “synonyms” for each label, and include hits for those synonyms when they are sent as a query.
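The synonym fix is not implemented on the site, but a rough sketch of it might look like this. The synonym table is hand-written and hypothetical; in practice it would have to come from a thesaurus or similar source, and the index contents are made up.

```python
def search_with_synonyms(index, synonyms, query):
    """Look up the query and any listed synonyms, returning the union of hits."""
    term = query.strip().lower()
    # The query itself plus whatever labels the table says are equivalent.
    candidates = {term} | set(synonyms.get(term, ()))
    hits = set()
    for label in candidates:
        hits |= index.get(label, set())
    return hits


# Made-up index (label -> image IDs) and a hypothetical synonym table.
index = {"tent": {"img1", "img2"}, "tarpaulin": {"img3"}}
synonyms = {"tarp": ["tarpaulin"], "shelter": ["tent", "tarpaulin"]}

print(search_with_synonyms(index, synonyms, "tarp"))
```

Here "tarp" is not a label on any image, but it still finds the picture labeled "tarpaulin". The trade-off is that loose synonyms will also return pictures that only roughly match the query.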
It was a breath of fresh air to be surrounded by people who were passionate about civic duty. Some people were also there to network or learn new skills, but everyone who came seemed to really have a sense of wanting to help out in their community. It’s important to stay humble and know that we techies are not going to be able to come in and “disrupt the housing crisis”, but it’s a great feeling to try to apply my professional skills to a cause that I find meaningful.