Thursday, October 27, 2011

How to Use Google Prediction API to Estimate a Sale Price for Your Home

This blog post is based on Martin Omander's workshop at Silicon Valley Code Camp, which I attended on Saturday, October 8:
Store your data, predict the future

Speaker: Martin Omander
Level: Intermediate | Room: 3525 | 9:45 AM Saturday
  • Make your smart applications smarter with Google's Prediction API. Take advantage of Google's machine learning algorithms to make recommendations, analyze Twitter, detect spam, classify documents, identify languages and more.
  • Store your applications' data securely and efficiently in Google's data centers with the Storage API.
The real estate home price data file mentioned below was compiled by Martin from Redfin data for the Rex Manor neighborhood of Mountain View, California.

If you've ever wanted to buy a house or sell your home and would like a reasonable estimate of the eventual sale price in the current market, you can ask a real estate agent, or you can use the Google Prediction API (henceforth Prediction) to get a second estimate from available market data. The idea is simple: gather a number of sale price records for the ZIP code you want to buy or sell in, feed them into Prediction, and Prediction returns a number based on that data. Besides returning scalar values, Prediction also does classification. To get your feet wet, here is a beginner tutorial that automatically determines whether a piece of text you pass in is English, French or Spanish:

Assuming you were able to successfully use Google's Prediction API, in conjunction with Google Storage and Google's APIs Explorer Tool, to classify samples of text that you pass in, the next step is to predict a sale price for your home.
  1. Download Martin's houses.csv data here
  2. Using the same steps as in the tutorial above, upload the houses.csv data into Google Storage, and then use the data to train Prediction.
  3. After training is complete, we want to get a predicted price. To do this, pass in the values for the house you want to buy or sell. Let's say we're trying to sell a single family residence with 3 bedrooms and 3 bathrooms, 1800 square feet, built in 1960. When I passed in the following as the value for the csvInstance key, I got an error:
    3 3 1800 1960 house
    I then tried the same thing with commas, but that didn't work either. What finally worked in the APIs Explorer tool was this: count the field values you are entering, call that count n (in this case, n equals 5), click "Add" n times to create that many entries under csvInstance, and then type in the values one per entry, in order, like so:
    "input": {
    "csvInstance": [
Caveat for those who want to import additional data sets: if you want Prediction to return a number, i.e., a scalar value, the column holding the value you want predicted (in this case, home sale price) should be the first column of the data file you feed into Prediction. Martin said that when he put together the home sale price data, he had to manually copy and paste from Redfin into his spreadsheet program, move the home sale price column into the first position, and then export the spreadsheet as a CSV (comma-separated values) file.
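If you would rather not shuffle columns by hand in a spreadsheet, the reordering can be scripted. Below is a minimal sketch using Python's standard csv module. The two rows are invented purely for illustration (they are not from Martin's houses.csv), the column order and the lack of a header row are assumptions, and move_column_first is a hypothetical helper name.

```python
import csv
import io


def move_column_first(rows, column_index):
    """Return a copy of rows with the given column moved to position 0."""
    reordered = []
    for row in rows:
        target = row[column_index]
        rest = row[:column_index] + row[column_index + 1:]
        reordered.append([target] + rest)
    return reordered


# Made-up export with the sale price in the LAST column:
# beds, baths, square feet, year built, type, sale price
raw = "3,2,1500,1955,house,850000\n2,1,1000,1950,condo,510000\n"

rows = list(csv.reader(io.StringIO(raw)))
fixed = move_column_first(rows, column_index=5)

# Write the reordered rows back out as CSV text.
out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(fixed)
training_csv = out.getvalue()
print(training_csv)
```

For a real file you would read from and write to files opened with open() instead of io.StringIO, and pass the index of whichever column holds the sale price in your own export.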


  1. I am trying to post the training data directly from an active Google Docs spreadsheet to the Prediction API using Google Apps Script, and then retrieve predictions directly in the spreadsheet as well. Apps Script lets you create custom spreadsheet functions. :)

  2. Hey Thuon Chen,

    Thanks for the nice post on the Google Prediction API. I have tried this successfully, but I have a question: how can I integrate this API into my Java-based web application?

    THanks I