Saturday, July 11, 2009

Freebase Hack Day

Today was my third time in the offices of Metaweb, which operates Freebase. Eighty of us met on the 4th floor of a building off Hawthorne Street in downtown San Francisco to learn how to interact with Freebase, a database of structured data that currently holds 6 million topics (a topic is a thing in Freebase). Besides being a database, Freebase is also an API and a development platform. In other words, you can use Freebase as a source of information for your software, say, Microsoft Excel, a Google spreadsheet, or a MySQL database; you can also issue commands to Freebase and it will behave in a predictable way; and you can use Freebase to create new kinds of software. Any given topic in Freebase has one or more types assigned to it. For example, Queen Latifah, considered as a topic on Freebase, has at least 3 types assigned to her: person, musical artist, and film actor. A type in Freebase in turn has one or more properties. For example, the musical artist type has at least 3 properties: genre, instruments played, and music recorded.
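To make the topic, type, and property relationship concrete, here is a minimal Python sketch. It is only an illustration: the dictionary layout and the function name `properties_of` are my own assumptions, not how Freebase stores or exposes its data, and only the properties named above are filled in.

```python
# Hypothetical in-memory model of the topic / type / property idea.
# The names come from the examples in this post; the layout is an
# illustration, not Freebase's actual schema.
TYPES = {
    # Properties not mentioned in the post are left out on purpose.
    "person": [],
    "musical artist": ["genre", "instruments played", "music recorded"],
    "film actor": [],
}

TOPICS = {
    "Queen Latifah": ["person", "musical artist", "film actor"],
}


def properties_of(topic):
    """Collect the properties a topic gains from all of its types."""
    props = []
    for type_name in TOPICS[topic]:
        props.extend(TYPES[type_name])
    return props


print(TOPICS["Queen Latifah"])
print(properties_of("Queen Latifah"))
```

The point of the sketch is that a topic does not carry properties directly; it picks them up through the types assigned to it.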



Thuon Chen with Kirrily Robert, Freebase Community Director


When one lands on Freebase, there is so much to take in that it may seem overwhelming at first, and it took some time for me to wrap my head around it. Freebase finally made sense when I spoke with a Metaweb employee, Alex Botero-Lowry, about our mutual interest. At the beginning of Hack Day, Alex announced in front of the group that he was working on television data, specifically on liberating the extraction of that data. My curiosity piqued, I approached Alex and told him I believe we are living in the Golden Age of television. One of my favorite writers, Tim Goodman, expounds beautifully on the sheer number of high-quality, well-written, and expertly produced recent television programs in a very compelling piece containing lists of exceptional series sorted into categories:

Reference: http://www.sfgate.com/cgi-bin/article.cgi?f=/c/a/2007/12/30/DDDGU66SJ.DTL

So, if we wanted, we could ask Freebase to give us the list of episodes for a given TV program, sorted primarily by season number and secondarily by episode number:

  1. Go to http://www.freebase.com/app/queryeditor

  2. Now we want to query the vast resources of Freebase (imagine yourself face-to-face with a large machine with blinking lights). The most important things to know at this point are:
    1. the position of the blinking cursor within the query data structure, i.e., [{ }]
    2. the 'Tab' key
    If we wanted all the episodes for a given TV program, we would simply type in:
    [{
    "type": "/tv/tv_program",
    "name": "The Wire",
    "episodes" : [{}]
    }]
    If you click within the data structure for 'episodes', [{ }], i.e., in the area between the curly braces inside the square brackets, and press the 'Tab' key, you will get the set of properties available for 'episodes'. From there you can find out such information as the person who was credited as the writer or director.
    freebase query editor
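Since the query is plain JSON, it can also be assembled outside the query editor. The following is a minimal Python sketch that builds the same structure from lists and dictionaries and prints it as JSON text. The helper name `episodes_query` is hypothetical, and the sketch deliberately stops short of sending the query anywhere; it only produces text you could paste into the editor.

```python
import json


def episodes_query(program_name):
    """Build the episode query shown above as Python lists and dicts."""
    return [{
        "type": "/tv/tv_program",
        "name": program_name,
        # Empty object inside a list, exactly as in the query above.
        "episodes": [{}],
    }]


query = episodes_query("The Wire")

# json.dumps turns the Python structure into JSON text for the editor.
print(json.dumps(query, indent=2))
```

Building the query from a function makes it easy to swap in another program name without retyping the brackets.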

Just to give you a sense of how there's more than one way to get the same or a similar result in Freebase, the following are two paths to our destination: for a given TV program, get all the episodes in ascending order, grouped by season, also in ascending order. Nick and Jason at Metaweb helped me formulate the first query, and the second is courtesy of Alex Botero-Lowry:
[{
  "type": "/tv/tv_program",
  "name": "the sopranos",
  "episodes": [{
    "episode_number": null,
    "season_number": null,
    "sort": ["season_number", "episode_number"]
  }]
}]
or:
[{
  "name": "30 rock",
  "type": "/tv/tv_program",
  "seasons": [{
    "id": null,
    "name": null,
    "season_number": null,
    "sort": "season_number",
    "episodes": [{
      "name": null,
      "id": null,
      "episode_number": null,
      "sort": "episode_number"
    }]
  }]
}]
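The first query asks for one flat list sorted on two keys, while the second nests episodes inside seasons and sorts each level separately. A minimal Python sketch of how the nested shape can be flattened into the same season-then-episode order follows. The sample data is made up for illustration (it is not a real Freebase result), and `flatten_sorted` is a hypothetical helper.

```python
# Made-up sample shaped like the nested "seasons" result of the second
# query; the names and numbers are placeholders, not real Freebase data.
seasons = [
    {"season_number": 2, "episodes": [
        {"name": "Episode B", "episode_number": 2},
        {"name": "Episode A", "episode_number": 1},
    ]},
    {"season_number": 1, "episodes": [
        {"name": "Pilot", "episode_number": 1},
    ]},
]


def flatten_sorted(seasons):
    """Return (season, episode, name) rows ordered by season, then episode."""
    rows = [
        (season["season_number"], episode["episode_number"], episode["name"])
        for season in seasons
        for episode in season["episodes"]
    ]
    # Sort on season number first and episode number second, mirroring
    # "sort": ["season_number", "episode_number"] in the first query.
    return sorted(rows, key=lambda row: (row[0], row[1]))


for row in flatten_sorted(seasons):
    print(row)
```

Either path ends at the same ordering; the nested form simply keeps the season grouping visible in the result.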
