Tuesday, September 29, 2009

Open letter to location based services



After a hiatus from blogging, I want to announce a new side project I have been working on with some of my colleagues at buzzd.com. It is called HashCeratops.org, and I will explain it after a brief account of how the project came about.

Genesis:
At buzzd, we have been working feverishly to create a compelling platform built around finding the very coolest places & events around you. Our newest application, on the iPhone, is really great at doing just that, and I am very proud of what the buzzd team has put together. Building this product forced us to confront a problem that a lot of location based services (LBS) know about but haven’t gotten around to addressing quite yet.

The problem:
The problem is that organizing the plethora of venue “data” created by all these services is a messy job. The industry knows that a high degree of convergence is inevitable, but it will not be easy. All of our individual services can be made better if we can compare public buzzes, tweets, check-ins, or whatever else you want to call them, at the same place. Take it from me: we aggregate over 10 content providers and then merge, clean, and de-duplicate venues, and it’s a royal pain. The difficulty is in identifying what the ‘same place’ really is across different services and content providers. We are in desperate need of better standards, which essentially means synchronization tools.

Industry best practice is typically to match venues by phone number. There are other solutions, but this is the most common. Briefly, let me explain why this is ‘messy’: phone numbers change, venues get new names but keep their phone numbers, venues have multiple phone numbers, and sometimes multiple venues share the same phone number (see Webster Hall vs. The Studio). Similar problems arise with other methodologies. Even geotagged data is problematic: with all the different technologies involved (different geocoder databases, GPS, CellID, etc.), the latitude/longitude of a business or an end user isn’t as precise as you think, so you aren’t really comparing apples to apples.
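To make the phone number problem concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the provider names, the record layout, and the placeholder phone numbers are made up, and this is not how buzzd actually merges venues.

```python
from collections import defaultdict

# Hypothetical records from two imaginary providers.
# The phone numbers are placeholders, not real listings.
records = [
    {"source": "provider_a", "name": "Webster Hall", "phone": "(212) 555-0100"},
    {"source": "provider_b", "name": "The Studio", "phone": "212-555-0100"},
    {"source": "provider_b", "name": "Webster Hall", "phone": "212.555.0199"},
]


def normalize_phone(raw):
    """Keep only the digits so formatting differences do not matter."""
    return "".join(ch for ch in raw if ch.isdigit())


def group_by_phone(rows):
    """Group venue names that share a normalized phone number."""
    groups = defaultdict(list)
    for row in rows:
        groups[normalize_phone(row["phone"])].append(row["name"])
    return dict(groups)


matches = group_by_phone(records)
for phone, names in matches.items():
    print(phone, names)
```

With this sample data, the shared number lumps two different venues together, while the venue listed under a second number ends up split off from its own record. Both failure modes come from treating the phone number as the identity of the place.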

So what we need is a better way to match a place on two different services. And creating a random string of numbers (Webster Hall = 8018765) isn’t gonna cut it. The identifier needs to be simple, something that makes sense to both databases and end users. To put it another way: it needs to be tweetable. There are tons of challenges in creating a great set of identifiers that meet these standards and, as you would expect, the best solution cannot really be machine generated. It is, in the words of a colleague, a ‘heuristic problem’. The solution needs to be human.

The project:
What we have done with the HashCeratops project is build a prototype of this solution. We have manually looked up thousands of popular venues and picked our very best shot at a great unique identifier for each one. Our methodology ensures the identifiers are unique, clean, readable, and as short as possible. Let me tell you from personal experience: it is a painstaking ordeal, but a valuable undertaking nonetheless.
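Choosing an identifier is a human job, but a small script can still sanity-check a proposed one. The sketch below is only an illustration. The length cap, the "lowercase letters and digits, starting with a letter" rule, and the sample identifiers are all assumptions, not the actual HashCeratops rules or data.

```python
import re

# Assumed rules, for illustration only.
MAX_LENGTH = 20  # assumed cap to keep the identifier short enough to tweet
CLEAN_PATTERN = re.compile(r"^[a-z][a-z0-9]*$")  # assumed meaning of "clean"


def check_identifier(candidate, taken):
    """Return a list of problems with a human-proposed identifier."""
    problems = []
    if not CLEAN_PATTERN.match(candidate):
        problems.append("not clean")
    if len(candidate) > MAX_LENGTH:
        problems.append("too long")
    if candidate in taken:
        problems.append("already taken")
    return problems


# Hypothetical identifiers that are already in the database.
taken = {"websterhall"}

for candidate in ["thestudio", "websterhall", "8018765"]:
    print(candidate, check_identifier(candidate, taken))
```

A check like this can catch duplicates and messy strings, but it cannot decide whether an identifier is the one people would naturally use for a place. That judgment still has to come from a person.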

The result is a start: a database of thousands of places with unique identifiers, and a public, open community resource where anyone can contribute to the project and access the work that has already been done.

We have made this wiki publicly available: we gave it a silly name and logo, created a website, and built an API. Now we are asking the community to get involved. Use it, and contribute to it. Buzzd has given us the APIs to get the project off the ground, and is already using our database of identifiers to create Twitter hashtags that associate tweets with venues. We have already gotten a few partners on board and expect to make an announcement about it soon.

This is a resource for everybody, but it is only a start. We need help; we can’t do this alone. I am excited to see how the project evolves.
