Prior to coming to Microsoft, I was VP of Development for a company called Gazelle. One of the challenges we had there was the need to standardize the names of items in a restaurant environment so that they could be warehoused and compared across chains.
The problem we found, and the problem I see getting out of hand with blogs today, is that there is no common set of definitions that people use or contribute to. In a restaurant, there is no UPC code for, say, a margarita. The name is essentially whatever appears on the POS button. That name can change from location to location within a restaurant chain, and it also varies between chains. There are no hard rules for what name goes on a POS key, other than a limit on its length. As a result, you end up with “Marg”, “Rita”, “MRITA”, “Mrgrta”, “Mrgarita”, “Margarita”, and so on. We had to develop an elaborate set of filters, run the data through them, tweak them, and run it through again, and ultimately a human with domain expertise still had to map the unknown items to an existing, known definition.
This is the same thing we’re seeing with blogs, only there are far more blogs than restaurants, and blogs cover far more subject domains. Eventually, we will want to do for blogs what I did for restaurants at Gazelle: standardize the data and provide common tags with which to identify, find, and share information.
I think there is a distinct opportunity for someone to stand up and show some leadership here by building a common repository of tags that people can contribute to. It would also need to include a thesaurus of similar tags.
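To make the thesaurus idea a little more concrete, here is a minimal Python sketch of such a repository, under my own assumptions: each canonical tag owns a set of contributed synonyms, and any variant, regardless of case or spacing, resolves back to its canonical tag. `TagRepository`, its methods, and the normalization rule are hypothetical illustrations, not an existing service or API.

```python
from __future__ import annotations

from collections import defaultdict


class TagRepository:
    """Hypothetical shared tag repository with a thesaurus of similar tags.

    Each canonical tag can have any number of contributed synonyms; every
    lookup is case- and whitespace-insensitive.
    """

    def __init__(self) -> None:
        self._canonical: dict[str, str] = {}  # normalized form -> canonical tag
        self._synonyms: defaultdict[str, set[str]] = defaultdict(set)

    @staticmethod
    def _normalize(tag: str) -> str:
        # Collapse case and internal whitespace so "Web  2.0" matches "web 2.0".
        return " ".join(tag.lower().split())

    def add_tag(self, tag: str) -> None:
        """Register a new canonical tag."""
        self._canonical[self._normalize(tag)] = tag

    def add_synonym(self, canonical: str, synonym: str) -> None:
        """Contribute a synonym; it must point at a tag that already exists."""
        key = self._normalize(canonical)
        if key not in self._canonical:
            raise KeyError(f"unknown canonical tag: {canonical!r}")
        target = self._canonical[key]
        self._canonical[self._normalize(synonym)] = target
        self._synonyms[target].add(synonym)

    def resolve(self, tag: str) -> str | None:
        """Return the canonical tag for `tag`, or None if it is unknown."""
        return self._canonical.get(self._normalize(tag))

    def synonyms(self, canonical: str) -> set[str]:
        """Return a copy of the synonyms contributed for a canonical tag."""
        return set(self._synonyms.get(canonical, set()))


if __name__ == "__main__":
    repo = TagRepository()
    repo.add_tag("Windows Communication Foundation")
    repo.add_synonym("Windows Communication Foundation", "WCF")
    print(repo.resolve("  wcf "))
    print(repo.resolve("Web 2.0"))
```

Keeping contributed synonyms separate from the canonical tag is the point of the design: anyone can add a variant, but every variant still lands on one shared definition.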
In addition to the categorizations you might expect, such as “Technology” or “Windows Communication Foundation”, we could let people identify the contexts in which a definition applies, whether verticals (retail, hospitality, financial services, entertainment, etc.) or demographics (geography, language, age bracket, gender, marital status, etc.). This additional context can help us make relevancy determinations in the future.
If we ignore these categorization issues, I think we’re missing an easy opportunity to provide leadership in the Web 2.0 space, to help bloggers get visibility, and to give our search engines (and related advertising services) more information with which to return relevant results and ads to customers.
At Gazelle, we approached the problem by designing a system that broke the names apart into words, did pattern matching, auto-mapped what it could, and sent anything questionable to a person for review. The challenge was that this review step required someone with domain-specific knowledge, which is not easily outsourced. We found, for example, that the team in India had a hard time mapping some of the items simply because they had little exposure to some of the brands. With tags, the problem is vastly larger, because tags can cover any subject.
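The auto-map-or-escalate flow can be sketched roughly as follows. This is not the Gazelle system, whose internals aren't described here; it's a minimal Python illustration that swaps in the standard library's `difflib.SequenceMatcher` for the pattern matching, plus a simple in-order abbreviation check for names like “Mrgrta”. The thresholds, helper names, and the `MappingResult` type are all hypothetical.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field
from difflib import SequenceMatcher

AUTO_MAP_THRESHOLD = 0.85  # hypothetical cut-off for mapping without a human
ABBREVIATION_SCORE = 0.9   # hypothetical score for an in-order abbreviation


def normalize(name: str) -> str:
    """Lower-case and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def is_subsequence(short: str, full: str) -> bool:
    """True if every character of `short` appears in `full`, in order."""
    it = iter(full)
    return all(ch in it for ch in short)  # `in` consumes the iterator


def similarity(raw: str, known: str) -> float:
    a, b = normalize(raw), normalize(known)
    if not a or not b:
        return 0.0
    score = SequenceMatcher(None, a, b).ratio()
    # POS keys are often vowel-dropping abbreviations ("Mrgrta"), so an
    # in-order match that starts with the same letter counts as strong evidence.
    if a[0] == b[0] and is_subsequence(a, b):
        score = max(score, ABBREVIATION_SCORE)
    return score


@dataclass
class MappingResult:
    mapped: dict[str, str] = field(default_factory=dict)
    # raw name -> (best guess, score), waiting for a person with domain knowledge
    review_queue: dict[str, tuple[str, float]] = field(default_factory=dict)


def map_items(raw_names: list[str], known: list[str]) -> MappingResult:
    """Auto-map confident matches; queue everything else for human review."""
    result = MappingResult()
    for raw in raw_names:
        best = max(known, key=lambda k: similarity(raw, k))
        score = similarity(raw, best)
        if score >= AUTO_MAP_THRESHOLD:
            result.mapped[raw] = best
        else:
            result.review_queue[raw] = (best, round(score, 2))
    return result


if __name__ == "__main__":
    known = ["Margarita", "Martini", "Mojito"]
    result = map_items(["Marg", "MRITA", "Mrgrta", "Rita"], known)
    print(result.mapped)
    print(result.review_queue)
```

With this toy list, “Marg”, “MRITA”, and “Mrgrta” are auto-mapped to “Margarita”, while “Rita” only scores about 0.62 and lands in the review queue with “Margarita” as the suggested guess. That escalation step is exactly where the domain knowledge described above becomes the bottleneck.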