How DataSift Delivers Twitter Firehose Stream in a Teacup
Wednesday, August 25, 2010 at 10:13AM
Robert Scoble (@Scobleizer) posted a YouTube video this morning of his interview with Nick Halstead (@NickHalstead) of DataSift. For those of you who prefer the written word, here is my 3-minute recap of the 40-minute video.
Many of us routinely search the Twitter stream using keywords and hashtags, but that approach has its limitations. Some tweets omit hashtags, some misspell keywords, and searches often turn up irrelevant results and tweets from people we don't want to hear from.
Enter DataSift's platform, which takes the entire Twitter stream and applies smart rules to surface the most relevant results. It makes excellent use of the additional information attached to each tweet beyond its text: about 20 tags, including location, bio, number of followers, and source (the web or another client).
A DataSift user can build custom rules that will deliver search results filtered not just by content, but also by context. What does that mean? Here are specific applications discussed in the video:
- A filter that delivers a stream of tweets sent from a specific location, such as all tweets from people inside a particular Best Buy store or within one block of it.
- A filter that delivers tweets from all the players, owners, coaches and managers of a particular sports franchise.
- A filter that delivers top stories from news sources that have been retweeted at least 100 times.
Another key attribute that makes this so powerful is the ability to sift content based on the relative influence of the Twitter user. To further boost the relevance of results, you can write rules that return only tweets from users with at least 50 followers or a minimum Klout score of 75 (meaning my tweets wouldn't show up; I'm not Klouty yet).
Subtracting results is also possible; for example, DataSift has a built-in option to omit results that contain profanity. It can also track the sentiment of content.
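To make the idea of combining content rules, subtraction, and influence thresholds concrete, here is a minimal Python sketch. It is not DataSift's rule language or API; the `Tweet` fields and the `matches` function are hypothetical stand-ins, assuming each tweet carries its text plus a follower count and a Klout score.

```python
import re
from dataclasses import dataclass

@dataclass
class Tweet:
    # Hypothetical stand-ins for a tweet's text and two of its context tags
    text: str
    followers: int
    klout: int

def words_in(text: str) -> set[str]:
    # Lowercase the text and split it into simple word tokens, ignoring punctuation
    return set(re.findall(r"[a-z']+", text.lower()))

def matches(tweet: Tweet, include: set[str], exclude: set[str],
            min_followers: int = 0, min_klout: int = 0) -> bool:
    words = words_in(tweet.text)
    if not words & include:   # content: need at least one wanted keyword
        return False
    if words & exclude:       # subtraction: drop tweets containing unwanted words
        return False
    # context: keep only users at or above the influence thresholds
    return tweet.followers >= min_followers and tweet.klout >= min_klout

stream = [
    Tweet("My laptop crashes every hour", followers=120, klout=80),
    Tweet("Car crashes on the highway again", followers=500, klout=90),
    Tweet("Got a virus on my PC", followers=10, klout=80),
]
hits = [t.text for t in stream
        if matches(t, {"crashes", "virus"}, {"car"}, min_followers=50, min_klout=75)]
print(hits)  # ['My laptop crashes every hour']
```

The second tweet is dropped by the subtracted word "car", and the third by the follower threshold, even though both contain a wanted keyword.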
Rules are public, much like Twitter Lists are, so they can be reused, appended or merged with other rules and searched on by keywords and tags.
DataSift plans to launch the alpha version by the end of August 2010, with full-scale live reporting planned for November 2010. The free version will carry in-stream advertising, but most of the revenue will come from selling filtered streams to brands.
How is any of this relevant to the small businesses I consult with? Well, if I ran a computer repair shop, I would definitely be listening to people tweeting within my geographical range about viruses, crashes (subtracting the word "car"), or hating their slow computers, and I would be there to offer assistance.
If I had a restaurant, I'd be listening for tweets about cravings for my particular cuisine, negative experiences at other local establishments, or people looking for dining options, and I'd be prepared to offer them some sort of special to come try my restaurant.
And just maybe this could help solve a dilemma faced by canine rescue groups across the country (like my favorite, Second Chance Boxer Rescue): coordinating rescue transports.
When a dog is surrendered or sprung from a shelter, it often needs to be transported to the foster home that will care for it. An appeal usually goes out to the volunteer base, and a long transport is typically broken into 50-mile segments that, hopefully, a group of volunteers will fill.
I've often wondered how to optimize this by harnessing the power of Twitter. With all the vehicle traffic out there, how could we find the people already planning to travel from Massachusetts to Maine, for example, who wouldn't mind a boxer along for the ride? I think Twitter is the right set of ears, and DataSift might be the right filter!
Do you have any ideas on how to make this work? I'd love to hear them!