(This was built entirely during Google DevFest Buffalo 2015, a 24-hour hackathon, from start to deployment)
(The language, originally intended for a project description, may seem a little odd)
We tried to understand the sentiments and topics in reviews of Yelp businesses. We proposed that this data, a feedback rating for various facets of a business, would greatly help businesses understand customer response.
We provide simple ratings, 0 to 5 stars, for restaurants in four categories: 'Food', 'Service', 'Value for money' and 'Ambience'. The dataset is made publicly available by Yelp as part of their Dataset Challenge (http://www.yelp.com/dataset_challenge).
How did we do it? - Natural Language Processing
We extracted each sentence from each review and categorized it into one of the four categories using semantic similarity based on WordNet synsets. We then computed the sentiment polarity of the sentence. This gave a (Category, Sentiment Polarity) pair for each sentence in the review, which we aggregated (over each sentence, then each review) to give overall category ratings.
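The aggregation step can be sketched roughly as follows. This is a minimal illustration, not the code from the hackathon: it assumes the categorization and polarity steps have already produced (category, polarity) pairs, that polarity lies in [-1, 1], and that a plain average mapped linearly onto 0 to 5 stars is good enough. The function names `polarity_to_stars` and `aggregate_ratings` are hypothetical.

```python
from collections import defaultdict


def polarity_to_stars(polarity):
    """Map a polarity in [-1, 1] linearly onto a 0-5 star scale (assumed mapping)."""
    clamped = max(-1.0, min(1.0, polarity))
    return (clamped + 1.0) * 2.5


def aggregate_ratings(pairs):
    """Average polarity per category, then convert each average to stars.

    `pairs` is an iterable of (category, polarity) tuples, one per sentence.
    Categories that no sentence mentions are left out rather than defaulted.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for category, polarity in pairs:
        totals[category] += polarity
        counts[category] += 1
    return {c: round(polarity_to_stars(totals[c] / counts[c]), 2) for c in counts}


# Made-up pairs for illustration only; not taken from the Yelp dataset.
pairs = [
    ("Food", 1.0),
    ("Food", 0.6),
    ("Service", -0.5),
    ("Ambience", 0.0),
]
ratings = aggregate_ratings(pairs)
print(ratings)
```

A simple mean treats every sentence equally; weighting by sentence length or by the similarity score of the category match would be an equally reasonable choice.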
We also tried to be more precise by extracting phrases within a sentence. For instance, the sentence "The pizza was really awesome, but had to wait a lot." talks about two categories ('Food' and 'Service') with opposing sentiments. We used the Stanford Parser for this extraction, but dropped the idea because of the computation time. Various optimizations were done to make the system efficient.
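For a sense of what phrase-level splitting buys, here is a much cruder stand-in for parser-based extraction: splitting a sentence at contrastive conjunctions with a regular expression. This is not what the Stanford Parser does and not what we ran; the cue-word list and the `split_clauses` helper are assumptions made for illustration, and the approach will miss many constructions a real parser handles.

```python
import re

# Contrastive cue words; a hypothetical, deliberately short list.
_SPLIT_PATTERN = re.compile(
    r",?\s+\b(?:but|however|although|though|yet)\b\s+", re.IGNORECASE
)


def split_clauses(sentence):
    """Split a sentence into rough clauses at contrastive conjunctions."""
    stripped = sentence.strip().rstrip(".!?")
    parts = _SPLIT_PATTERN.split(stripped)
    # Drop empty fragments and stray commas or spaces left by the split.
    return [p.strip(" ,") for p in parts if p.strip(" ,")]


clauses = split_clauses("The pizza was really awesome, but had to wait a lot.")
print(clauses)
```

Each resulting clause could then be categorized and scored on its own, so the 'Food' and 'Service' parts of the example sentence no longer cancel each other out.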
We had what seemed to be an inexhaustible repository of ideas for increasing accuracy and improving results: using the review rating provided by Yelp, reviewer profile information, different similarity measures, topic modeling (LDA), supervised learning... Due to the time constraints of a hackathon we could not explore everything, but this is an ongoing project with many possible applications, such as review summarization.
On this advent of my first technical post, I would like to thank my god, family and friends and their neighbours. And only for the sake of it, I should also mention, though not in a completely feeble, fleeting and frivolous manner, my team for this project: Himanshu, Ankit and Harishankar Vishwanathan.
Results: http://avinav.science:4000/
Git: https://github.com/avinav/Yelp_Review_Categorization
Some related papers and articles:
WordNet::Similarity - Measuring the Relatedness of Concepts
Wordnet based semantic similarity measurement
Sentence Similarity Based on Semantic Nets and Corpus Statistics