Collecting product reviews from the most popular online sources using a variety of techniques.
import scrapy


# The class name is illustrative; Scrapy identifies the spider by `name`.
class ReviewsSpider(scrapy.Spider):
    name = 'reviews'
    allowed_domains = ['amazon.com']
    start_urls = ["https://www.amazon.com/GoPro-Fusion-Waterproof-Digital-Spherical/product-reviews/B0792MJLNM/ref=cm_cr_dp_d_show_all_btm?ie=UTF8&reviewerType=all_reviews"]

    def parse(self, response):
        # Each review sits in its own .a-section.review container.
        for item in response.css('.a-section.review'):
            # Keep only containers explicitly marked as reviews.
            if item.css('div::attr(data-hook)').extract_first() == 'review':
                yield {
                    'review_id': item.css('div.a-section.celwidget::attr(id)').extract_first().split('-')[1],
                    'author': item.css('span.a-profile-name::text').extract_first(),
                    'review': ' '.join(item.css('span.review-text::text').extract())
                }
        # Follow the "next page" link until there are no more pages.
        next_page = response.css('.a-last > a::attr(href)').extract_first()
        if next_page is not None:
            yield response.follow(next_page, callback=self.parse)
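One fragile spot in the spider is the review_id expression: extract_first() returns None when the selector matches nothing, so the chained .split('-') would raise an AttributeError. A minimal sketch of a more defensive version follows. The helper name parse_review_id and the sample id 'customer_review-R1EXAMPLE' are hypothetical, chosen only to match the 'prefix-ID' shape that split('-')[1] implies.

```python
from typing import Optional


def parse_review_id(raw_id: Optional[str]) -> Optional[str]:
    """Return the second '-'-separated segment of an element id, or None.

    Hypothetical helper: assumes ids look like 'prefix-ID', which is the
    shape the spider's split('-')[1] call relies on.
    """
    if not raw_id:
        return None  # the selector matched nothing
    parts = raw_id.split('-')
    # Fewer than two segments (or an empty second one) is not the expected shape.
    if len(parts) < 2 or not parts[1]:
        return None
    return parts[1]


# Sample ids are made up for illustration.
print(parse_review_id('customer_review-R1EXAMPLE'))  # R1EXAMPLE
print(parse_review_id(None))                         # None
print(parse_review_id('no_separator'))               # None
```

Inside parse, the field could then read 'review_id': parse_review_id(item.css('div.a-section.celwidget::attr(id)').extract_first()), so a page with unexpected markup yields None instead of stopping the crawl.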
In this very basic example, the spider visits the reviews page for a GoPro Fusion sold on Amazon and collects each review’s ID, the reviewer’s name, and the review text, following the next-page link until no further pages remain.
This data can then be analyzed for sentiment and other signals that help your company improve its products and customer experience.
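To make the sentiment idea concrete, here is a deliberately naive sketch that scores items shaped like the ones the spider yields by counting words from two tiny hand-made lists. The word lists, the score_review function, and the sample reviews are all assumptions invented for illustration; a real project would more likely rely on a trained sentiment model or an established library.

```python
import re

# Tiny illustrative lexicons; hypothetical, not taken from any real resource.
POSITIVE = {"great", "love", "excellent", "amazing", "good"}
NEGATIVE = {"bad", "poor", "broken", "terrible", "disappointing"}


def score_review(text: str) -> int:
    """Return (number of positive words) - (number of negative words)."""
    words = re.findall(r"[a-z']+", text.lower())
    positives = sum(word in POSITIVE for word in words)
    negatives = sum(word in NEGATIVE for word in words)
    return positives - negatives


# Made-up items in the same shape the spider yields.
items = [
    {"review_id": "R1", "author": "A", "review": "Great camera, I love it"},
    {"review_id": "R2", "author": "B", "review": "Terrible battery and poor app"},
]

for item in items:
    print(item["review_id"], score_review(item["review"]))
# R1 2
# R2 -2
```

A score above zero suggests a mostly positive review and a score below zero a mostly negative one. Word counting of this kind ignores negation and context, so it is only a starting point.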