Here’s the dataset schema:
| Column Name | Type | Description |
|---|---|---|
| Review_ID | ID | Unique identifier for each review |
| Rating | Numerical | Star rating given by the customer (1–5) |
| Review_Length | Numerical | Number of words in the review |
| Helpful_Votes | Numerical | Count of “helpful” votes received |
| Days_Since_Purchase | Numerical | Days passed since purchase when review was written |
| Product_Category | Categorical | Category of product reviewed (Electronics, Fashion, Home, Books, Beauty, Sports, Grocery) |
| Review_Sentiment | Categorical | Sentiment of review (Positive, Neutral, Negative) |
| Verified_Purchase | Categorical | Whether the review is from a verified purchase (Yes/No) |
| Customer_Location | Categorical | Customer’s country/region (USA, UK, India, Canada, Australia, Germany, UAE) |
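
The schema above can be turned into a quick sanity check before any analysis. The following is a minimal sketch, assuming the data has been loaded into a pandas DataFrame whose column names match the table exactly; the helper name `validate_reviews` and the constant names are hypothetical and not part of the dataset.

```python
import pandas as pd

# Allowed values per categorical column, copied from the schema table.
EXPECTED_CATEGORIES = {
    "Product_Category": {"Electronics", "Fashion", "Home", "Books",
                         "Beauty", "Sports", "Grocery"},
    "Review_Sentiment": {"Positive", "Neutral", "Negative"},
    "Verified_Purchase": {"Yes", "No"},
    "Customer_Location": {"USA", "UK", "India", "Canada",
                          "Australia", "Germany", "UAE"},
}
NUMERICAL_COLUMNS = ["Rating", "Review_Length", "Helpful_Votes",
                     "Days_Since_Purchase"]


def validate_reviews(df):
    """Return a list of schema problems found in df (empty if none)."""
    expected = ["Review_ID", *NUMERICAL_COLUMNS, *EXPECTED_CATEGORIES]
    missing = [col for col in expected if col not in df.columns]
    if missing:
        # Without all columns the remaining checks cannot run.
        return [f"missing columns: {missing}"]

    problems = []
    if df["Review_ID"].duplicated().any():
        problems.append("Review_ID contains duplicates")
    for col in NUMERICAL_COLUMNS:
        if not pd.api.types.is_numeric_dtype(df[col]):
            problems.append(f"{col} is not numeric")
    # Star ratings must fall in the 1-5 range given in the schema.
    if (pd.api.types.is_numeric_dtype(df["Rating"])
            and not df["Rating"].between(1, 5).all()):
        problems.append("Rating has values outside 1-5")
    for col, allowed in EXPECTED_CATEGORIES.items():
        unexpected = set(df[col].dropna().unique()) - allowed
        if unexpected:
            problems.append(
                f"{col} has unexpected values: {sorted(unexpected)}")
    return problems
```

An empty list means the frame matches the schema as described; any returned message points at a column worth inspecting before drawing conclusions.
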
✨ Purpose of Dataset
This dataset can be used to analyze customer behavior, review credibility, and product performance. It supports sentiment analysis, fake-review detection, and product quality tracking.
Get the dataset here: https://github.com/slidescope/E-commerce-Customer-Reviews-dataset-for-Sentiment-Analysis-and-Machine-Learning/
🔍 Sample Questions to Solve
- What is the average rating across product categories?
- Do verified purchases tend to have more positive reviews than unverified ones?
- Which product category receives the longest reviews on average?
- Does review length correlate with helpful votes?
- Which country’s customers give the highest ratings on average?
- Are negative reviews more common after a longer gap (days since purchase)?
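
Most of the questions above reduce to a group-by, a cross-tabulation, or a correlation. The sketch below shows one possible way to approach them with pandas, assuming a DataFrame `df` with the column names from the schema. All function names are hypothetical, and Pearson correlation is only one reasonable choice for the length-versus-votes question.

```python
import pandas as pd


def average_rating_by(df, column):
    """Mean rating per group, highest first.

    Works for "Product_Category" and for "Customer_Location".
    """
    return df.groupby(column)["Rating"].mean().sort_values(ascending=False)


def sentiment_share_by_verification(df):
    """Share of each sentiment within verified and unverified reviews."""
    return pd.crosstab(df["Verified_Purchase"], df["Review_Sentiment"],
                       normalize="index")


def average_length_by_category(df):
    """Mean review length (in words) per product category, longest first."""
    return (df.groupby("Product_Category")["Review_Length"]
              .mean()
              .sort_values(ascending=False))


def length_votes_correlation(df):
    """Pearson correlation between review length and helpful votes."""
    return df["Review_Length"].corr(df["Helpful_Votes"])


def median_gap_by_sentiment(df):
    """Median days since purchase for each sentiment label."""
    return df.groupby("Review_Sentiment")["Days_Since_Purchase"].median()


# Example usage, assuming the CSV from the repository was saved locally
# under the hypothetical name "reviews.csv":
# df = pd.read_csv("reviews.csv")
# print(average_rating_by(df, "Product_Category"))
# print(sentiment_share_by_verification(df))
```

Comparing the median gap across sentiment labels gives a first look at whether negative reviews tend to arrive later. With only 250 rows, small differences between groups should be read cautiously.
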
The E-commerce Customer Reviews dataset captures customer feedback behavior across multiple product categories. It contains 250 entries, each uniquely identified by a Review ID. Four numerical features (Rating on a 1–5 scale, Review Length in words, Helpful Votes, and Days Since Purchase) capture the quantifiable aspects of each review. Four categorical features (Product Category, Review Sentiment, Verified Purchase status, and Customer Location) add qualitative context. This mix of numerical and categorical data makes the dataset well suited to sentiment analysis, customer satisfaction studies, fake-review detection, and trend analysis by customer location or product type, all of which are practical business intelligence tasks.
