The Mushroom Dataset from the UCI Machine Learning Repository is a popular dataset used in machine learning research and practice. It contains data on different types of mushrooms, with the primary task being to classify mushrooms as either edible or poisonous based on their physical characteristics.
Key Details of the Dataset:
- Instances: 8124 samples.
- Features: 22 categorical features (e.g., cap shape, cap surface, cap color, gill size, odor).
- Target Variable: class, with two values: p (poisonous) and e (edible). The classes are roughly balanced: 4208 edible and 3916 poisonous.
- Missing Values: The feature stalk-root has missing values in 2480 instances, encoded as "?".
- Source: Drawn from The Audubon Society Field Guide to North American Mushrooms (1981).
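The following is a minimal sketch of how to load the raw file using only the Python standard library. It assumes the UCI file layout: no header row, the class label in the first column, and the 22 attributes after it in the order of the repository's attribute list. The helper names `load_mushrooms` and `summarize` are illustrative choices. The three rows in `SAMPLE` are made-up examples written in the file's format and are not an excerpt of the real data.

```python
import csv
import io
from collections import Counter

# Assumed column order: class label first, then the 22 attributes.
COLUMNS = [
    "class", "cap-shape", "cap-surface", "cap-color", "bruises", "odor",
    "gill-attachment", "gill-spacing", "gill-size", "gill-color",
    "stalk-shape", "stalk-root", "stalk-surface-above-ring",
    "stalk-surface-below-ring", "stalk-color-above-ring",
    "stalk-color-below-ring", "veil-type", "veil-color", "ring-number",
    "ring-type", "spore-print-color", "population", "habitat",
]


def load_mushrooms(text_stream):
    """Parse comma-separated rows into dicts, mapping the '?' marker to None."""
    records = []
    for row in csv.reader(text_stream):
        if not row:
            continue  # tolerate blank lines
        if len(row) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} fields, got {len(row)}")
        records.append({name: (None if value == "?" else value)
                        for name, value in zip(COLUMNS, row)})
    return records


def summarize(records):
    """Return the class distribution and per-column missing-value counts."""
    class_counts = Counter(r["class"] for r in records)
    missing = {name: sum(r[name] is None for r in records) for name in COLUMNS}
    # Keep only the columns that actually have missing values.
    return class_counts, {name: n for name, n in missing.items() if n}


# Hypothetical rows in the file's format; the third has '?' for stalk-root.
SAMPLE = """\
p,x,s,n,t,p,f,c,n,k,e,e,s,s,w,w,p,w,o,p,k,s,u
e,x,s,y,t,a,f,c,b,k,e,c,s,s,w,w,p,w,o,p,n,n,g
e,b,y,w,t,l,f,c,b,n,t,?,s,s,w,w,p,w,o,p,n,s,m
"""

records = load_mushrooms(io.StringIO(SAMPLE))
class_counts, missing = summarize(records)
print(class_counts)  # Counter({'e': 2, 'p': 1})
print(missing)       # {'stalk-root': 1}
```

To run it on the downloaded data file (named agaricus-lepiota.data in the UCI repository), open the file with `newline=""` and pass the handle to `load_mushrooms`. The same summary should then report 8124 records and 2480 missing stalk-root values.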
Typical Uses:
- Classification Tasks: It’s a common benchmark for binary classification. Because every feature is categorical, it is also a natural testbed for methods that handle categorical inputs.
- Data Preprocessing Practice: It provides an opportunity to practice techniques like one-hot encoding and handling missing data.
- Modeling: It is well suited to experimenting with tree-based algorithms such as Decision Trees, Random Forests, or Gradient Boosted Trees.
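The preprocessing steps listed above can be sketched in plain Python. The sketch assumes each record is a dict that maps column names to values, with missing stalk-root entries already converted from "?" to None. `impute_mode` and `one_hot` are hypothetical helpers, not a library API. Mode imputation is only one option. Keeping "?" as its own category is a common alternative, because the fact that a value is missing may itself be informative.

```python
from collections import Counter


def impute_mode(records, column):
    """Return new records with None in `column` replaced by its most frequent value."""
    observed = [r[column] for r in records if r[column] is not None]
    if not observed:
        raise ValueError(f"column {column!r} has no observed values")
    mode = Counter(observed).most_common(1)[0][0]
    # Build new dicts so the input records are left unchanged.
    return [{**r, column: mode if r[column] is None else r[column]} for r in records]


def one_hot(records, columns):
    """Return (feature_names, rows): one 0/1 indicator per (column, value) pair.

    Assumes no None values remain in `columns` (impute first).
    """
    categories = {c: sorted({r[c] for r in records}) for c in columns}
    names = [f"{c}={v}" for c in columns for v in categories[c]]
    rows = [[1 if r[c] == v else 0 for c in columns for v in categories[c]]
            for r in records]
    return names, rows


# Toy records (hypothetical values) with one missing stalk-root entry.
records = [
    {"odor": "n", "stalk-root": "b"},
    {"odor": "a", "stalk-root": None},
    {"odor": "n", "stalk-root": "b"},
    {"odor": "p", "stalk-root": "e"},
]
filled = impute_mode(records, "stalk-root")  # None becomes "b", the mode
names, rows = one_hot(filled, ["odor", "stalk-root"])
print(names)  # ['odor=a', 'odor=n', 'odor=p', 'stalk-root=b', 'stalk-root=e']
print(rows)
```

In a real pipeline, compute the mode and the category lists on the training split only, then reuse them for the test split so that no information leaks from the evaluation data. scikit-learn's `SimpleImputer` (with `strategy="most_frequent"`) and `OneHotEncoder` cover the same steps.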
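Decision trees choose each split with an impurity criterion. ID3-style trees use information gain, which is the entropy of the labels minus the size-weighted entropy of the labels within each value of the candidate feature. Below is a small sketch of that computation for categorical features, which also works as a quick feature ranking. The record layout (dicts with a "class" key) and the four toy rows are assumptions for illustration only.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(records, feature, target="class"):
    """Entropy of the target minus the weighted entropy after splitting on `feature`."""
    groups = defaultdict(list)
    for r in records:
        groups[r[feature]].append(r[target])
    remainder = sum(len(g) / len(records) * entropy(g) for g in groups.values())
    return entropy([r[target] for r in records]) - remainder


# Hypothetical toy records: odor separates the classes perfectly, cap-color does not.
records = [
    {"class": "p", "odor": "f", "cap-color": "n"},
    {"class": "p", "odor": "f", "cap-color": "w"},
    {"class": "e", "odor": "n", "cap-color": "n"},
    {"class": "e", "odor": "a", "cap-color": "w"},
]
ranking = sorted(["cap-color", "odor"],
                 key=lambda f: information_gain(records, f), reverse=True)
print(ranking)  # ['odor', 'cap-color']
```

A tree built this way splits first on the feature with the highest gain and then repeats the process within each branch. For comparison, scikit-learn's `DecisionTreeClassifier` uses Gini impurity by default and accepts `criterion="entropy"` for an information-gain-style split.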