Machine Learning on the Appliances Energy Prediction dataset

Categories: Artificial Intelligence / Machine Learning

Let’s walk through how to apply machine learning to the Appliances Energy Prediction dataset to predict Appliances, the energy consumption of household appliances in Wh. The steps below use Python with pandas and scikit-learn; Power BI can handle parts of the workflow, but its ML support is limited. Here’s the full pipeline:

Dataset Link: https://archive.ics.uci.edu/dataset/374/appliances+energy+prediction

✅ Step-by-Step Machine Learning Workflow

Step 1: Load and Explore the Data

Load the CSV with pandas and parse the date column as a datetime:

import pandas as pd

df = pd.read_csv("energydata_complete.csv", parse_dates=['date'])
df.head()
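Beyond df.head(), it helps to check the row count, missing values, and the range of the target before going further. The sketch below wraps those checks in a hypothetical helper, summarize (not part of pandas), and runs it on a tiny made-up frame so it executes without the CSV. The column names Appliances and T1 follow the dataset; the values are invented for illustration.

```python
import pandas as pd

def summarize(df: pd.DataFrame, target: str = "Appliances") -> dict:
    """Collect a few quick sanity checks (hypothetical helper)."""
    return {
        "n_rows": len(df),
        "n_cols": df.shape[1],
        "missing": int(df.isna().sum().sum()),  # total number of missing cells
        "target_mean": float(df[target].mean()),
        "target_max": float(df[target].max()),
    }

# Tiny made-up sample; swap in the frame loaded from energydata_complete.csv.
sample = pd.DataFrame({
    "date": pd.to_datetime(["2016-01-11 17:00", "2016-01-11 17:10", "2016-01-11 17:20"]),
    "Appliances": [60, 60, 50],
    "T1": [19.89, 19.89, None],
})
summary = summarize(sample)
print(summary)
```

On the real data, summarize(df) gives a quick signal of whether any cleaning is needed before Step 2.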

Step 2: Feature Engineering

Create time-based features:

df['hour'] = df['date'].dt.hour
df['day_of_week'] = df['date'].dt.dayofweek
df['month'] = df['date'].dt.month

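After extracting these features, drop the raw date column for standard regressors (Step 3 does this); keep it only if you plan to use time series models.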

Step 3: Define Features (X) and Target (y)

X = df.drop(['date', 'Appliances'], axis=1)
y = df['Appliances']

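rv1 and rv2 are random variables that the dataset includes for testing feature selection, so they carry no real signal. Optionally remove them: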

X = X.drop(['rv1', 'rv2'], axis=1)

Step 4: Train-Test Split

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
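The readings are recorded in time order, so a random split places neighbouring timestamps on both sides of the split, which can make test scores look better than they would on genuinely future data. One hedged alternative, assuming the rows are already sorted by date, is to pass shuffle=False so the last 20% of rows become the test set. The data below is synthetic and the variable names are illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for X and y: 100 rows assumed to be in chronological order.
X_demo = pd.DataFrame({"hour": np.arange(100) % 24, "T1": np.linspace(18, 22, 100)})
y_demo = pd.Series(np.arange(100), name="Appliances")

# shuffle=False preserves row order: the first 80% train, the last 20% test.
Xtr, Xte, ytr, yte = train_test_split(X_demo, y_demo, test_size=0.2, shuffle=False)

print(len(Xtr), len(Xte))
print(Xtr.index.max() < Xte.index.min())  # every training row precedes every test row
```

Comparing scores from both kinds of split shows how much the random split may be flattering the model.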

Step 5: Train a Model (e.g., Random Forest)

from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score

model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)

Step 6: Evaluate the Model

print("R2 Score:", r2_score(y_test, y_pred))
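# RMSE is the square root of MSE; the squared=False argument is no longer available in recent scikit-learn releases
print("RMSE:", mean_squared_error(y_test, y_pred) ** 0.5)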
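These scores are easier to interpret next to a naive baseline. The sketch below defines a hypothetical helper, report, that also computes MAE, then compares a set of predictions against a constant prediction. The numbers are made up; on the real data the constant would be y_train.mean().

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

def report(y_true, y_hat) -> dict:
    """Return R2, RMSE and MAE for one set of predictions (hypothetical helper)."""
    mse = mean_squared_error(y_true, y_hat)
    return {
        "r2": float(r2_score(y_true, y_hat)),
        "rmse": float(np.sqrt(mse)),
        "mae": float(mean_absolute_error(y_true, y_hat)),
    }

# Made-up targets and predictions, in Wh.
y_true = np.array([50.0, 60.0, 70.0, 80.0])
y_hat = np.array([55.0, 60.0, 65.0, 80.0])

# Constant baseline; here the mean of y_true, for illustration only.
baseline = np.full_like(y_true, y_true.mean())

model_scores = report(y_true, y_hat)
baseline_scores = report(y_true, baseline)
print("model:   ", model_scores)
print("baseline:", baseline_scores)
```

A model worth keeping should beat the baseline clearly on RMSE and MAE, not just report a positive R2.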

Step 7: Feature Importance

import matplotlib.pyplot as plt

importances = model.feature_importances_
feat_names = X.columns

plt.figure(figsize=(10,6))
plt.barh(feat_names, importances)
plt.xlabel("Feature Importance")
plt.title("Which features affect energy usage most?")
plt.show()
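With around thirty features, an unsorted bar chart is hard to read. A minimal sketch of one way to rank importances first, using a hypothetical helper named ranked_importances and synthetic data in which only one column drives the target:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

def ranked_importances(model, feature_names) -> pd.Series:
    """Pair importances with feature names, largest first (hypothetical helper)."""
    return pd.Series(model.feature_importances_, index=feature_names).sort_values(ascending=False)

# Synthetic data: the target depends on 'signal' only; 'noise' is irrelevant.
rng = np.random.default_rng(42)
X_demo = pd.DataFrame({"noise": rng.normal(size=300), "signal": rng.normal(size=300)})
y_demo = 3.0 * X_demo["signal"] + rng.normal(scale=0.1, size=300)

rf = RandomForestRegressor(n_estimators=50, random_state=42).fit(X_demo, y_demo)
ranking = ranked_importances(rf, X_demo.columns)
print(ranking)
```

On the real model, ranked_importances(model, X.columns).head(10) lists the ten strongest features, and calling .sort_values().plot.barh() on that result draws them with the largest bar at the top.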

🧠 Bonus: Try Other Models

GradientBoostingRegressor (scikit-learn)
XGBRegressor from XGBoost (requires the separate xgboost package)
LinearRegression (as a simple baseline)
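A rough sketch of how such a comparison could be run: fit each candidate on the same split and collect its test R2. Synthetic data from make_regression stands in for X and y here, and XGBoost is left out so the example needs only scikit-learn. The dictionary keys and variable names are illustrative.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic regression data; replace with X and y from the steps above.
X_demo, y_demo = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=42)
Xtr, Xte, ytr, yte = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)

candidates = {
    "LinearRegression": LinearRegression(),
    "GradientBoosting": GradientBoostingRegressor(random_state=42),
    "RandomForest": RandomForestRegressor(n_estimators=100, random_state=42),
}

# Fit every candidate on the same training data and score it on the same test data.
scores = {}
for name, est in candidates.items():
    est.fit(Xtr, ytr)
    scores[name] = r2_score(yte, est.predict(Xte))

# Print the candidates from best to worst test R2.
for name, r2 in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: R2 = {r2:.3f}")
```

The ranking on this linear synthetic data says nothing about the energy dataset; the point is only the pattern of keeping the split fixed while swapping the estimator.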