Let’s walk through how to apply machine learning to the Appliances Energy Prediction dataset to predict Appliances (energy consumption in Wh). I’ll break it into clear steps using Python with pandas and scikit-learn; Power BI can cover parts of this, but its ML support is limited. Here’s the full pipeline:
Dataset Link: https://archive.ics.uci.edu/dataset/374/appliances+energy+prediction
✅ Step-by-Step Machine Learning Workflow
Step 1: Load and Explore the Data
Use Python with Pandas:
import pandas as pd
df = pd.read_csv("energydata_complete.csv", parse_dates=['date'])
df.head()
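Beyond df.head(), it usually helps to check the shape, missing values, and the spread of the target before modeling. Below is a minimal sketch of that check. The summarize helper and the tiny demo frame are hypothetical stand-ins (the values are made up, not taken from the CSV) so the snippet runs without the file; with the real data you would pass df instead.

```python
import pandas as pd

def summarize(df: pd.DataFrame, target: str = "Appliances") -> dict:
    """Collect a few basic facts: size, missing values, and target statistics."""
    return {
        "n_rows": len(df),
        "n_cols": df.shape[1],
        "missing_total": int(df.isna().sum().sum()),
        "target_mean": float(df[target].mean()),
        "target_max": float(df[target].max()),
    }

# Hypothetical stand-in frame; values are illustrative only.
demo = pd.DataFrame({
    "date": pd.date_range("2016-01-11 17:00", periods=4, freq="10min"),
    "Appliances": [60, 60, 50, 50],
    "T1": [19.89, 19.89, None, 19.89],  # one missing value on purpose
})

summary = summarize(demo)
print(summary)
```

If missing_total is non-zero on the real data, decide whether to impute or drop those rows before Step 3.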
Step 2: Feature Engineering
Create time-based features:
df['hour'] = df['date'].dt.hour
df['day_of_week'] = df['date'].dt.dayofweek
df['month'] = df['date'].dt.month
Drop the date column before modeling (Step 3 does this), or keep it only for time series models.
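The time features above can be wrapped in one function and, optionally, extended. The sketch below adds two things that are not in the original steps and are only suggestions: an is_weekend flag and a sine/cosine encoding of the hour, so that 23:00 and 00:00 end up close together. The add_time_features name and the demo dates are hypothetical.

```python
import numpy as np
import pandas as pd

def add_time_features(df: pd.DataFrame, date_col: str = "date") -> pd.DataFrame:
    """Return a copy of df with calendar features derived from date_col."""
    out = df.copy()
    out["hour"] = out[date_col].dt.hour
    out["day_of_week"] = out[date_col].dt.dayofweek  # Monday=0 ... Sunday=6
    out["month"] = out[date_col].dt.month
    # Assumption: weekends may show a different usage pattern.
    out["is_weekend"] = (out["day_of_week"] >= 5).astype(int)
    # Cyclical encoding: maps the hour onto a circle with a 24-hour period.
    out["hour_sin"] = np.sin(2 * np.pi * out["hour"] / 24)
    out["hour_cos"] = np.cos(2 * np.pi * out["hour"] / 24)
    return out

# Hypothetical demo rows: a Monday, a Saturday, and a Sunday.
demo = pd.DataFrame({
    "date": pd.to_datetime(["2016-01-11 00:00", "2016-01-16 06:00", "2016-01-17 18:00"])
})
features = add_time_features(demo)
print(features)
```

Tree-based models such as Random Forest can work with the raw hour column, so the cyclical columns matter mostly for linear models.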
Step 3: Define Features (X) and Target (y)
X = df.drop(['date', 'Appliances'], axis=1)
y = df['Appliances']
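Optionally remove rv1 and rv2. These are random variables included in the dataset for testing feature selection, so they carry no real signal: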
X = X.drop(['rv1', 'rv2'], axis=1)
Step 4: Train-Test Split
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
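One caveat with a random split: the readings are ordered in time, so shuffled rows from the same hour can land in both train and test and make the scores look better than they would on genuinely future data. A time-ordered split is a common alternative. The sketch below is one way to do it, assuming the date column is still present; time_ordered_split and the demo frame are hypothetical.

```python
import pandas as pd

def time_ordered_split(df: pd.DataFrame, date_col: str = "date", test_size: float = 0.2):
    """Sort by time and hold out the most recent test_size fraction as the test set."""
    ordered = df.sort_values(date_col).reset_index(drop=True)
    n_test = int(round(len(ordered) * test_size))
    cut = len(ordered) - n_test
    return ordered.iloc[:cut], ordered.iloc[cut:]

# Hypothetical demo frame with illustrative values, shuffled to show that sorting happens.
demo = pd.DataFrame({
    "date": pd.date_range("2016-01-11 17:00", periods=10, freq="10min"),
    "Appliances": [60, 60, 50, 50, 60, 50, 60, 60, 60, 70],
})
train_df, test_df = time_ordered_split(demo.sample(frac=1, random_state=0))

print(len(train_df), len(test_df))
print(train_df["date"].max(), "<", test_df["date"].min())
```

After splitting this way, build X and y from each part separately, dropping date and Appliances as in Step 3.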
Step 5: Train a Model (e.g., Random Forest)
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
Step 6: Evaluate the Model
print("R2 Score:", r2_score(y_test, y_pred))
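# RMSE as the square root of MSE; the squared=False argument was deprecated in scikit-learn 1.4 and removed in later releases
print("RMSE:", mean_squared_error(y_test, y_pred) ** 0.5)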
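R2 and RMSE are easier to interpret next to a naive baseline that always predicts the mean. The sketch below computes R2, RMSE, and MAE in one helper and compares against such a baseline. The regression_report name and the small arrays are hypothetical; for brevity the baseline here uses the mean of the demo targets, whereas in practice you would use the mean of y_train.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

def regression_report(y_true, y_pred) -> dict:
    """Return R2, RMSE, and MAE; RMSE is taken as sqrt(MSE) so it works across scikit-learn versions."""
    mse = mean_squared_error(y_true, y_pred)
    return {
        "r2": float(r2_score(y_true, y_pred)),
        "rmse": float(np.sqrt(mse)),
        "mae": float(mean_absolute_error(y_true, y_pred)),
    }

# Illustrative values only (Wh), not results from the real model.
y_true = np.array([60.0, 50.0, 70.0, 100.0])
y_pred = np.array([55.0, 55.0, 75.0, 90.0])

model_scores = regression_report(y_true, y_pred)
# Naive baseline: predict the same mean value for every row.
baseline_scores = regression_report(y_true, np.full_like(y_true, y_true.mean()))

print("model:   ", model_scores)
print("baseline:", baseline_scores)
```

A model is only useful if its RMSE and MAE are clearly below the baseline's; an R2 near zero means it does no better than predicting the mean.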
Step 7: Feature Importance
import matplotlib.pyplot as plt
importances = model.feature_importances_
feat_names = X.columns
plt.figure(figsize=(10,6))
plt.barh(feat_names, importances)
plt.xlabel("Feature Importance")
plt.title("Which features affect energy usage most?")
plt.show()
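With roughly thirty features, an unsorted bar chart is hard to read, so ranking the importances first can help. The sketch below shows the idea on synthetic data (one informative column, two noise columns), since it has to run without the CSV; the column names and data are made up. With the real model you would build the Series from model.feature_importances_ and X.columns.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in data: only "signal" actually drives the target.
rng = np.random.default_rng(0)
X_demo = pd.DataFrame({
    "signal": rng.normal(size=300),
    "noise_a": rng.normal(size=300),
    "noise_b": rng.normal(size=300),
})
y_demo = 3.0 * X_demo["signal"] + rng.normal(scale=0.1, size=300)

forest = RandomForestRegressor(n_estimators=50, random_state=42).fit(X_demo, y_demo)

# Pair each importance with its column name and sort from most to least important.
ranked = pd.Series(forest.feature_importances_, index=X_demo.columns).sort_values(ascending=False)
print(ranked)
```

The sorted Series can be plotted directly, for example with ranked.head(15).plot.barh(). Keep in mind that impurity-based importances show what the model relied on, not what causes energy usage.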
🧠 Bonus: Try Other Models
GradientBoostingRegressor
XGBoost
LinearRegression (for baseline)
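A simple way to compare candidates is to score each with the same cross-validation folds. The sketch below does this for LinearRegression and GradientBoostingRegressor on synthetic data; XGBoost is left out only because it needs a separate install, and its XGBRegressor could be added to the same dictionary. The data, the candidates dictionary, and the settings are illustrative assumptions, not tuned choices.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data with a linear relationship plus a little noise.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 2))
y_demo = 2.0 * X_demo[:, 0] - X_demo[:, 1] + rng.normal(scale=0.1, size=200)

candidates = {
    "LinearRegression": LinearRegression(),
    "GradientBoostingRegressor": GradientBoostingRegressor(n_estimators=50, random_state=42),
}

results = {}
for name, estimator in candidates.items():
    # scikit-learn returns negated RMSE so that higher is better; flip the sign back.
    scores = cross_val_score(estimator, X_demo, y_demo, cv=3, scoring="neg_root_mean_squared_error")
    results[name] = float(-scores.mean())

for name, rmse in results.items():
    print(f"{name}: mean CV RMSE = {rmse:.3f}")
```

On the real dataset, swap in X and y from Step 3, and consider TimeSeriesSplit as the cv argument so each fold is validated on later data than it was trained on.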
