
Project Overview
Built baseline and fine-tuned machine learning models to predict house prices using the Ames Housing dataset from Kaggle's House Prices competition. Link: https://www.kaggle.com/competitions/house-prices-advanced-regression-techniques/overview

What Was Done
1. Data Exploration & Analysis
Dataset Loading: Loaded training (1,460 houses) and test (1,459 houses) datasets
Exploratory Data Analysis:
Examined 80 features describing house characteristics
Analyzed price distribution using histograms and statistical summaries
Visualized numerical feature distributions
Identified missing values patterns
Analyzed feature correlations with house prices (sketched below)
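A minimal sketch of the exploration steps above; the local file paths are assumptions, while SalePrice is the competition's target column:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Load the competition data (local file paths are assumptions)
train = pd.read_csv("train.csv")   # 1,460 houses, 80 features + SalePrice
test = pd.read_csv("test.csv")     # 1,459 houses, 80 features

# Price distribution: statistical summary and histogram
print(train["SalePrice"].describe())
train["SalePrice"].hist(bins=50)
plt.xlabel("SalePrice")
plt.ylabel("Count")
plt.show()

# Missing-value pattern: features with the most missing entries
print(train.isnull().sum().sort_values(ascending=False).head(20))

# Correlation of numerical features with the sale price
print(train.corr(numeric_only=True)["SalePrice"].sort_values(ascending=False).head(10))
```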
2. Data Cleaning and Preprocessing
Missing Value Handling: Created a comprehensive preprocess_data() function that:
Filled missing LotFrontage values with the column mean
Filled missing categorical values with the mode
Removed problematic columns (Alley, GarageYrBlt, PoolQC, Fence, MiscFeature)
Feature Engineering:
Label-encoded 40+ categorical features
Ensured consistent encoding between training and test sets
Data Splitting: 80/20 train/test split of the training data (see the sketch below)
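A condensed sketch of the preprocessing described above; the original preprocess_data() implementation is not reproduced here, so the function body, the encoder handling, and the split parameters are assumptions:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split

DROP_COLS = ["Alley", "GarageYrBlt", "PoolQC", "Fence", "MiscFeature"]

def preprocess_data(df, encoders=None):
    """Clean a raw Ames housing frame and label-encode its categorical columns."""
    df = df.drop(columns=DROP_COLS).copy()

    # Numerical: fill missing LotFrontage with the column mean
    df["LotFrontage"] = df["LotFrontage"].fillna(df["LotFrontage"].mean())
    # Mean-fill any other numerical gaps so models receive no NaNs
    # (a safeguard; only LotFrontage is named in the list above)
    df = df.fillna(df.mean(numeric_only=True))

    # Categorical: fill missing values with the mode, then label encode.
    # Reusing the fitted encoders keeps train and test encodings consistent
    # (assumes the test set contains no unseen categories).
    encoders = {} if encoders is None else encoders
    for col in df.select_dtypes(include="object").columns:
        df[col] = df[col].fillna(df[col].mode()[0]).astype(str)
        if col not in encoders:
            encoders[col] = LabelEncoder().fit(df[col])
        df[col] = encoders[col].transform(df[col])
    return df, encoders

train = pd.read_csv("train.csv")
X, encoders = preprocess_data(train.drop(columns=["SalePrice"]))
y = train["SalePrice"]

# 80/20 split of the training data for model evaluation
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)
```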
3. Model Implementation
Scikit-learn Models
Implemented both baseline and fine-tuned versions of the models below (baseline fits sketched after the list):
Random Forest Regressor
K-Nearest Neighbors Regressor
Support Vector Machine (SVR)
XGBoost Regressor
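A sketch of the four baseline models with default hyperparameters, assuming the X_train/y_train split from the preprocessing sketch:

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR
from xgboost import XGBRegressor

# Baseline versions use default hyperparameters; the fine-tuned
# versions come from the GridSearchCV step described in section 4
baseline_models = {
    "Random Forest": RandomForestRegressor(random_state=42),
    "k-NN": KNeighborsRegressor(),
    "SVR": SVR(),
    "XGBoost": XGBRegressor(random_state=42),
}

for model in baseline_models.values():
    model.fit(X_train, y_train)
```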
TensorFlow Decision Forests (both models sketched below)
Random Forest Model (baseline & fine-tuned)
Gradient Boosted Trees Model (baseline & fine-tuned)
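A sketch of the two TensorFlow Decision Forests models; pd_dataframe_to_tf_dataset and the model classes come from the tensorflow_decision_forests package, while the DataFrame variable names are assumptions carried over from the earlier sketches:

```python
import tensorflow_decision_forests as tfdf

# Rebuild train/validation frames that include the SalePrice label
train_frame = X_train.assign(SalePrice=y_train)
valid_frame = X_val.assign(SalePrice=y_val)

train_ds = tfdf.keras.pd_dataframe_to_tf_dataset(
    train_frame, label="SalePrice", task=tfdf.keras.Task.REGRESSION)
valid_ds = tfdf.keras.pd_dataframe_to_tf_dataset(
    valid_frame, label="SalePrice", task=tfdf.keras.Task.REGRESSION)

# Baseline Random Forest and Gradient Boosted Trees regressors
tf_models = {
    "TF Random Forest": tfdf.keras.RandomForestModel(task=tfdf.keras.Task.REGRESSION),
    "TF Gradient Boosted Trees": tfdf.keras.GradientBoostedTreesModel(task=tfdf.keras.Task.REGRESSION),
}

for name, model in tf_models.items():
    model.compile(metrics=["mse"])
    model.fit(train_ds)
    print(name, model.evaluate(valid_ds, return_dict=True))
```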
4. Hyperparameter Tuning
Used GridSearchCV for the scikit-learn models (example below)
Tuned parameters including:
Number of estimators, max depth, min samples split (Random Forest)
Number of neighbors, weights, distance metrics (k-NN)
Kernel parameters, C, gamma (SVM)
Learning rate, tree parameters (XGBoost)
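An illustrative GridSearchCV run for the Random Forest; the grid values below are placeholders rather than the exact grids that were used:

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Illustrative grid over the Random Forest parameters listed above
param_grid = {
    "n_estimators": [100, 300, 500],
    "max_depth": [None, 10, 20],
    "min_samples_split": [2, 5, 10],
}

search = GridSearchCV(
    RandomForestRegressor(random_state=42),
    param_grid,
    scoring="neg_root_mean_squared_error",
    cv=5,
    n_jobs=-1,
)
search.fit(X_train, y_train)

print("Best parameters:", search.best_params_)
tuned_models = {"Random Forest (tuned)": search.best_estimator_}
```

The same pattern extends to the k-NN, SVR, and XGBoost grids listed above.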
5. Model Evaluation & Comparison
Evaluation Metric:
Root Mean Squared Error (RMSE)
Performance Tracking:
Compared baseline vs fine-tuned models
Visualization:
Created comparative bar plots showing model performance
Best Model Selection:
Automatically identified the best-performing model (sketched below)
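A sketch of the evaluation and comparison step, assuming the baseline_models and tuned_models dictionaries from the earlier sketches:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import mean_squared_error

# Validation RMSE for every baseline and fine-tuned scikit-learn/XGBoost model
results = {}
for name, model in {**baseline_models, **tuned_models}.items():
    preds = model.predict(X_val)
    results[name] = np.sqrt(mean_squared_error(y_val, preds))

# Comparative bar plot of model performance
plt.bar(list(results), list(results.values()))
plt.ylabel("Validation RMSE")
plt.xticks(rotation=45, ha="right")
plt.tight_layout()
plt.show()

# Automatically pick the best-performing model (lowest RMSE)
best_name = min(results, key=results.get)
print(f"Best model: {best_name} ({results[best_name]:,.0f} RMSE)")
```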
Results
Fine-tuned Gradient Boosted Trees (TensorFlow Decision Forests): ~20,475 RMSE

