House Prices - Advanced Regression Techniques

1 week
Data Science

Project Overview

Built baseline and fine-tuned machine learning models to predict house prices using the Ames housing dataset from Kaggle's House Prices competition. Link: https://www.kaggle.com/competitions/house-prices-advanced-regression-techniques/overview

What Was Done

1. Data Exploration & Analysis

  • Dataset Loading: Loaded training (1,460 houses) and test (1,459 houses) datasets


  • Exploratory Data Analysis:

    • Examined 80 features describing house characteristics

    • Analyzed price distribution using histograms and statistical summaries

    • Visualized numerical feature distributions

    • Identified missing-value patterns

    • Analyzed feature correlations with house prices
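The exploration steps above could be sketched with pandas roughly as follows. This is a minimal illustration, not the project's notebook code: the helper name `summarize` is hypothetical, and the small DataFrame is made-up stand-in data that only borrows column names from the Ames dataset (`GrLivArea`, `LotFrontage`, `Alley`, `SalePrice`).

```python
import pandas as pd


def summarize(df: pd.DataFrame, target: str = "SalePrice"):
    """Return missing-value counts and numeric correlations with the target."""
    # Count missing values per column, keeping only columns that have any.
    missing = df.isna().sum()
    missing = missing[missing > 0].sort_values(ascending=False)
    # Pearson correlation of each numeric feature with the target.
    corr = (
        df.select_dtypes(include="number")
        .corr()[target]
        .drop(target)
        .sort_values(ascending=False)
    )
    return missing, corr


# Toy stand-in for train.csv (values are illustrative, not real data).
toy = pd.DataFrame({
    "GrLivArea": [1000, 1500, 2000, 2500],
    "LotFrontage": [60.0, None, 80.0, 90.0],
    "Alley": [None, None, "Pave", None],
    "SalePrice": [100000, 150000, 200000, 250000],
})

missing, corr = summarize(toy)
print(toy["SalePrice"].describe())  # statistical summary of the target
print(missing)                      # columns with missing values
print(corr)                         # features ranked by correlation with price
```

On the real data, `toy` would be replaced by the loaded training set, and a histogram of `SalePrice` (for example via `DataFrame.hist`) would complement the summary.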

2. Data Cleaning and Preprocessing

  • Missing Value Handling: Created a comprehensive preprocess_data() function that:

    • Filled LotFrontage with mean values

    • Filled missing categorical values with the mode

    • Removed problematic columns (Alley, GarageYrBlt, PoolQC, Fence, MiscFeature)
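A function along these lines might look like the sketch below. It only mirrors the three steps listed above; the project's actual preprocess_data() may differ in details such as which columns count as categorical. The toy DataFrame and the use of `MSZoning` as an example categorical column are assumptions made for illustration.

```python
import pandas as pd

# Columns the write-up says were removed.
DROP_COLS = ["Alley", "GarageYrBlt", "PoolQC", "Fence", "MiscFeature"]


def preprocess_data(df: pd.DataFrame) -> pd.DataFrame:
    """Sketch of the cleaning steps; the project's real function may differ."""
    df = df.drop(columns=DROP_COLS, errors="ignore").copy()
    # Numeric gap: fill LotFrontage with the column mean.
    if "LotFrontage" in df.columns:
        df["LotFrontage"] = df["LotFrontage"].fillna(df["LotFrontage"].mean())
    # Categorical gaps: fill each non-numeric column with its most frequent value.
    for col in df.select_dtypes(exclude="number").columns:
        if df[col].isna().any():
            mode = df[col].mode(dropna=True)
            if not mode.empty:
                df[col] = df[col].fillna(mode.iloc[0])
    return df


# Toy rows for illustration only.
toy = pd.DataFrame({
    "LotFrontage": [60.0, None, 80.0],
    "MSZoning": ["RL", None, "RL"],
    "Alley": [None, "Pave", None],
})
clean = preprocess_data(toy)
print(clean)
```

One design point worth keeping in mind: computing the mean and mode on the training set and reusing those values for the test set avoids leaking test-set statistics into preprocessing. The sketch computes them per DataFrame for brevity.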


  • Feature Engineering:

    • Label encoded 40+ categorical features

    • Ensured consistent encoding between training and test sets
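One way to keep the encoding consistent is to build a single category-to-integer mapping per column from both sets, which is roughly what a label encoder fitted on the combined data would produce. The sketch below uses a plain pandas mapping instead of the project's actual encoder; the function name `label_encode_consistently` and the toy values are hypothetical.

```python
import pandas as pd


def label_encode_consistently(train, test, columns):
    """Encode categorical columns with one shared mapping per column."""
    train, test = train.copy(), test.copy()
    for col in columns:
        # Build the mapping from categories seen in either set, so the
        # same string always maps to the same integer in both.
        categories = sorted(set(train[col].dropna()) | set(test[col].dropna()))
        mapping = {cat: code for code, cat in enumerate(categories)}
        train[col] = train[col].map(mapping)
        test[col] = test[col].map(mapping)
    return train, test


# Toy data: "FV" appears only in the test set.
train = pd.DataFrame({"MSZoning": ["RL", "RM", "RL"]})
test = pd.DataFrame({"MSZoning": ["RM", "FV"]})

train_enc, test_enc = label_encode_consistently(train, test, ["MSZoning"])
print(train_enc["MSZoning"].tolist())  # [1, 2, 1]
print(test_enc["MSZoning"].tolist())   # [2, 0]
```

Fitting the mapping on the training set alone would leave categories that appear only in the test set (here "FV") without a code, which is the inconsistency this step guards against.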


  • Data Splitting: 80/20 train/test split
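An 80/20 split is commonly done with scikit-learn's `train_test_split`. The sketch below assumes that helper was used; the toy arrays and the `random_state` value are illustrative choices, not taken from the project.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(10, 2)  # 10 toy rows, 2 features
y = np.arange(10)                 # toy target

# Hold out 20% of the rows for evaluation; fix the seed for reproducibility.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X_train.shape, X_val.shape)  # (8, 2) (2, 2)
```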

3. Model Implementation

  • Scikit-learn Models

    • Implemented both baseline and fine-tuned versions of:

      • Random Forest Regressor

      • K-Nearest Neighbors Regressor

      • Support Vector Machine (SVR)

      • XGBoost Regressor
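A baseline pass over the scikit-learn models might be structured as in the sketch below: each model is fitted with default settings and scored on the held-out split. The synthetic data from `make_regression` stands in for the housing features, and the dictionary names are hypothetical. XGBoost is left out to keep the example limited to scikit-learn; `xgboost.XGBRegressor` follows the same fit/predict interface and could be added to the dictionary in the same way.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR

# Synthetic regression data as a stand-in for the preprocessed features.
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Baselines use library defaults (plus a fixed seed where applicable).
baselines = {
    "Random Forest": RandomForestRegressor(random_state=0),
    "k-NN": KNeighborsRegressor(),
    "SVR": SVR(),
}

baseline_rmse = {}
for name, model in baselines.items():
    model.fit(X_train, y_train)
    preds = model.predict(X_val)
    # RMSE is the square root of the mean squared error.
    baseline_rmse[name] = float(np.sqrt(mean_squared_error(y_val, preds)))

for name, score in baseline_rmse.items():
    print(f"{name}: RMSE = {score:.2f}")
```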


  • TensorFlow Decision Forests

    • Random Forest Model (baseline & fine-tuned)

    • Gradient Boosted Trees Model (baseline & fine-tuned)

4. Hyperparameter Tuning

  • Used GridSearchCV for scikit-learn models


  • Tuned parameters including:

    • Number of estimators, max depth, min samples split (Random Forest)

    • Number of neighbors, weights, distance metrics (k-NN)

    • Kernel parameters, C, gamma (SVM)

    • Learning rate, tree parameters (XGBoost)
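For the Random Forest case, a grid search over the three parameters named above could look like the following sketch. The grid values, the 3-fold setting, and the synthetic data are assumptions chosen to keep the example fast; they are not the project's actual search space.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the training features and prices.
X, y = make_regression(n_samples=120, n_features=5, noise=10.0, random_state=0)

# Illustrative grid; real searches usually cover wider ranges.
param_grid = {
    "n_estimators": [25, 50],
    "max_depth": [None, 5],
    "min_samples_split": [2, 5],
}

search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    scoring="neg_root_mean_squared_error",  # scikit-learn maximizes scores
    cv=3,
)
search.fit(X, y)

print(search.best_params_)
print(f"Cross-validated RMSE: {-search.best_score_:.2f}")
```

Because scikit-learn maximizes the scoring function, RMSE is exposed as a negative value and the sign is flipped when reporting it. The same pattern applies to the k-NN and SVR grids by swapping the estimator and the parameter names.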

5. Model Evaluation & Comparison

  • Evaluation Metric:

    • Root Mean Squared Error (RMSE)
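RMSE is the square root of the average squared difference between predicted and actual prices, so it is expressed in the same units as the target. A minimal pure-Python sketch, with a hypothetical helper name `rmse` and made-up prices:

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# One prediction is off by 10,000 and one is exact:
# sqrt((10000**2 + 0) / 2) is about 7071.07
print(rmse([200000, 150000], [210000, 150000]))
```

Since large errors are squared before averaging, a few badly mispriced houses can dominate the score.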


  • Performance Tracking:

    • Compared baseline vs fine-tuned models


  • Visualization:

    • Created comparative bar plots showing model performance


  • Best Model Selection:

    • Automatically identified the best performing model
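Selecting the best model can be as simple as taking the entry with the lowest RMSE from a dictionary of scores. The sketch below assumes results were collected that way; the function name `pick_best`, the model labels, and the numbers are placeholders, not the project's results.

```python
def pick_best(results):
    """Return (name, rmse) for the entry with the lowest RMSE."""
    return min(results.items(), key=lambda item: item[1])


# Placeholder scores for illustration only; not the project's results.
results = {
    "model_a (baseline)": 3.0,
    "model_a (fine-tuned)": 2.5,
    "model_b (baseline)": 2.8,
    "model_b (fine-tuned)": 2.9,
}

best_name, best_rmse = pick_best(results)
print(f"Best model: {best_name} (RMSE {best_rmse})")
```

Keeping baseline and fine-tuned scores in one structure also makes it easy to spot cases where tuning did not help, as with the placeholder `model_b` above.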


Results

Fine-tuned Gradient Boosted Trees (TensorFlow Decision Forests): RMSE ≈ 20,475

GitHub: https://github.com/fahim-ysr/house-sales-price-prediction

Let's Connect!

© Copyright 2025. All rights Reserved.
