House Prices - Advanced Regression Techniques

1 week

Data Science

Project Overview

Built baseline and fine-tuned machine learning models to predict house prices using the Ames housing dataset from Kaggle's House Prices competition. Link: https://www.kaggle.com/competitions/house-prices-advanced-regression-techniques/overview

What Was Done

1. Data Exploration & Analysis

  • Dataset Loading: Loaded training (1,460 houses) and test (1,459 houses) datasets


  • Exploratory Data Analysis:

    • Examined 80 features describing house characteristics

    • Analyzed price distribution using histograms and statistical summaries

    • Visualized numerical feature distributions

    • Identified missing values patterns

    • Analyzed feature correlations with house prices

2. Data Cleaning and Preprocessing

  • Missing Value Handling: Created a comprehensive preprocess_data() function that:

    • Filled LotFrontage with mean values

    • Handled categorical missing values with mode

    • Removed problematic columns (Alley, GarageYrBlt, PoolQC, Fence, MiscFeature)


  • Feature Engineering:

    • Label encoded 40+ categorical features

    • Ensured consistent encoding between training and test sets


  • Data Splitting: 80/20 train-test split
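The preprocessing steps above can be sketched as a single function. This is a minimal illustration, not the project's exact `preprocess_data()`: the synthetic columns in the usage below are placeholders, and fitting the label encoder on the combined train and test values is one way to keep the encoding consistent between the two sets.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder


def preprocess_data(train: pd.DataFrame, test: pd.DataFrame):
    """Drop sparse columns, fill missing values, and label-encode categoricals."""
    # Columns removed because most of their values are missing
    drop_cols = ["Alley", "GarageYrBlt", "PoolQC", "Fence", "MiscFeature"]
    train = train.drop(columns=drop_cols, errors="ignore")
    test = test.drop(columns=drop_cols, errors="ignore")

    # Numeric: fill LotFrontage with the training-set mean
    if "LotFrontage" in train:
        mean_lf = train["LotFrontage"].mean()
        train["LotFrontage"] = train["LotFrontage"].fillna(mean_lf)
        test["LotFrontage"] = test["LotFrontage"].fillna(mean_lf)

    # Categoricals: fill with the training mode, then encode on the
    # combined values so train and test share one integer mapping
    for col in train.select_dtypes(include="object").columns:
        mode = train[col].mode()[0]
        train[col] = train[col].fillna(mode)
        test[col] = test[col].fillna(mode)
        le = LabelEncoder()
        le.fit(pd.concat([train[col], test[col]]))
        train[col] = le.transform(train[col])
        test[col] = le.transform(test[col])

    return train, test
```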

3. Model Implementation

  • Scikit-learn Models

    • Implemented both baseline and fine-tuned versions of:

      • Random Forest Regressor

      • K-Nearest Neighbors Regressor

      • Support Vector Machine (SVR)

      • XGBoost Regressor


  • TensorFlow Decision Forests

    • Random Forest Model (baseline & fine-tuned)

    • Gradient Boosted Trees Model (baseline & fine-tuned)
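The scikit-learn portion of the setup follows a common fit-and-score pattern; a sketch on synthetic data (the `make_regression` features stand in for the encoded housing features, and XGBoost and the TensorFlow Decision Forests models follow the same fit/predict pattern):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR

# Synthetic stand-in for the encoded housing features
X, y = make_regression(n_samples=300, n_features=10, noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

baselines = {
    "Random Forest": RandomForestRegressor(random_state=42),
    "k-NN": KNeighborsRegressor(),
    "SVR": SVR(),
    # "XGBoost": xgboost.XGBRegressor(...)  # same pattern, if xgboost is installed
}

rmse = {}
for name, model in baselines.items():
    model.fit(X_train, y_train)
    preds = model.predict(X_test)
    rmse[name] = float(np.sqrt(mean_squared_error(y_test, preds)))
```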

4. Hyperparameter Tuning

  • Used GridSearchCV for scikit-learn models


  • Tuned parameters including:

    • Number of estimators, max depth, min samples split (Random Forest)

    • Number of neighbors, weights, distance metrics (k-NN)

    • Kernel parameters, C, gamma (SVM)

    • Learning rate, tree parameters (XGBoost)
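A GridSearchCV run for the Random Forest looks roughly like this; the grid values here are illustrative, not the ones actually searched in the project:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

X, y = make_regression(n_samples=200, n_features=8, noise=5.0, random_state=0)

# Example grid over the Random Forest parameters listed above
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [None, 10],
    "min_samples_split": [2, 5],
}

grid = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    scoring="neg_root_mean_squared_error",  # higher (closer to 0) is better
    cv=3,
    n_jobs=-1,
)
grid.fit(X, y)
tuned_rf = grid.best_estimator_  # refit on the full data with the best params
```

The same pattern applies to k-NN (`n_neighbors`, `weights`, `metric`), SVR (`kernel`, `C`, `gamma`), and XGBoost (`learning_rate`, tree parameters).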

5. Model Evaluation & Comparison

  • Evaluation Metric:

    • Root Mean Squared Error (RMSE)


  • Performance Tracking:

    • Compared baseline vs fine-tuned models


  • Visualization:

    • Created comparative bar plots showing model performance


  • Best Model Selection:

    • Automatically identified the best performing model
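Selecting the best model reduces to taking the minimum of the per-model RMSE scores; a sketch with made-up predictions (the model names and values below are placeholders, not the project's results):

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Hypothetical held-out targets and predictions from two fitted models
y_test = np.array([200000.0, 150000.0, 320000.0])
predictions = {
    "Random Forest (baseline)": np.array([190000.0, 160000.0, 300000.0]),
    "GBT (fine-tuned)": np.array([198000.0, 152000.0, 315000.0]),
}

rmse = {
    name: float(np.sqrt(mean_squared_error(y_test, preds)))
    for name, preds in predictions.items()
}
best_model = min(rmse, key=rmse.get)  # lowest RMSE wins
```

The `rmse` dict is also what feeds the comparative bar plots.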


Results

Best model: fine-tuned Gradient Boosted Trees (TensorFlow Decision Forests), with ~20,475 RMSE.

Github: https://github.com/fahim-ysr/house-sales-price-prediction


© Copyright 2025. All rights Reserved.
