Model Tuner Regression - Redfin Real Estate

Overview

This Google Colab notebook uses a dataset on the Los Angeles real estate market sourced from Redfin, consisting of 200 rows and 27 columns. The dataset captures a snapshot in time of various property listings, providing detailed information about each property. The first row holds an MLS disclaimer rather than a listing and is dropped during preprocessing, leaving 199 listings.

Dataset Columns

The dataset includes the following columns:

  • SALE TYPE
  • SOLD DATE
  • PROPERTY TYPE
  • ADDRESS
  • CITY
  • STATE OR PROVINCE
  • ZIP OR POSTAL CODE
  • PRICE
  • BEDS
  • BATHS
  • LOCATION
  • SQUARE FEET
  • LOT SIZE
  • YEAR BUILT
  • DAYS ON MARKET
  • $/SQUARE FEET
  • HOA/MONTH
  • STATUS
  • NEXT OPEN HOUSE START TIME
  • NEXT OPEN HOUSE END TIME
  • URL (SEE https://www.redfin.com/buy-a-home/comparative-market-analysis FOR INFO ON PRICING)
  • SOURCE
  • MLS#
  • FAVORITE
  • INTERESTED
  • LATITUDE
  • LONGITUDE

Usage

The primary purpose of this notebook is to demonstrate various regression examples using the Redfin dataset. Specifically, the following variables are used for testing in this notebook:

  • Features (X):

    • BEDS
    • BATHS
    • SQUARE FEET
    • LOT SIZE
  • Target (y):

    • PRICE

Source

The dataset is sourced from Redfin, available at Redfin's Comparative Market Analysis.

Note

This dataset represents a single snapshot in time and may not reflect current market conditions.

Model Tuner Library Instructions

This notebook provides a guide on how to install and use the model_tuner library in a notebook environment like Google Colab.

Model Tuner Description

The model_tuner library is designed to streamline the process of hyperparameter tuning and model optimization for machine learning algorithms. It provides an easy-to-use interface for defining, tuning, and evaluating models.

Key Features

Automatic Hyperparameter Tuning

The library tunes hyperparameters for a variety of machine learning models by searching a user-defined grid, either exhaustively or by randomized sampling (the randomized_grid option used later in this notebook).

Cross-Validation

Integrated cross-validation evaluates each model on data held out from training, which gives a more reliable performance estimate and helps detect overfitting.
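
As a minimal sketch of the idea, assuming scikit-learn's KFold and cross_val_score rather than model_tuner's own internals (which this notebook does not show), k-fold cross-validation on synthetic data might look like this. The data, the Ridge estimator, and alpha=1.0 are illustrative choices only.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_score

# Synthetic regression data (hypothetical, for illustration only)
rng = np.random.default_rng(3)
X_cv = rng.normal(size=(100, 4))
y_cv = X_cv @ np.array([2.0, -1.0, 0.5, 0.0]) + rng.normal(scale=0.1, size=100)

# Five folds: each fold is held out once while the model trains on the other four
cv = KFold(n_splits=5, shuffle=True, random_state=3)
scores = cross_val_score(Ridge(alpha=1.0), X_cv, y_cv, cv=cv, scoring="r2")

print(f"R2 per fold: {np.round(scores, 3)}")
print(f"Mean R2: {scores.mean():.3f}")
```

A large gap between the per-fold scores, or between training and held-out scores, is the usual sign that a model is overfitting.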

Documentation

For detailed documentation and advanced usage of the model_tuner library, please refer to the model_tuner documentation.

By following the steps below, you should be able to install and use the model_tuner library effectively in your notebook environment. If you encounter any issues or have further questions, feel free to reach out for support.

Installation

To install the model_tuner library, use the following command:

In [1]:
! pip install model_tuner
Requirement already satisfied: model_tuner in /usr/local/lib/python3.10/dist-packages (0.0.15a0)
Requirement already satisfied: joblib>=1.3.2 in /usr/local/lib/python3.10/dist-packages (from model_tuner) (1.4.2)
Requirement already satisfied: numpy>=1.21.6 in /usr/local/lib/python3.10/dist-packages (from model_tuner) (1.26.4)
Requirement already satisfied: pandas>=1.3.5 in /usr/local/lib/python3.10/dist-packages (from model_tuner) (2.1.4)
Requirement already satisfied: scikit-learn>=1.0.2 in /usr/local/lib/python3.10/dist-packages (from model_tuner) (1.3.2)
Requirement already satisfied: scipy>=1.7.3 in /usr/local/lib/python3.10/dist-packages (from model_tuner) (1.13.1)
Requirement already satisfied: tqdm>=4.66.4 in /usr/local/lib/python3.10/dist-packages (from model_tuner) (4.66.5)
Requirement already satisfied: python-dateutil>=2.8.2 in /usr/local/lib/python3.10/dist-packages (from pandas>=1.3.5->model_tuner) (2.8.2)
Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.10/dist-packages (from pandas>=1.3.5->model_tuner) (2024.2)
Requirement already satisfied: tzdata>=2022.1 in /usr/local/lib/python3.10/dist-packages (from pandas>=1.3.5->model_tuner) (2024.1)
Requirement already satisfied: threadpoolctl>=2.0.0 in /usr/local/lib/python3.10/dist-packages (from scikit-learn>=1.0.2->model_tuner) (3.5.0)
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.10/dist-packages (from python-dateutil>=2.8.2->pandas>=1.3.5->model_tuner) (1.16.0)

Importing the Library

After installation, you can import the necessary components from the model_tuner library as shown below:

In [2]:
import model_tuner
from model_tuner import Model
from sklearn.impute import SimpleImputer

Checking the Version

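To confirm that the model_tuner library is installed correctly, display its help text, which includes the version number. Note that help() prints its output directly and returns None, so wrapping it in print() adds a trailing None to the output: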

In [3]:
print(help(model_tuner))
Help on package model_tuner:

NAME
    model_tuner

DESCRIPTION
    The `model_tuner` library is a versatile and powerful tool designed to 
    facilitate the training, evaluation, and tuning of machine learning models. 
    It supports various functionalities such as handling imbalanced data, applying 
    different scaling and imputation techniques, calibrating models, and conducting 
    cross-validation. This library is particularly useful for model selection, 
    hyperparameter tuning, and ensuring optimal performance across different metrics.
    
    Version: 0.0.15a

PACKAGE CONTENTS
    bootstrapper
    main
    model_tuner_utils
    pickleObjects

DATA
    __email__ = 'lshpaner@ucla.edu; alafunnell@gmail.com; pp89@ucla.edu'

VERSION
    0.0.15a

AUTHOR
    Arthur Funnell, Leonid Shpaner, Panayiotis Petousis

FILE
    /usr/local/lib/python3.10/dist-packages/model_tuner/__init__.py


None
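
A shorter way to check only the version is to query the installed package metadata. The sketch below uses the standard library's importlib.metadata; the helper name installed_version is hypothetical, and it assumes the distribution is registered under the name model_tuner, as in the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Prints the version string if model_tuner is installed, otherwise None
print(installed_version("model_tuner"))
```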

Import Other Relevant Libraries

In [4]:
import pandas as pd
import numpy as np

from sklearn.linear_model import Lasso, Ridge, SGDRegressor
from xgboost import XGBRegressor

import warnings
from sklearn.exceptions import DataConversionWarning

warnings.filterwarnings(action="ignore", category=DataConversionWarning)
warnings.simplefilter(action='ignore', category=UserWarning)

Import Requisite Libraries

In [5]:
# pandas, numpy, and XGBRegressor are already imported above;
# this cell only adds the xgboost namespace alias.
import xgboost as xgb

Read In The Dataset

In [6]:
# Direct download link to the Excel file
url = (
    "https://github.com/uclamii/model_tuner/raw/main/public_data/"
    "redfin_2024-04-16-15-59-17.xlsx"
)

# Read the Excel file
df = pd.read_excel(url)
In [7]:
df.head() # inspect first 5 rows of data
Out[7]:
  | SALE TYPE | SOLD DATE | PROPERTY TYPE | ADDRESS | CITY | STATE OR PROVINCE | ZIP OR POSTAL CODE | PRICE | BEDS | BATHS | ... | STATUS | NEXT OPEN HOUSE START TIME | NEXT OPEN HOUSE END TIME | URL (SEE https://www.redfin.com/buy-a-home/comparative-market-analysis FOR INFO ON PRICING) | SOURCE | MLS# | FAVORITE | INTERESTED | LATITUDE | LONGITUDE
0 | In accordance with local MLS rules, some MLS l... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN
1 | MLS Listing | NaN | Single Family Residential | 1633 N Beverly Dr | Beverly Hills | CA | 90210.0 | 2899000.0 | 2.0 | 2.5 | ... | Active | NaN | NaN | https://www.redfin.com/CA/Beverly-Hills/1633-N... | CRMLS | SB23231159 | N | Y | 34.100409 | -118.415785
2 | MLS Listing | NaN | Single Family Residential | 1135 Coldwater Canyon Dr | Beverly Hills | CA | 90210.0 | 5650000.0 | 5.0 | 5.0 | ... | Active | NaN | NaN | https://www.redfin.com/CA/Beverly-Hills/1135-C... | TheMLS | 24-380159 | N | Y | 34.093565 | -118.407785
3 | MLS Listing | NaN | Condo/Co-op | 433 N Doheny Dr #302 | Beverly Hills | CA | 90210.0 | 1980000.0 | 3.0 | 3.0 | ... | Active | April-21-2024 02:00 PM | April-21-2024 05:00 PM | https://www.redfin.com/CA/Beverly-Hills/433-N-... | TheMLS | 24-380177 | N | Y | 34.078053 | -118.389995
4 | MLS Listing | NaN | Single Family Residential | 1531 Summitridge Dr | Beverly Hills | CA | 90210.0 | 18900000.0 | 8.0 | 10.0 | ... | Active | NaN | NaN | https://www.redfin.com/CA/Beverly-Hills/1531-S... | TheMLS | 24-378855 | N | Y | 34.099480 | -118.420121

5 rows × 27 columns

Simple EDA and Preprocessing

In [8]:
df.columns # inspect the list of cols in the dataset
Out[8]:
Index(['SALE TYPE', 'SOLD DATE', 'PROPERTY TYPE', 'ADDRESS', 'CITY',
       'STATE OR PROVINCE', 'ZIP OR POSTAL CODE', 'PRICE', 'BEDS', 'BATHS',
       'LOCATION', 'SQUARE FEET', 'LOT SIZE', 'YEAR BUILT', 'DAYS ON MARKET',
       '$/SQUARE FEET', 'HOA/MONTH', 'STATUS', 'NEXT OPEN HOUSE START TIME',
       'NEXT OPEN HOUSE END TIME',
       'URL (SEE https://www.redfin.com/buy-a-home/comparative-market-analysis FOR INFO ON PRICING)',
       'SOURCE', 'MLS#', 'FAVORITE', 'INTERESTED', 'LATITUDE', 'LONGITUDE'],
      dtype='object')
In [9]:
print(f"This dataset has {df.shape[0]} rows and {df.shape[1]} columns.")
This dataset has 200 rows and 27 columns.
In [10]:
df = df.drop(df.index[0]) # drop the first row, which holds an MLS disclaimer rather than a listing

Set Up The Feature Space and Dependent Variable

In [11]:
X = df[["BEDS", "BATHS", "SQUARE FEET", "LOT SIZE"]]
y = df[["PRICE"]]
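
Each model below places a SimpleImputer in front of its estimator. With its default settings, scikit-learn's SimpleImputer replaces missing values with the column mean. A minimal sketch on a made-up array (the values and column meanings are illustrative, not taken from the Redfin data):

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Made-up feature matrix with missing values (columns: beds, lot size)
X_demo = np.array(
    [
        [2.0, 5000.0],
        [np.nan, 7000.0],
        [4.0, np.nan],
    ]
)

imputer = SimpleImputer()  # default strategy="mean"
X_filled = imputer.fit_transform(X_demo)

# Each NaN is replaced by its column mean: 3.0 for beds, 6000.0 for lot size
print(X_filled)
```

Because the imputer is a pipeline step, the column means are learned from the data passed to fit and then reused when transforming new data.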

Lasso Regression

Define Hyperparameters for Lasso Regression

In [12]:
# Define the set of hyperparameters to tune
lasso_pipeline_hyperparms_grid = [
    {
        "lasso__fit_intercept": [True, False],
        "lasso__precompute": [False],
        "lasso__copy_X": [True, False],
        "lasso__max_iter": [100, 500, 1000, 2000],
        "lasso__tol": [1e-4, 1e-3],
        "lasso__warm_start": [True, False],
        "lasso__positive": [True, False],
    }
]
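
To get a feel for the size of this search space, the sketch below counts every combination in the grid and then draws a random subset of 100, the number of candidates the run recorded in this notebook reports. Sampling without replacement is an assumption made for illustration; it may not match how model_tuner samples internally.

```python
import itertools
import math
import random

grid = {
    "lasso__fit_intercept": [True, False],
    "lasso__precompute": [False],
    "lasso__copy_X": [True, False],
    "lasso__max_iter": [100, 500, 1000, 2000],
    "lasso__tol": [1e-4, 1e-3],
    "lasso__warm_start": [True, False],
    "lasso__positive": [True, False],
}

# Size of the full Cartesian product of all value lists
n_combinations = math.prod(len(values) for values in grid.values())
print(n_combinations)

# Enumerate every combination as a dict, then sample a subset
# (assumed here: sampling without replacement)
keys = list(grid)
all_candidates = [
    dict(zip(keys, combo)) for combo in itertools.product(*grid.values())
]
random.seed(3)
sampled = random.sample(all_candidates, k=100)
print(len(sampled), sampled[0])
```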
In [13]:
lasso_reg = Lasso()

estimator_name = "lasso"
# Tuning options: with kfold=False, candidates are scored on a held-out validation split

kfold = False
calibrate = False

Initialize and Configure the Lasso Model

In [14]:
redfin_lasso = Model(
    pipeline_steps=[
        ("Preprocessor", SimpleImputer()),
    ],
    name="Redfin_model_Lasso",
    estimator_name=estimator_name,
    model_type="regression",
    calibrate=calibrate,
    estimator=lasso_reg,
    kfold=kfold,
    stratify_y=False,
    grid=lasso_pipeline_hyperparms_grid,
    randomized_grid=True,
    scoring=["r2"],
    random_state=3,
)

redfin_lasso.grid_search_param_tuning(X, y)

X_train, y_train = redfin_lasso.get_train_data(X, y)
X_test, y_test = redfin_lasso.get_test_data(X, y)
X_valid, y_valid = redfin_lasso.get_valid_data(X, y)

redfin_lasso.fit(X_train, y_train)
100%|██████████| 100/100 [00:01<00:00, 60.02it/s]
Best score/param set found on validation set:
{'params': {'lasso__copy_X': True,
            'lasso__fit_intercept': True,
            'lasso__max_iter': 500,
            'lasso__positive': False,
            'lasso__precompute': False,
            'lasso__tol': 0.001,
            'lasso__warm_start': False},
 'score': 0.6282190547419337}
Best r2: 0.628 

Validation and Bootstrap Metrics

In [15]:
print("Validation Metrics")
redfin_lasso.return_metrics(X_valid, y_valid)
print("Test Metrics")
redfin_lasso.return_metrics(X_test, y_test)
print("Bootstrap Metrics")

X_test = np.array(X_test)
y_test = np.array(y_test)
redfin_lasso.return_bootstrap_metrics(
    X_test,
    y_test,
    metrics=["r2", "explained_variance"],
    n_samples=30,
    num_resamples=300,
)
Validation Metrics
********************************************************************************
{'Explained Variance': 0.6364405447537245,
 'Mean Absolute Error': 8393990.928465009,
 'Mean Squared Error': 199886219886674.84,
 'Median Absolute Error': 5642073.254186362,
 'R2': 0.6282190547419337,
 'RMSE': 14138112.316949347}
********************************************************************************
Test Metrics
********************************************************************************
{'Explained Variance': 0.31386392138329666,
 'Mean Absolute Error': 6288722.984962022,
 'Mean Squared Error': 67390699796721.52,
 'Median Absolute Error': 4838796.157459403,
 'R2': 0.2606908875968984,
 'RMSE': 8209183.86910182}
********************************************************************************
Bootstrap Metrics
100%|██████████| 300/300 [00:03<00:00, 78.24it/s]
Out[15]:
               Metric      Mean  95% CI Lower  95% CI Upper
0                  r2  0.161591      0.128569      0.194612
1  explained_variance  0.267669      0.237899      0.297439
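
The bootstrap table can be read as follows: the test set is resampled with replacement many times, the metric is recomputed on each resample, and the results are summarized. The sketch below illustrates this with NumPy only. It assumes n_samples is the size of each resample and num_resamples the number of repetitions, and it uses a percentile interval; model_tuner may compute its interval differently, and the helper names are hypothetical.

```python
import numpy as np


def r2(y_true, y_pred):
    """Coefficient of determination."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def bootstrap_r2(y_true, y_pred, n_samples=30, num_resamples=300, seed=3):
    """Resample (y_true, y_pred) pairs with replacement and summarize R2."""
    rng = np.random.default_rng(seed)
    scores = np.empty(num_resamples)
    for i in range(num_resamples):
        idx = rng.integers(0, len(y_true), size=n_samples)
        scores[i] = r2(y_true[idx], y_pred[idx])
    lower, upper = np.percentile(scores, [2.5, 97.5])
    return scores.mean(), lower, upper


# Made-up targets and noisy predictions standing in for a 40-row test set
rng = np.random.default_rng(0)
y_true = rng.normal(size=40)
y_pred = y_true + rng.normal(scale=0.5, size=40)

mean, lower, upper = bootstrap_r2(y_true, y_pred)
print(f"R2 mean={mean:.3f}, 95% interval=({lower:.3f}, {upper:.3f})")
```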

Ridge Regression

Define Hyperparameters for Ridge Regression

In [16]:
ridge_reg = Ridge()

estimator_name = "ridge"
# Hyperparameter grid; keys follow the "<step name>__<parameter>" convention

ridge_pipeline_hyperparms_grid = {
    f"{estimator_name}__max_iter": [100, 200, 500],
    f"{estimator_name}__alpha": [0.1, 1, 0.5],
    "selectKBest__k": [1, 2, 3],
}

kfold = False
calibrate = False
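
The selectKBest__k entry in the grid tunes how many features are kept before the estimator is fit. As a rough illustration of what such a step does, the sketch below uses scikit-learn's SelectKBest with the f_regression score function on synthetic data. The score function is an assumption; this notebook does not show which one model_tuner uses.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression

# Synthetic data: only the first two of four columns drive the target
rng = np.random.default_rng(3)
X_demo = rng.normal(size=(200, 4))
y_demo = 3.0 * X_demo[:, 0] - 2.0 * X_demo[:, 1] + rng.normal(scale=0.1, size=200)

selector = SelectKBest(score_func=f_regression, k=2)
X_selected = selector.fit_transform(X_demo, y_demo)

print(selector.get_support(indices=True))  # indices of the columns that were kept
print(X_selected.shape)
```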

Initialize and Configure the Ridge Model

In [17]:
redfin_ridge = Model(
    pipeline_steps=[
        ("Preprocessor", SimpleImputer()),
    ],
    name="Redfin_model_Ridge",
    estimator_name=estimator_name,
    model_type="regression",
    calibrate=calibrate,
    estimator=ridge_reg,
    kfold=kfold,
    selectKBest=True,
    stratify_y=False,
    grid=ridge_pipeline_hyperparms_grid,
    randomized_grid=False,
    scoring=["r2"],
    n_splits=5,
    random_state=3,
)

redfin_ridge.grid_search_param_tuning(X, y)

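### Note: kfold is False here, so the model is refit on the full dataset
### and the metrics below are computed on that same data (in-sample),
### which makes them optimistic compared with held-out test metrics.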
redfin_ridge.fit(X, y)
redfin_ridge.return_metrics(X, y)
100%|██████████| 27/27 [00:02<00:00, 11.97it/s]
Best score/param set found on validation set:
{'params': {'ridge__alpha': 0.1, 'ridge__max_iter': 100, 'selectKBest__k': 3},
 'score': 0.6191429015946501}
Best r2: 0.619 

********************************************************************************
{'Explained Variance': 0.5766006888023398,
 'Mean Absolute Error': 7487310.289759071,
 'Mean Squared Error': 152632774544656.1,
 'Median Absolute Error': 4532978.504302945,
 'R2': 0.5766006888023398,
 'RMSE': 12354463.749781134}
********************************************************************************

Feature names selected:
['BATHS', 'SQUARE FEET', 'LOT SIZE']

Out[17]:
{'Regression Report': {'Explained Variance': 0.5766006888023398,
  'R2': 0.5766006888023398,
  'Mean Absolute Error': 7487310.289759071,
  'Median Absolute Error': 4532978.504302945,
  'Mean Squared Error': 152632774544656.1,
  'RMSE': 12354463.749781134},
 'K Best Features': ['BATHS', 'SQUARE FEET', 'LOT SIZE']}
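
In the Ridge report above, Explained Variance and R2 agree to every printed digit, whereas the Lasso test metrics differ (0.314 versus 0.261). The two metrics coincide whenever the residuals average to zero, which is expected for a linear model with an intercept evaluated on the data it was fit on. A small NumPy sketch with made-up numbers shows the difference; the function names are illustrative.

```python
import numpy as np


def r2(y_true, y_pred):
    """1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def explained_variance(y_true, y_pred):
    """1 - Var(residuals) / Var(y_true); ignores any constant bias."""
    return 1.0 - np.var(y_true - y_pred) / np.var(y_true)


y_true = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
unbiased = np.array([1.5, 1.5, 3.0, 4.5, 4.5])  # residuals average to zero
biased = unbiased + 1.0  # same errors, shifted by a constant

print(r2(y_true, unbiased), explained_variance(y_true, unbiased))
print(r2(y_true, biased), explained_variance(y_true, biased))
```

A constant shift in the predictions lowers R2 but leaves explained variance unchanged, so a gap between the two points to systematic bias on the evaluated data.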
In [18]:
redfin_ridge.predict(X_test)  # X_test is the test split created in the Lasso section
Out[18]:
array([[14820416.22985085],
       [12931799.15827887],
       [27556389.26646754],
       [ 4791486.4612512 ],
       [10137760.68708053],
       [ 6463446.62012611],
       [14015619.25132306],
       [20373915.49647246],
       [12415162.89672655],
       [-2061894.01291453],
       [  328769.83586473],
       [12615310.49286705],
       [ 2433980.45788748],
       [11375444.07953293],
       [14755188.1924329 ],
       [ 7913073.54518622],
       [17006360.5529535 ],
       [19010742.36441287],
       [26641367.82500629],
       [11170172.02089627],
       [15014059.95768299],
       [ 6045949.35489455],
       [  555494.08607034],
       [15549827.30967466],
       [ 8374461.16216404],
       [ 7076633.55532976],
       [24998651.96904591],
       [15989762.73999921],
       [30638904.30307189],
       [14782476.32582068],
       [17608027.73570456],
       [14744958.15479106],
       [ 2148614.70795859],
       [ 8466382.93778243],
       [ 9261336.09290196],
       [  384561.62791359],
       [15353218.2850842 ],
       [34009417.32096082],
       [25749368.64484414],
       [15043239.86481608]])

SGD Regressor

Define Hyperparameters for SGD Regressor

In [19]:
sgd_reg = SGDRegressor(random_state=3)

estimator_name = "sgdregressor"
# Tuning options: with kfold=False, candidates are scored on a held-out validation split

kfold = False
calibrate = False
In [20]:
# Each list sliced with [:1] keeps only its first value, so the search
# below varies just the loss function and epsilon.
sgd_pipeline_hyperparms_grid = [
    {
        "sgdregressor__loss": [
            "squared_error",
            "huber",
            "epsilon_insensitive",
            "squared_epsilon_insensitive",
        ],
        "sgdregressor__penalty": [None, "l2", "l1", "elasticnet"][:1],
        "sgdregressor__alpha": [0.0001, 0.001, 0.01, 0.1][:1],
        # Only used if penalty is 'elasticnet'
        "sgdregressor__l1_ratio": [0.15, 0.25, 0.5, 0.75][:1],
        "sgdregressor__fit_intercept": [True, False][:1],
        "sgdregressor__max_iter": [1000, 2000, 3000][:1],
        "sgdregressor__tol": [1e-3, 1e-4][:1],
        # Used by 'huber', 'epsilon_insensitive', and 'squared_epsilon_insensitive'
        "sgdregressor__epsilon": [0.1, 0.2],
        "sgdregressor__learning_rate": [
            "constant",
            "optimal",
            "invscaling",
            "adaptive",
        ][:1],
        "sgdregressor__eta0": [
            0.01,
            0.1,
        ][:1],
        "sgdregressor__power_t": [
            0.25,
            0.5,
        ][:1],
        "sgdregressor__early_stopping": [True, False][:1],
        "sgdregressor__validation_fraction": [
            0.1,
            0.2,
        ][:1],
        "sgdregressor__n_iter_no_change": [
            5,
            10,
        ][:1],
        "sgdregressor__warm_start": [True, False][:1],
        "sgdregressor__average": [
            False,
            True,
            10,
        ][:1],
    }
]

Initialize and Configure the SGD Model

In [21]:
redfin_sgd = Model(
    pipeline_steps=[
        ("Preprocessor", SimpleImputer()),
    ],
    name="Redfin_model_SGD",
    estimator_name=estimator_name,
    model_type="regression",
    calibrate=calibrate,
    estimator=sgd_reg,
    kfold=kfold,
    stratify_y=False,
    grid=sgd_pipeline_hyperparms_grid,
    randomized_grid=False,
    # n_iter=3,
    scoring=["r2"],
    # n_splits=2,
    random_state=3,
)
redfin_sgd.grid_search_param_tuning(X, y)


X_train, y_train = redfin_sgd.get_train_data(X, y)
X_test, y_test = redfin_sgd.get_test_data(X, y)
X_valid, y_valid = redfin_sgd.get_valid_data(X, y)

redfin_sgd.fit(X_train, y_train)

print("Validation Metrics")
redfin_sgd.return_metrics(X_valid, y_valid)
print("Test Metrics")
redfin_sgd.return_metrics(X_test, y_test)
100%|██████████| 8/8 [00:00<00:00, 13.08it/s]
Best score/param set found on validation set:
{'params': {'sgdregressor__alpha': 0.0001,
            'sgdregressor__average': False,
            'sgdregressor__early_stopping': True,
            'sgdregressor__epsilon': 0.1,
            'sgdregressor__eta0': 0.01,
            'sgdregressor__fit_intercept': True,
            'sgdregressor__l1_ratio': 0.15,
            'sgdregressor__learning_rate': 'constant',
            'sgdregressor__loss': 'huber',
            'sgdregressor__max_iter': 1000,
            'sgdregressor__n_iter_no_change': 5,
            'sgdregressor__penalty': None,
            'sgdregressor__power_t': 0.25,
            'sgdregressor__tol': 0.001,
            'sgdregressor__validation_fraction': 0.1,
            'sgdregressor__warm_start': True},
 'score': 0.18380622663876656}
Best r2: 0.184 

Validation Metrics
********************************************************************************
{'Explained Variance': 0.3049836483940087,
 'Mean Absolute Error': 9876129.928760896,
 'Mean Squared Error': 438822618891813.0,
 'Median Absolute Error': 4151062.6672300193,
 'R2': 0.18380622663876656,
 'RMSE': 20948093.44288432}
********************************************************************************
Test Metrics
********************************************************************************
{'Explained Variance': -0.13921368196432482,
 'Mean Absolute Error': 6547445.2186167445,
 'Mean Squared Error': 109886075781456.58,
 'Median Absolute Error': 4647607.070584111,
 'R2': -0.2055042816961643,
 'RMSE': 10482655.950733887}
********************************************************************************
Out[21]:
{'Explained Variance': -0.13921368196432482,
 'R2': -0.2055042816961643,
 'Mean Absolute Error': 6547445.2186167445,
 'Median Absolute Error': 4647607.070584111,
 'Mean Squared Error': 109886075781456.58,
 'RMSE': 10482655.950733887}
In [22]:
redfin_sgd.predict(X_test)
Out[22]:
array([10750258.75341237,  7483945.27694425, 10822482.82247593,
        2713718.71007497,  7362358.65201392,  4656568.06902705,
       11034486.2055147 , 13348934.28805413,  1718524.80507851,
       -2109073.21995107,  1511174.7725108 ,  5640292.40486405,
        3192143.87724924, 10363373.73337553, 11811593.15341237,
        4468145.46866569,  9570121.3180854 ,  8728589.41640283,
       52122648.90199854,  6208192.36213685,  7599454.9299641 ,
        2862385.34237772,  2192915.21069794,  7857222.72340637,
        5342497.47302484,  5025262.83457139, 15411870.64813384,
       12451841.47461439, 14333018.06818269, 11367584.16858479,
        9295246.98396109, 11978047.50789513,  2055627.87882633,
        6232768.68607364,  3095445.32204247,  2135730.2184002 ,
       10540048.76241919, 19090105.7526744 , 13390552.16639836,
        7124664.63341238])

XGBoost

Define Hyperparameters for XGBoost

In [23]:
xg_boost = XGBRegressor(random_state=3)

estimator_name = "xgb"
# Tuning options: with kfold=False, candidates are scored on a held-out validation split

kfold = False
calibrate = False
In [24]:
# Define the hyperparameters for XGBoost.
# Each list sliced with [:1] keeps only its first value, which keeps the demo fast.
xgb_learning_rates = [0.1, 0.01, 0.05][:1]  # Learning rate (eta)
xgb_n_estimators = [100, 200, 300][:1]  # Number of trees
xgb_max_depths = [3, 5, 7][:1]  # Maximum depth of the trees
xgb_subsamples = [0.8, 1.0][:1]  # Subsample ratio of the training instances
# Subsample ratio of columns when constructing each tree
xgb_colsample_bytree = [0.8, 1.0][:1]
# NOTE: "logloss" is a classification metric; "rmse" is the usual choice for a regressor.
xgb_eval_metric = ["logloss"]
xgb_early_stopping_rounds = [10]
# xgb_tree_method = ["gpu_hist"]
# early_stopping_mode = ['min']
# early_stopping_patience = [5]
xgb_verbose = [False]

# Combining the hyperparameters in a dictionary
xgb_pipeline_hyperparms_grid = [
    {
        "xgb__learning_rate": xgb_learning_rates,
        "xgb__n_estimators": xgb_n_estimators,
        "xgb__max_depth": xgb_max_depths,
        "xgb__subsample": xgb_subsamples,
        "xgb__colsample_bytree": xgb_colsample_bytree,
        "xgb__eval_metric": xgb_eval_metric,
        "xgb__early_stopping_rounds": xgb_early_stopping_rounds,
        # 'xgb__early_stopping_patience': early_stopping_patience,
        # "xgb_tree_method": xgb_tree_method,
        "xgb__verbose": xgb_verbose,
    }
]
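
The early_stopping_rounds setting tells the booster to stop adding trees once the validation metric has gone a given number of rounds without improving. The plain-Python sketch below illustrates that patience rule on a made-up list of validation losses. It is a simplified illustration, not XGBoost's implementation, and the function name is hypothetical.

```python
def early_stopping_round(validation_losses, patience):
    """Return (best_round, stop_round) under a simple patience rule.

    Training stops once `patience` consecutive rounds pass without a new
    best (lowest) validation loss.
    """
    best_round, best_loss = 0, float("inf")
    for current_round, loss in enumerate(validation_losses):
        if loss < best_loss:
            best_round, best_loss = current_round, loss
        elif current_round - best_round >= patience:
            return best_round, current_round
    # Never triggered: training ran through every round
    return best_round, len(validation_losses) - 1


# Made-up validation losses: improve until round 3, then plateau
losses = [1.0, 0.8, 0.7, 0.65, 0.66, 0.67, 0.66, 0.68]
print(early_stopping_round(losses, patience=3))
```

This is why the number of trees actually used can end up smaller than the n_estimators value supplied in the grid.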

Initialize and Configure the XGB Model

In [25]:
redfin_xgb = Model(
    pipeline_steps=[
        ("Preprocessor", SimpleImputer()),
    ],
    name="Redfin_model_XGB",
    estimator_name=estimator_name,
    model_type="regression",
    calibrate=calibrate,
    estimator=xg_boost,
    kfold=kfold,
    stratify_y=False,
    grid=xgb_pipeline_hyperparms_grid,
    randomized_grid=False,
    # n_iter=3,
    scoring=["r2"],
    # n_splits=2,
    random_state=3,
    xgboost_early=True,
)

redfin_xgb.grid_search_param_tuning(X, y)

X_train, X_valid, X_test, y_train, y_valid, y_test = redfin_xgb.train_val_test_split(
    X,
    y,
    stratify_y=False,
    stratify_cols=None,
    train_size=0.6,
    validation_size=0.2,
    test_size=0.2,
    calibrate=redfin_xgb.calibrate,
    random_state=redfin_xgb.random_state,
)

redfin_xgb.fit(X_train, y_train, validation_data=(X_valid, y_valid))

redfin_xgb.return_metrics(X_test, y_test)
100%|██████████| 1/1 [00:00<00:00,  1.20it/s]
Best score/param set found on validation set:
{'params': {'xgb__colsample_bytree': 0.8,
            'xgb__early_stopping_rounds': 10,
            'xgb__eval_metric': 'logloss',
            'xgb__learning_rate': 0.1,
            'xgb__max_depth': 3,
            'xgb__n_estimators': 21,
            'xgb__subsample': 0.8},
 'score': 0.6184914036969227}
Best r2: 0.618 

********************************************************************************
{'Explained Variance': 0.2656006887605751,
 'Mean Absolute Error': 5280094.075,
 'Mean Squared Error': 68236331217477.984,
 'Median Absolute Error': 3129754.5,
 'R2': 0.25141389511892365,
 'RMSE': 8260528.507152432}
********************************************************************************
Out[25]:
{'Explained Variance': 0.2656006887605751,
 'R2': 0.25141389511892365,
 'Mean Absolute Error': 5280094.075,
 'Median Absolute Error': 3129754.5,
 'Mean Squared Error': 68236331217477.984,
 'RMSE': 8260528.507152432}
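
The 60/20/20 split requested above can be approximated with two calls to scikit-learn's train_test_split. This is a sketch of the idea on stand-in arrays with 199 rows (the 200 original rows minus the dropped disclaimer row), not a description of how model_tuner's train_val_test_split is implemented; the _demo variable names are illustrative.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in data with the same number of rows as the cleaned Redfin frame (199)
X_demo = np.arange(199 * 4).reshape(199, 4)
y_demo = np.arange(199)

# First carve off 20% for testing, then 25% of the remainder
# (20% of the whole) for validation
X_rest, X_test_demo, y_rest, y_test_demo = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=3
)
X_train_demo, X_valid_demo, y_train_demo, y_valid_demo = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=3
)

print(len(X_train_demo), len(X_valid_demo), len(X_test_demo))
```

Fixing random_state makes the split reproducible, so the same rows land in each partition on every run.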

Reference

Redfin. (n.d.). Redfin: Real Estate, Homes for Sale, MLS Listings, Agents. Retrieved from https://www.redfin.com