When you are looking to refinance your mortgage, your banker will first look at the Zillow estimate to check the value of your home. These estimates are based on statistical models trained on data from periods when sales volumes were high and price patterns were either "strong growth" or "strong decline".
Today, with sales volumes abysmal, if there has been no recent sale in your neighborhood these models must compare your home to a home five miles away sold more or less recently in order to produce an estimate. This means they compare your home's trend with price trends occurring on very different houses, such as much cheaper and smaller houses far from where you live, since those are the houses sold in the largest volumes right now: their foreclosure rates are higher and their prices very attractive. Of course Zillow adjusts for square footage and for historical trends by zip code, but how do they adjust for the fact that there are no foreclosures in your $500K-home neighborhood while the $250K-home neighborhood sees a spike in foreclosures? My answer is that their statistical estimates of home value have become more and more biased: their decision trees either use final nodes that are too small (producing highly volatile estimates), or, to avoid this effect, blend apples and oranges into the same bin (producing biased ones).
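The bin-size tradeoff described above can be sketched in a tiny simulation. Everything here is hypothetical: the sale counts, the two price tiers, and the averaging rule are made-up assumptions for illustration, not Zillow's actual model.

```python
import random
import statistics

random.seed(0)

# Hypothetical market (made-up numbers): an expensive neighborhood with
# very few recent sales around $500K, and a cheaper neighborhood with
# many foreclosure-driven sales about 10% below their $250K level.
expensive_sales = [random.gauss(500_000, 15_000) for _ in range(3)]
cheap_sales = [random.gauss(250_000 * 0.90, 10_000) for _ in range(37)]

# Small final node: only the 3 same-tier comps.
# Roughly unbiased, but an average of 3 noisy sales is volatile.
small_bin = statistics.mean(expensive_sales)

# Wide final node: blends both tiers ("apples and oranges").
# Stable, but heavily biased toward the cheap, foreclosure-heavy tier.
wide_bin = statistics.mean(expensive_sales + cheap_sales)

print(round(small_bin), round(wide_bin))
```

The small bin lands near the $500K tier but would swing with every new sale; the wide bin is stable yet sits far below it, pulled down by the foreclosure-heavy tier, which is exactly the bias the post describes.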
Another example of an issue with Zillow: the price to rent in our neighborhood has increased. These are $500K houses, and nobody has ever rented such homes; indeed, nobody is renting at all. So how can their algorithms "think" that the price of renting is increasing (and is indeed well above the typical mortgage payment)? It simply does not make sense, but if someone has an explanation, feel free to share.
Another interesting web page about house valuation. Here's an extract:
I know that Amy is no longer at AnalyticBridge, but does anyone think these estimates are coming more into line (less biased) now that home prices are rising again? That might indicate that current estimates are coming back into line with pre-bubble, pre-crash patterns.
Here are some interesting comments on Zillow estimates: http://valuation411.blogspot.com/2006/09/cracking-zillow-code.html
Estimates are highly correlated with local assessments; there is no magic in the formula. Zillow claims that the single most predictive factor is square footage. But why build sophisticated models if the Zillow estimate is essentially a function of the local assessment? What additional precision or lift do the other variables bring to their model?
Good questions, Amy.
I have always wondered to what extent Zillow's automated valuation model (AVM) is based on finding comparable properties, versus training a (non-parametric?) regression model on hedonic characteristics (e.g., number of bathrooms, square footage).
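The two AVM styles contrasted above can be sketched side by side. This is a minimal illustration with made-up sale records, not any vendor's actual method: a comparables estimator that averages the k most similar solds, and a one-variable hedonic regression of price on square footage fit by ordinary least squares.

```python
# Hypothetical recent sales: (square_feet, bathrooms, sale_price).
# All numbers are invented for illustration.
solds = [
    (1200, 1, 210_000),
    (1500, 2, 265_000),
    (1800, 2, 300_000),
    (2100, 3, 355_000),
    (2400, 3, 390_000),
]

def comparables_estimate(subject_sqft, k=2):
    """Average the sale prices of the k homes closest in square footage."""
    nearest = sorted(solds, key=lambda s: abs(s[0] - subject_sqft))[:k]
    return sum(s[2] for s in nearest) / k

def hedonic_estimate(subject_sqft):
    """Simple least-squares fit of price on square footage."""
    n = len(solds)
    mean_x = sum(s[0] for s in solds) / n
    mean_y = sum(s[2] for s in solds) / n
    slope = (sum((s[0] - mean_x) * (s[2] - mean_y) for s in solds)
             / sum((s[0] - mean_x) ** 2 for s in solds))
    intercept = mean_y - slope * mean_x
    return intercept + slope * subject_sqft

print(comparables_estimate(1650), hedonic_estimate(1650))
```

On dense, homogeneous data like this toy sample the two approaches nearly agree; the divergence the post worries about appears when the comps pool is thin and the "nearest" solds are no longer truly comparable.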
Real estate transaction data ("solds") are definitely a limited sample size, even in economic boom times. Last year I wrote a blog post on this issue, and about the time lag of sold data: http://blog.someben.com/2011/06/fungal-houses/
© 2015 AnalyticBridge.com is a subsidiary and dedicated channel of Data Science Central LLC