Data Intelligence, Business Analytics
When you are looking to refinance your mortgage, your banker will first check the Zillow estimate of your home's value. These estimates come from statistical models trained on data from periods when sales volumes were high and price patterns were either "strong growth" or "strong decline".
Today, with sales volumes abysmal, if there is no recent sale in your neighborhood these models must compare your home to one sold more or less recently 5 miles away in order to produce an estimate. This means they compare your home's price trend with trends occurring on very different houses: much cheaper, smaller houses far from where you live, since those are the houses selling in volume right now, their foreclosure rates being higher and their prices very attractive. Of course Zillow adjusts for square footage and historical zip-code trends, but how do they adjust for the fact that there are no foreclosures in your $500K-home neighborhood while the $250K-home neighborhood sees a spike in them? My answer is that their statistical estimates of home value have become more and more biased, as their decision trees either use final nodes that are too small, or (to avoid that effect) blend apples and oranges into the same bin, resulting in highly volatile estimates.
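To make the mechanism concrete, here is a minimal sketch with made-up numbers (the sale prices, the `min_comps` threshold, and the fallback rule are all illustrative assumptions, not Zillow's actual method). When the local bin has too few sales, the estimator falls back on the distant, foreclosure-heavy bin's price trend, and the $500K home inherits a decline it never experienced:

```python
import statistics

# Synthetic recent sales (illustrative): nearly all volume is in the
# cheaper, foreclosure-heavy neighborhood; the $500K area has no comps.
sales_250k_area = [240_000, 225_000, 210_000, 230_000, 205_000]  # falling fast
sales_500k_area = []  # no recent local sales

def estimate(local_sales, fallback_sales, prior=500_000,
             fallback_baseline=250_000, min_comps=3):
    """Toy comparable-sales estimator: use the local bin if it has enough
    sales; otherwise scale the prior value by the distant bin's trend."""
    if len(local_sales) >= min_comps:
        return statistics.mean(local_sales)
    # Apples and oranges: apply the cheap area's price trend to the prior.
    trend = statistics.mean(fallback_sales) / fallback_baseline
    return prior * trend

est = estimate(sales_500k_area, sales_250k_area)
print(round(est))  # well below $500K, driven entirely by distant foreclosures
```

With these numbers the distant bin averages $222K against a $250K baseline, a trend of 0.888, so the $500K home is marked down to about $444K purely on the strength of foreclosure sales miles away. Shrinking `min_comps` to keep estimates local instead makes each leaf tiny and the estimate noisy: the bias/variance trade-off described above.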
Another example of an issue with Zillow: the price-to-rent ratio in our neighborhood has increased. These are $500K houses, and nobody has ever rented such a home; indeed, nobody is renting at all. So how can their algorithms "think" that the rent is increasing (and is indeed well above the typical mortgage payment)? It simply does not make sense, but if someone has an explanation, feel free to share.
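One hedged guess at how this could happen (pure speculation, with invented numbers): if rent per square foot is estimated only from neighborhoods where homes actually do rent, which tend to be cheaper areas with higher rent per square foot, then extrapolating that rate linearly to a large $500K home overshoots the mortgage:

```python
# Hypothetical mechanism, not Zillow's documented method. Rent/sqft is
# observed only in cheaper areas (the only ones with rental transactions),
# then applied linearly to a bigger, pricier home.
rent_per_sqft_cheap_area = 1.40   # $/sqft/month, where rentals exist
sqft_500k_home = 2_800            # illustrative size

estimated_rent = rent_per_sqft_cheap_area * sqft_500k_home

monthly_mortgage = 2_684  # rough payment on $500K, 30yr at ~5%, illustrative
print(estimated_rent, estimated_rent > monthly_mortgage)
```

Under these assumptions the imputed rent comes out around $3,900 a month, above the mortgage payment, without a single rental ever occurring in the neighborhood.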