The Art of the WTF Listing

Nice analysis. A flattish slope in some ranges would make sense if there were a clear correlation between square footage and added functionality. For example, if 1800 sf were the magic size for a 4th bedroom, 2500 sf for a bonus space, and 3200 sf for a 5th bedroom, you might get ranges with flattish slopes. If larger homes were just enlarged versions of the same floor plan, I'd expect a downward slope instead. Overall, I agree the slope is much flatter than expected.
 
There are also floor plans that are more popular than others. For example, the Olivos floor plans in Quail Hill, Northwood II, and Northpark Square are very popular. Another variable is upgrades (obviously a more upgraded home will have a higher price per SF). If you normalized your data for those variables, I think the relationship would become more linear, but still not perfect. When it comes to homes, there are emotions on both the buy and the sell side, and it's those emotions (I'll include greed here) that cause the WTF prices. But again, what counts as a WTF price is subjective as well.
 
Also, real estate is a very "inefficient market." There are a lot of emotions involved, and greater value is placed on intangibles. Throw in cultural beliefs, and value gets attached to factors that aren't traditionally priced (e.g. address number, location on street, etc.). In a large enough market these variables may get washed out, but even so, real estate is notoriously inefficient.
 
You're overthinking it. The vast majority of your data points fall into groups so small that their values are statistically immaterial. Your grouping is also very broken.

The easiest way to see it is to take your spreadsheet and put filters on your source. Filter for SFR and a rounded size of 1600, the main WTF point as you say. That group has five entries, so it's statistically irrelevant due to sample size, but it's very enlightening when you look at the raw data.

The highest priced outlier is a 3/2 in Turtle Rock on an 8276sf lot.
The 2nd highest PSF is also Turtle Rock, marginally bigger at 1696sf but a 4/2 on a 5000sf lot.
The lowest priced one is in Woodbury, a 3/2.5 on a 3594sf lot (probably a detached condo, marked as SFR).
The 2nd lowest priced one is Westpark, a 4/3 @ 1689sf on a 6000sf lot.
The middle one is in Woodbridge, a 3/2 on a 5600sf lot.

Frankly, I don't think I'd compare any one of those homes with the others, except maybe the two in Turtle Rock. And those two are close, with basically a $100K premium for the 8200sf lot.
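For anyone who would rather do the filter step outside the spreadsheet, here is a minimal Python sketch of the same idea. The rows, area names, and prices are made-up placeholders rather than the listings described above, and `size_bucket` assumes sizes are rounded down to the nearest 100sf (so 1689 lands in the 1600 group), which may not match how the spreadsheet actually rounds.

```python
# Hypothetical listings: area names and prices are placeholders, not thread data.
listings = [
    {"area": "Area A", "type": "SFR",   "beds": 3, "sqft": 1650, "lot": 8000, "price": 900_000},
    {"area": "Area A", "type": "SFR",   "beds": 4, "sqft": 1696, "lot": 5000, "price": 800_000},
    {"area": "Area B", "type": "SFR",   "beds": 3, "sqft": 1620, "lot": 3600, "price": 600_000},
    {"area": "Area C", "type": "Condo", "beds": 2, "sqft": 1610, "lot": 0,    "price": 550_000},
    {"area": "Area C", "type": "SFR",   "beds": 4, "sqft": 2250, "lot": 6000, "price": 950_000},
]


def size_bucket(sqft, width=100):
    """Round a size down to the start of its bucket (assumed rule: 1689 -> 1600)."""
    return (sqft // width) * width


def filter_group(rows, home_type, bucket):
    """Keep rows of one home type whose size falls in one bucket."""
    return [
        r for r in rows
        if r["type"] == home_type and size_bucket(r["sqft"]) == bucket
    ]


group = filter_group(listings, "SFR", 1600)

# Print the raw rows, highest $/sf first, so mismatched homes are easy to spot.
for r in sorted(group, key=lambda r: r["price"] / r["sqft"], reverse=True):
    psf = r["price"] / r["sqft"]
    print(f'{r["area"]:8} {r["beds"]}bd {r["sqft"]}sf lot {r["lot"]}sf ${psf:,.0f}/sf')
```

The point of printing the rows instead of a single median is the same as above: with a handful of entries, the bed count and lot size differences tell you more than the summary number does.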
 
I would hardly call 10, 5, 7 & 3 a large count. Are you talking about the jump to $448 for the 2500sf homes? There are only 3 of them. It makes complete sense: you jumped from a not-so-small sample with a 40/60 mix of 3/2s and 4/2s to a pure 4/3 grouping of only three.

The 2200-2499sf range is $410, $439 and $428. That's a whole 7% spread, and it's based on the median for a sample size of 5. The $448 is the jump to the 2500-3000 group, which happens to be just 3 homes, all of which are 4 bed / 3 bath.
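A quick sketch of that arithmetic, plus how easily a median moves at this sample size. The three group medians are the ones quoted above; the five-value `sample` is hypothetical and only there to illustrate the effect.

```python
from statistics import median

# Medians ($/sf) quoted above for the 2200-2499sf range.
group_medians = [410, 439, 428]
spread = (max(group_medians) - min(group_medians)) / min(group_medians)
print(f"spread: {spread:.1%}")  # (439 - 410) / 410

# Hypothetical $/sf values for a five-home group (not thread data).
sample = [380, 395, 410, 450, 470]
full_median = median(sample)

# Drop a single listing and the median shifts noticeably.
without_lowest = median(sample[1:])
shift = (without_lowest - full_median) / full_median
print(f"median {full_median} -> {without_lowest} ({shift:.1%})")
```

With five homes, one listing coming on or off the market can move the median by about as much as the gap being analyzed.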

You've got basically a random grouping and it's being distorted by small sample size.

BTW, that highest price unit in the top group is in backup offers.  It's also the only one of three with a completed backyard.

 
You can see something interesting though.  Change your pivot table.  Remove the filter for location, move the location selection to Rows.  Remove the size range selection to remove the noise.

Notice it's a rather flat line at $400 psf with a couple of notable standouts: Turtle Rock, Turtle Ridge, and Westpark @ $497psf.

Woodbridge and Woodbury also come in high.

Switch to both condos and SFRs and the line gets flatter still. The same neighborhoods stand out.

Now create new columns in the data: one for the neighborhood's average PSF and a second for each home's PSF / avg PSF ratio. Now you can look for the high and low standouts within each neighborhood.
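Outside of a spreadsheet, the neighborhood average and ratio columns could be sketched like this. The rows are hypothetical placeholders, and the field names (`area`, `price`, `sqft`, `area_avg_psf`, `psf_ratio`) are my own, not the columns in the actual spreadsheet.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows (not thread data).
rows = [
    {"area": "Area A", "price": 800_000, "sqft": 2000},
    {"area": "Area A", "price": 900_000, "sqft": 2000},
    {"area": "Area A", "price": 700_000, "sqft": 2000},
    {"area": "Area B", "price": 1_000_000, "sqft": 2000},
    {"area": "Area B", "price": 1_200_000, "sqft": 2000},
]


def add_psf_ratio(rows):
    """Return copies of rows with psf, area_avg_psf and psf_ratio added."""
    psf_by_area = defaultdict(list)
    for r in rows:
        psf_by_area[r["area"]].append(r["price"] / r["sqft"])
    area_avg = {area: mean(values) for area, values in psf_by_area.items()}

    out = []
    for r in rows:
        psf = r["price"] / r["sqft"]
        avg = area_avg[r["area"]]
        out.append({**r, "psf": psf, "area_avg_psf": avg, "psf_ratio": psf / avg})
    return out


enriched = add_psf_ratio(rows)

# Sort by ratio to surface homes priced high or low for their own neighborhood.
for r in sorted(enriched, key=lambda r: r["psf_ratio"], reverse=True):
    print(f'{r["area"]} ${r["psf"]:.0f}/sf vs avg ${r["area_avg_psf"]:.0f} ratio {r["psf_ratio"]:.2f}')
```

A ratio near 1.0 means the home is priced in line with its neighborhood; the standouts are the ones well above or below that, regardless of how expensive the neighborhood is overall.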
 
Actually, I learned something cool from Open's spreadsheet. Pivot tables aren't my strong suit, but instead of doing what I said, you can take his combo chart, add one entry to the pivot table, and put a StdDev of PSF in the summation values. You can then group the chart by neighborhood and see that the neighborhoods with the greatest variability are Airport Area, Turtle Rock, Turtle Ridge and properties just listed as "Irvine". And you can more easily parse that by home type, condo vs. SFR.

Drop the bedroom count into the filters and go a step further. For SFR, 4+ bedrooms, the line is really flat at $400/sf; the standouts are Turtle Rock ($556), Woodbridge ($475), and Woodbury ($457).

All suffering from low counts, though. The high-count neighborhoods, Northwood and Northpark, are pretty spot on at $400, with Std Devs of $40/$50.
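The same count / average / StdDev view can be sketched in Python, with the low-count groups flagged so they don't get read as real standouts. The $/sf values are hypothetical, and the `min_count` cutoff of 5 is an arbitrary illustration, not a threshold anyone in this thread proposed.

```python
from statistics import mean, stdev

# Hypothetical $/sf values per neighborhood (not thread data).
psf_by_area = {
    "Area A": [400, 440, 360, 420, 380, 400],
    "Area B": [500, 600, 550],
}


def summarize(psf_by_area, min_count=5):
    """Per-area count, mean and sample std dev, flagging small groups."""
    summary = {}
    for area, values in psf_by_area.items():
        n = len(values)
        summary[area] = {
            "count": n,
            "avg_psf": mean(values),
            # Sample std dev needs at least two values.
            "std_psf": stdev(values) if n >= 2 else None,
            "low_count": n < min_count,
        }
    return summary


summary = summarize(psf_by_area)
for area, s in summary.items():
    flag = " (low count)" if s["low_count"] else ""
    print(f'{area}: n={s["count"]} avg ${s["avg_psf"]:.0f} std ${s["std_psf"]:.0f}{flag}')
```

`statistics.stdev` is the sample standard deviation, which should correspond to the plain StdDev summary in a pivot table rather than the population version (StdDevp).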
 
Also, the mix of single story homes to two story homes will distort the data too (single story homes will always trade at a higher $/sf than two story homes, all other things being close to equal). There are a lot of variables to consider here.
 
nosuchreality said:
I would hardly call 10, 5, 7 & 3 a large count. Are you talking about the jump to $448 for the 2500sf homes? There are only 3 of them. It makes complete sense: you jumped from a not-so-small sample with a 40/60 mix of 3/2s and 4/2s to a pure 4/3 grouping of only three.

The 2200-2499sf range is $410, $439 and $428. That's a whole 7% spread, and it's based on the median for a sample size of 5. The $448 is the jump to the 2500-3000 group, which happens to be just 3 homes, all of which are 4 bed / 3 bath.

You've got basically a random grouping and it's being distorted by small sample size.

BTW, that highest price unit in the top group is in backup offers.  It's also the only one of three with a completed backyard.

The rough rule of thumb for sample size to achieve statistical significance is 30.
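One rough way to see why group size matters is the standard error of a group's average, which shrinks with the square root of the count. This is a sketch only: it borrows the roughly $40/sf Std Dev mentioned earlier in the thread as an assumed spread, and it describes the mean, not the median the chart uses.

```python
from math import sqrt


def standard_error(std_dev, n):
    """Standard error of a sample mean: std_dev / sqrt(n)."""
    return std_dev / sqrt(n)


# Assumed spread of $40/sf; group sizes of 3, 5 and 30 homes.
for n in (3, 5, 30):
    print(f"n={n:2d}: average is uncertain by about ${standard_error(40, n):.0f}/sf")
```

Under that assumption, a 3-home group average is uncertain by over $20/sf, which is the same order as the jumps between size groups being discussed.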
 
Marty said:
Also, real estate is a very "inefficient market." There are a lot of emotions involved, and greater value is placed on intangibles. Throw in cultural beliefs, and value gets attached to factors that aren't traditionally priced (e.g. address number, location on street, etc.). In a large enough market these variables may get washed out, but even so, real estate is notoriously inefficient.
Very well said. Both sellers and buyers can have non-rational pricing/buying motivations, and my guess is that asking-price graphs (especially ones built on small samples) will reflect this in some way.
 