Homes in Northampton MA Near Rail Trails.
Sample of homes in Northampton, MA to see whether being close to a bike trail enhances the value of the home.
A data frame with 104 observations on the following 30 variables.
HouseNum: Unique house number
Acre: Lot size for the house (in acres)
AcreGroup: Lot size groups (<= 1/4 acre or > 1/4 acre)
Adj1998: Estimated 1998 price (in thousands of 2014 dollars)
Adj2007: Estimated 2007 price (in thousands of 2014 dollars)
Adj2011: Estimated 2011 price (in thousands of 2014 dollars)
BedGroup: Bedroom groups (1-2 beds, 3 beds, or 4+ beds)
Bedrooms: Number of bedrooms
BikeScore: Bike friendliness (0-100 score, higher scores are better)
Diff2014: Difference in price between 2014 estimate and adjusted 1998 estimate (in thousands of dollars)
Distance: Distance (in miles) to the nearest entry point to the rail trail network; the observed range of 0.04 to 3.98 and the 1/2 mile cutoff used by DistGroup indicate miles rather than feet
DistGroup: Distance groups, compared to 1/2 mile (Closer or Farther Away)
GarageSpaces: Number of garage spaces (0-4)
GarageGroup: Any garage spaces? (no or yes)
Latitude: Latitude (for mapping)
Longitude: Longitude (for mapping)
NumFullBaths: Number of full baths (includes shower or bathtub)
NumHalfBaths: Number of half baths (no shower or bathtub)
NumRooms: Number of rooms
PctChange: Percentage change from adjusted 1998 price to 2014 (value of zero means no change)
Price1998: Zillow 10 year estimate from 2008, i.e. the 1998 price estimate (in thousands of dollars)
Price2007: Zillow price estimate from 2007 (in thousands of dollars)
Price2011: Zillow price estimate from 2011 (in thousands of dollars)
Price2014: Zillow price estimate from 2014 (in thousands of dollars)
SFGroup: SquareFeet group (<= 1500 sf or > 1500 sf)
SquareFeet: Square footage of interior finished space (in thousands of sf)
StreetName: Street name
StreetNum: House number on street
WalkScore: Walk friendliness (0-100 score, higher scores are better)
Zip: Location (1060 = Northampton or 1062 = Florence); the ZIP code is stored as a number, so its leading zero is dropped
This dataset comprises 104 homes in Northampton, MA that were sold in 2007. The authors used Google Maps to measure the shortest distance from each home to a rail trail along streets and pathways, and recorded the Zillow.com estimate of each home’s price in 1998 and 2011. Additional attributes such as square footage, number of bedrooms, and number of bathrooms come from a 2007 realty database. The houses are divided into two groups based on distance to the trail (DistGroup).
From July 2015 JSE Datasets and Stories: “Rail Trails and Property Values: Is There an Association?”, Ella Hartenian, Smith College and Nicholas J. Horton, Amherst College. http://www.amstat.org/publications/jse/v23n2/horton.pdf
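The distance grouping described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes `Distance` is measured in miles and that a home strictly less than 0.5 miles from the trail counts as "Closer" (how a home at exactly 0.5 miles is classified is an assumption). The four distances are the values of the first four homes in the dataset.

```r
# Hypothetical re-derivation of DistGroup from Distance (assumed to be in miles).
distance <- c(2.40, 1.97, 0.04337121, 0.55473485)

# Assumption: strictly less than half a mile counts as "Closer".
dist_group <- ifelse(distance < 0.5, "Closer", "Farther Away")
print(dist_group)
```

Under these assumptions the labels agree with the `DistGroup` values recorded for those four homes.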
knitr::opts_chunk$set(echo = TRUE)
library(dplyr)
library(tidyr)
library(purrr)
library(readr)
library(factoextra)
library(ggfortify)
library(stats)
library(cluster)
library(psych)
library(devtools)
library(ggbiplot)
library(ggalt)
library(ggforce)
library(ggplot2)
input_data <- read_csv("RailsTrails.csv")
## Warning: Missing column names filled in: 'X1' [1]
input_data1 = input_data[,-1]
num.names <- input_data1 %>% select_if(is.numeric) %>% colnames()
ch.names <- input_data1 %>% select_if(is.character) %>% colnames()
dim(input_data1)
## [1] 104 30
str(input_data1)
## tibble [104 x 30] (S3: tbl_df/tbl/data.frame)
## $ HouseNum : num [1:104] 1 2 3 4 5 6 7 8 9 10 ...
## $ Acre : num [1:104] 0.28 0.29 0.36 0.26 0.31 0.31 0.08 0.11 0.31 0.27 ...
## $ AcreGroup : chr [1:104] "> 1/4 acre" "> 1/4 acre" "> 1/4 acre" "> 1/4 acre" ...
## $ Adj1998 : num [1:104] 148 135 257 232 272 ...
## $ Adj2007 : num [1:104] 234 261 401 305 299 ...
## $ Adj2011 : num [1:104] 192 207 348 257 237 ...
## $ BedGroup : chr [1:104] "3 beds" "3 beds" "3 beds" "3 beds" ...
## $ Bedrooms : num [1:104] 3 3 3 3 4 3 3 4 5 3 ...
## $ BikeScore : num [1:104] 35 44 66 61 53 36 97 95 38 30 ...
## $ Diff2014 : num [1:104] 62.4 69 82.1 44.6 -102.7 ...
## $ Distance : num [1:104] 2.4 1.97 0.0434 0.5547 0.5966 ...
## $ DistGroup : chr [1:104] "Farther Away" "Farther Away" "Closer" "Farther Away" ...
## $ GarageSpaces: num [1:104] 2 1 2 1 0 1 0 0 0 0 ...
## $ GarageGroup : chr [1:104] "yes" "yes" "yes" "yes" ...
## $ Latitude : num [1:104] 42.3 42.3 42.3 42.3 42.3 ...
## $ Longitude : num [1:104] -72.7 -72.7 -72.7 -72.7 -72.7 ...
## $ NumFullBaths: num [1:104] 1 1 2 1 1 1 1 1 2 1 ...
## $ NumHalfBaths: num [1:104] 0 0 1 1 0 1 0 1 0 0 ...
## $ NumRooms : num [1:104] 5 5 7 6 6 6 6 9 7 5 ...
## $ PctChange : num [1:104] 42 51 32 19.2 -37.8 ...
## $ Price1998 : num [1:104] 101.5 92.5 175.5 158.5 186 ...
## $ Price2007 : num [1:104] 204 228 349 266 260 ...
## $ Price2011 : num [1:104] 181 195 328 243 223 ...
## $ Price2014 : num [1:104] 211 204 339 276 169 ...
## $ SFGroup : chr [1:104] "<= 1500 sf" "<= 1500 sf" "> 1500 sf" "> 1500 sf" ...
## $ SquareFeet : num [1:104] 0.966 0.96 1.725 1.727 1.576 ...
## $ StreetName : chr [1:104] "Acrebrook Drive" "Autumn Dr" "Bridge Road" "Bridge Road" ...
## $ StreetNum : num [1:104] 406 57 31 200 395 ...
## $ WalkScore : num [1:104] 9 5 46 40 32 12 82 88 15 9 ...
## $ Zip : num [1:104] 1062 1062 1062 1060 1062 ...
summary(input_data1)
## HouseNum Acre AcreGroup Adj1998
## Min. : 1.00 Min. :0.0500 Length:104 Min. : 60.66
## 1st Qu.: 26.75 1st Qu.:0.1675 Class :character 1st Qu.:167.00
## Median : 52.50 Median :0.2500 Mode :character Median :200.62
## Mean : 52.50 Mean :0.2574 Mean :208.57
## 3rd Qu.: 78.25 3rd Qu.:0.3300 3rd Qu.:228.39
## Max. :104.00 Max. :0.5600 Max. :470.67
## Adj2007 Adj2011 BedGroup Bedrooms
## Min. :162.6 Min. :141.7 Length:104 Min. :1.00
## 1st Qu.:260.6 1st Qu.:215.8 Class :character 1st Qu.:3.00
## Median :303.6 Median :258.9 Mode :character Median :3.00
## Mean :327.6 Mean :284.5 Mean :3.25
## 3rd Qu.:349.5 3rd Qu.:325.1 3rd Qu.:4.00
## Max. :798.6 Max. :698.5 Max. :6.00
## BikeScore Diff2014 Distance DistGroup
## Min. :18.00 Min. :-199.87 Min. :0.03883 Length:104
## 1st Qu.:36.00 1st Qu.: 44.33 1st Qu.:0.32879 Class :character
## Median :54.50 Median : 71.39 Median :0.76042 Mode :character
## Mean :57.28 Mean : 84.53 Mean :1.11432
## 3rd Qu.:77.25 3rd Qu.: 106.87 3rd Qu.:1.89579
## Max. :97.00 Max. : 497.82 Max. :3.97678
## GarageSpaces GarageGroup Latitude Longitude
## Min. :0.0000 Length:104 Min. :42.30 Min. :-72.73
## 1st Qu.:0.0000 Class :character 1st Qu.:42.32 1st Qu.:-72.68
## Median :1.0000 Mode :character Median :42.32 Median :-72.66
## Mean :0.7596 Mean :42.33 Mean :-72.66
## 3rd Qu.:1.0000 3rd Qu.:42.33 3rd Qu.:-72.64
## Max. :4.0000 Max. :42.35 Max. :-72.61
## NumFullBaths NumHalfBaths NumRooms PctChange
## Min. :1.000 Min. :0.0000 Min. : 4.000 Min. :-46.75
## 1st Qu.:1.000 1st Qu.:0.0000 1st Qu.: 5.000 1st Qu.: 26.54
## Median :1.000 Median :0.0000 Median : 6.500 Median : 37.61
## Mean :1.452 Mean :0.2212 Mean : 6.615 Mean : 42.20
## 3rd Qu.:2.000 3rd Qu.:0.0000 3rd Qu.: 7.250 3rd Qu.: 51.17
## Max. :4.000 Max. :1.0000 Max. :14.000 Max. :130.49
## Price1998 Price2007 Price2011 Price2014
## Min. : 41.5 Min. :141.5 Min. :133.8 Min. :132.1
## 1st Qu.:114.2 1st Qu.:226.8 1st Qu.:203.8 1st Qu.:212.9
## Median :137.2 Median :264.2 Median :244.4 Median :272.9
## Mean :142.7 Mean :285.1 Mean :268.6 Mean :293.1
## 3rd Qu.:156.2 3rd Qu.:304.1 3rd Qu.:306.9 3rd Qu.:334.2
## Max. :322.0 Max. :695.0 Max. :659.5 Max. :879.3
## SFGroup SquareFeet StreetName StreetNum
## Length:104 Min. :0.524 Length:104 Min. : 1.0
## Class :character 1st Qu.:1.206 Class :character 1st Qu.: 27.0
## Mode :character Median :1.516 Mode :character Median : 63.5
## Mean :1.566 Mean : 137.8
## 3rd Qu.:1.832 3rd Qu.: 155.0
## Max. :4.030 Max. :1086.0
## WalkScore Zip
## Min. : 2.00 Min. :1060
## 1st Qu.:14.75 1st Qu.:1060
## Median :36.00 Median :1062
## Mean :38.88 Mean :1061
## 3rd Qu.:60.75 3rd Qu.:1062
## Max. :94.00 Max. :1062
glimpse(input_data1)
## Rows: 104
## Columns: 30
## $ HouseNum <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,...
## $ Acre <dbl> 0.28, 0.29, 0.36, 0.26, 0.31, 0.31, 0.08, 0.11, 0.31, ...
## $ AcreGroup <chr> "> 1/4 acre", "> 1/4 acre", "> 1/4 acre", "> 1/4 acre"...
## $ Adj1998 <dbl> 148.3625, 135.2072, 256.5283, 231.6795, 271.8762, 192....
## $ Adj2007 <dbl> 233.8418, 261.4203, 401.0359, 305.0861, 298.7660, 275....
## $ Adj2011 <dbl> 191.8211, 206.9677, 347.9472, 257.4915, 236.6253, 243....
## $ BedGroup <chr> "3 beds", "3 beds", "3 beds", "3 beds", "4+ beds", "3 ...
## $ Bedrooms <dbl> 3, 3, 3, 3, 4, 3, 3, 4, 5, 3, 3, 3, 4, 3, 4, 3, 5, 3, ...
## $ BikeScore <dbl> 35, 44, 66, 61, 53, 36, 97, 95, 38, 30, 46, 38, 79, 85...
## $ Diff2014 <dbl> 62.36645, 68.96375, 82.13365, 44.57055, -102.70320, 18...
## $ Distance <dbl> 2.40000000, 1.97000000, 0.04337121, 0.55473485, 0.5965...
## $ DistGroup <chr> "Farther Away", "Farther Away", "Closer", "Farther Awa...
## $ GarageSpaces <dbl> 2, 1, 2, 1, 0, 1, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, ...
## $ GarageGroup <chr> "yes", "yes", "yes", "yes", "no", "yes", "no", "no", "...
## $ Latitude <dbl> 42.31533, 42.29856, 42.34379, 42.34446, 42.34253, 42.3...
## $ Longitude <dbl> -72.69397, -72.67474, -72.68023, -72.67221, -72.66437,...
## $ NumFullBaths <dbl> 1, 1, 2, 1, 1, 1, 1, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, ...
## $ NumHalfBaths <dbl> 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, ...
## $ NumRooms <dbl> 5, 5, 7, 6, 6, 6, 6, 9, 7, 5, 5, 7, 8, 6, 9, 6, 7, 5, ...
## $ PctChange <dbl> 42.036518, 51.005956, 32.017377, 19.238025, -37.775723...
## $ Price1998 <dbl> 101.5, 92.5, 175.5, 158.5, 186.0, 132.0, 117.0, 146.0,...
## $ Price2007 <dbl> 203.5, 227.5, 349.0, 265.5, 260.0, 240.0, 264.5, 331.5...
## $ Price2011 <dbl> 181.1, 195.4, 328.5, 243.1, 223.4, 229.7, 281.3, 357.6...
## $ Price2014 <dbl> 210.729, 204.171, 338.662, 276.250, 169.173, 211.487, ...
## $ SFGroup <chr> "<= 1500 sf", "<= 1500 sf", "> 1500 sf", "> 1500 sf", ...
## $ SquareFeet <dbl> 0.966, 0.960, 1.725, 1.727, 1.576, 1.320, 1.202, 2.136...
## $ StreetName <chr> "Acrebrook Drive", "Autumn Dr", "Bridge Road", "Bridge...
## $ StreetNum <dbl> 406, 57, 31, 200, 395, 23, 18, 23, 497, 1086, 14, 21, ...
## $ WalkScore <dbl> 9, 5, 46, 40, 32, 12, 82, 88, 15, 9, 35, 20, 68, 65, 6...
## $ Zip <dbl> 1062, 1062, 1062, 1060, 1062, 1062, 1060, 1060, 1062, ...
head(input_data1)
## # A tibble: 6 x 30
## HouseNum Acre AcreGroup Adj1998 Adj2007 Adj2011 BedGroup Bedrooms BikeScore
## <dbl> <dbl> <chr> <dbl> <dbl> <dbl> <chr> <dbl> <dbl>
## 1 1 0.28 > 1/4 ac~ 148. 234. 192. 3 beds 3 35
## 2 2 0.290 > 1/4 ac~ 135. 261. 207. 3 beds 3 44
## 3 3 0.36 > 1/4 ac~ 257. 401. 348. 3 beds 3 66
## 4 4 0.26 > 1/4 ac~ 232. 305. 257. 3 beds 3 61
## 5 5 0.31 > 1/4 ac~ 272. 299. 237. 4+ beds 4 53
## 6 6 0.31 > 1/4 ac~ 193. 276. 243. 3 beds 3 36
## # ... with 21 more variables: Diff2014 <dbl>, Distance <dbl>, DistGroup <chr>,
## # GarageSpaces <dbl>, GarageGroup <chr>, Latitude <dbl>, Longitude <dbl>,
## # NumFullBaths <dbl>, NumHalfBaths <dbl>, NumRooms <dbl>, PctChange <dbl>,
## # Price1998 <dbl>, Price2007 <dbl>, Price2011 <dbl>, Price2014 <dbl>,
## # SFGroup <chr>, SquareFeet <dbl>, StreetName <chr>, StreetNum <dbl>,
## # WalkScore <dbl>, Zip <dbl>
tail(input_data1)
## # A tibble: 6 x 30
## HouseNum Acre AcreGroup Adj1998 Adj2007 Adj2011 BedGroup Bedrooms BikeScore
## <dbl> <dbl> <chr> <dbl> <dbl> <dbl> <chr> <dbl> <dbl>
## 1 99 0.19 <= 1/4 a~ 428. 514. 406. 3 beds 3 90
## 2 100 0.13 <= 1/4 a~ 219. 324. 336. 3 beds 3 73
## 3 101 0.46 > 1/4 ac~ 324. 517. 506. 4+ beds 4 60
## 4 102 0.4 > 1/4 ac~ 222. 329. 272. 4+ beds 4 78
## 5 103 0.2 <= 1/4 a~ 219. 330. 265. 4+ beds 4 80
## 6 104 0.31 > 1/4 ac~ 175. 273. 237. 1-2 beds 2 47
## # ... with 21 more variables: Diff2014 <dbl>, Distance <dbl>, DistGroup <chr>,
## # GarageSpaces <dbl>, GarageGroup <chr>, Latitude <dbl>, Longitude <dbl>,
## # NumFullBaths <dbl>, NumHalfBaths <dbl>, NumRooms <dbl>, PctChange <dbl>,
## # Price1998 <dbl>, Price2007 <dbl>, Price2011 <dbl>, Price2014 <dbl>,
## # SFGroup <chr>, SquareFeet <dbl>, StreetName <chr>, StreetNum <dbl>,
## # WalkScore <dbl>, Zip <dbl>
sapply(input_data1,mode)
## HouseNum Acre AcreGroup Adj1998 Adj2007 Adj2011
## "numeric" "numeric" "character" "numeric" "numeric" "numeric"
## BedGroup Bedrooms BikeScore Diff2014 Distance DistGroup
## "character" "numeric" "numeric" "numeric" "numeric" "character"
## GarageSpaces GarageGroup Latitude Longitude NumFullBaths NumHalfBaths
## "numeric" "character" "numeric" "numeric" "numeric" "numeric"
## NumRooms PctChange Price1998 Price2007 Price2011 Price2014
## "numeric" "numeric" "numeric" "numeric" "numeric" "numeric"
## SFGroup SquareFeet StreetName StreetNum WalkScore Zip
## "character" "numeric" "character" "numeric" "numeric" "numeric"
lapply(input_data1[,num.names],mean)
## $HouseNum
## [1] 52.5
##
## $Acre
## [1] 0.2574038
##
## $Adj1998
## [1] 208.5663
##
## $Adj2007
## [1] 327.5543
##
## $Adj2011
## [1] 284.5266
##
## $Bedrooms
## [1] 3.25
##
## $BikeScore
## [1] 57.27885
##
## $Diff2014
## [1] 84.52769
##
## $Distance
## [1] 1.114324
##
## $GarageSpaces
## [1] 0.7596154
##
## $Latitude
## [1] 42.32622
##
## $Longitude
## [1] -72.6623
##
## $NumFullBaths
## [1] 1.451923
##
## $NumHalfBaths
## [1] 0.2211538
##
## $NumRooms
## [1] 6.615385
##
## $PctChange
## [1] 42.2015
##
## $Price1998
## [1] 142.6875
##
## $Price2007
## [1] 285.0529
##
## $Price2011
## [1] 268.624
##
## $Price2014
## [1] 293.094
##
## $SquareFeet
## [1] 1.566404
##
## $StreetNum
## [1] 137.8365
##
## $WalkScore
## [1] 38.875
##
## $Zip
## [1] 1061.173
lapply(input_data1[,num.names],median)
## $HouseNum
## [1] 52.5
##
## $Acre
## [1] 0.25
##
## $Adj1998
## [1] 200.6183
##
## $Adj2007
## [1] 303.6497
##
## $Adj2011
## [1] 258.8685
##
## $Bedrooms
## [1] 3
##
## $BikeScore
## [1] 54.5
##
## $Diff2014
## [1] 71.38875
##
## $Distance
## [1] 0.7604167
##
## $GarageSpaces
## [1] 1
##
## $Latitude
## [1] 42.32428
##
## $Longitude
## [1] -72.66413
##
## $NumFullBaths
## [1] 1
##
## $NumHalfBaths
## [1] 0
##
## $NumRooms
## [1] 6.5
##
## $PctChange
## [1] 37.60517
##
## $Price1998
## [1] 137.25
##
## $Price2007
## [1] 264.25
##
## $Price2011
## [1] 244.4
##
## $Price2014
## [1] 272.92
##
## $SquareFeet
## [1] 1.5155
##
## $StreetNum
## [1] 63.5
##
## $WalkScore
## [1] 36
##
## $Zip
## [1] 1062
lapply(input_data1[,num.names],min)
## $HouseNum
## [1] 1
##
## $Acre
## [1] 0.05
##
## $Adj1998
## [1] 60.66055
##
## $Adj2007
## [1] 162.5976
##
## $Adj2011
## [1] 141.721
##
## $Bedrooms
## [1] 1
##
## $BikeScore
## [1] 18
##
## $Diff2014
## [1] -199.8663
##
## $Distance
## [1] 0.03882576
##
## $GarageSpaces
## [1] 0
##
## $Latitude
## [1] 42.29856
##
## $Longitude
## [1] -72.7288
##
## $NumFullBaths
## [1] 1
##
## $NumHalfBaths
## [1] 0
##
## $NumRooms
## [1] 4
##
## $PctChange
## [1] -46.74717
##
## $Price1998
## [1] 41.5
##
## $Price2007
## [1] 141.5
##
## $Price2011
## [1] 133.8
##
## $Price2014
## [1] 132.135
##
## $SquareFeet
## [1] 0.524
##
## $StreetNum
## [1] 1
##
## $WalkScore
## [1] 2
##
## $Zip
## [1] 1060
lapply(input_data1[,num.names],max)
## $HouseNum
## [1] 104
##
## $Acre
## [1] 0.56
##
## $Adj1998
## [1] 470.6674
##
## $Adj2007
## [1] 798.6245
##
## $Adj2011
## [1] 698.5424
##
## $Bedrooms
## [1] 6
##
## $BikeScore
## [1] 97
##
## $Diff2014
## [1] 497.8243
##
## $Distance
## [1] 3.97678
##
## $GarageSpaces
## [1] 4
##
## $Latitude
## [1] 42.35441
##
## $Longitude
## [1] -72.61442
##
## $NumFullBaths
## [1] 4
##
## $NumHalfBaths
## [1] 1
##
## $NumRooms
## [1] 14
##
## $PctChange
## [1] 130.49
##
## $Price1998
## [1] 322
##
## $Price2007
## [1] 695
##
## $Price2011
## [1] 659.5
##
## $Price2014
## [1] 879.328
##
## $SquareFeet
## [1] 4.03
##
## $StreetNum
## [1] 1086
##
## $WalkScore
## [1] 94
##
## $Zip
## [1] 1062
lapply(input_data1[,num.names],range)
## $HouseNum
## [1] 1 104
##
## $Acre
## [1] 0.05 0.56
##
## $Adj1998
## [1] 60.66055 470.66740
##
## $Adj2007
## [1] 162.5976 798.6245
##
## $Adj2011
## [1] 141.7210 698.5424
##
## $Bedrooms
## [1] 1 6
##
## $BikeScore
## [1] 18 97
##
## $Diff2014
## [1] -199.8663 497.8243
##
## $Distance
## [1] 0.03882576 3.97678030
##
## $GarageSpaces
## [1] 0 4
##
## $Latitude
## [1] 42.29856 42.35441
##
## $Longitude
## [1] -72.72880 -72.61442
##
## $NumFullBaths
## [1] 1 4
##
## $NumHalfBaths
## [1] 0 1
##
## $NumRooms
## [1] 4 14
##
## $PctChange
## [1] -46.74717 130.49003
##
## $Price1998
## [1] 41.5 322.0
##
## $Price2007
## [1] 141.5 695.0
##
## $Price2011
## [1] 133.8 659.5
##
## $Price2014
## [1] 132.135 879.328
##
## $SquareFeet
## [1] 0.524 4.030
##
## $StreetNum
## [1] 1 1086
##
## $WalkScore
## [1] 2 94
##
## $Zip
## [1] 1060 1062
lapply(input_data1[,num.names],var)
## $HouseNum
## [1] 910
##
## $Acre
## [1] 0.01478057
##
## $Adj1998
## [1] 4417.333
##
## $Adj2007
## [1] 11021.41
##
## $Adj2011
## [1] 8702.328
##
## $Bedrooms
## [1] 0.8300971
##
## $BikeScore
## [1] 514.106
##
## $Diff2014
## [1] 5862.958
##
## $Distance
## [1] 0.883221
##
## $GarageSpaces
## [1] 0.7474795
##
## $Latitude
## [1] 0.0001684309
##
## $Longitude
## [1] 0.00055954
##
## $NumFullBaths
## [1] 0.3860157
##
## $NumHalfBaths
## [1] 0.1739171
##
## $NumRooms
## [1] 2.782674
##
## $PctChange
## [1] 912.0346
##
## $Price1998
## [1] 2067.491
##
## $Price2007
## [1] 8346.83
##
## $Price2011
## [1] 7756.745
##
## $Price2014
## [1] 12296.57
##
## $SquareFeet
## [1] 0.3119305
##
## $StreetNum
## [1] 40786.72
##
## $WalkScore
## [1] 688.8483
##
## $Zip
## [1] 0.9794623
lapply(input_data1[,num.names],sd)
## $HouseNum
## [1] 30.16621
##
## $Acre
## [1] 0.1215754
##
## $Adj1998
## [1] 66.46302
##
## $Adj2007
## [1] 104.9829
##
## $Adj2011
## [1] 93.28627
##
## $Bedrooms
## [1] 0.9110966
##
## $BikeScore
## [1] 22.6739
##
## $Diff2014
## [1] 76.56996
##
## $Distance
## [1] 0.9397984
##
## $GarageSpaces
## [1] 0.8645689
##
## $Latitude
## [1] 0.01297809
##
## $Longitude
## [1] 0.0236546
##
## $NumFullBaths
## [1] 0.6213016
##
## $NumHalfBaths
## [1] 0.4170337
##
## $NumRooms
## [1] 1.668135
##
## $PctChange
## [1] 30.19991
##
## $Price1998
## [1] 45.46967
##
## $Price2007
## [1] 91.36099
##
## $Price2011
## [1] 88.07238
##
## $Price2014
## [1] 110.8899
##
## $SquareFeet
## [1] 0.5585074
##
## $StreetNum
## [1] 201.9572
##
## $WalkScore
## [1] 26.24592
##
## $Zip
## [1] 0.9896779
lapply(input_data1[,num.names],mad)
## $HouseNum
## [1] 38.5476
##
## $Acre
## [1] 0.118608
##
## $Adj1998
## [1] 48.21834
##
## $Adj2007
## [1] 67.2944
##
## $Adj2011
## [1] 68.54665
##
## $Bedrooms
## [1] 0.7413
##
## $BikeScore
## [1] 30.3933
##
## $Diff2014
## [1] 45.16967
##
## $Distance
## [1] 0.8901216
##
## $GarageSpaces
## [1] 1.4826
##
## $Latitude
## [1] 0.01319069
##
## $Longitude
## [1] 0.03200859
##
## $NumFullBaths
## [1] 0
##
## $NumHalfBaths
## [1] 0
##
## $NumRooms
## [1] 2.2239
##
## $PctChange
## [1] 17.86248
##
## $Price1998
## [1] 32.98785
##
## $Price2007
## [1] 58.5627
##
## $Price2011
## [1] 64.71549
##
## $Price2014
## [1] 88.94785
##
## $SquareFeet
## [1] 0.4655364
##
## $StreetNum
## [1] 66.717
##
## $WalkScore
## [1] 34.0998
##
## $Zip
## [1] 0
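The separate `lapply()` calls above could also be collapsed into a single table with one row per variable. The sketch below uses a small made-up data frame in place of `input_data1[, num.names]`; the helper name `describe_numeric` is hypothetical. Note that `mad()` multiplies the raw median absolute deviation by 1.4826 by default, which is why values such as 0.7413 and 1.4826 appear in the output above.

```r
# Toy numeric data standing in for input_data1[, num.names] (hypothetical values).
toy <- data.frame(
  Bedrooms  = c(3, 3, 4, 5, 2),
  Price2014 = c(210.7, 204.2, 338.7, 276.3, 169.2)
)

# Hypothetical helper: one row per variable, one column per statistic.
describe_numeric <- function(df) {
  t(sapply(df, function(x) c(
    mean   = mean(x),
    median = median(x),
    min    = min(x),
    max    = max(x),
    var    = var(x),
    sd     = sd(x),
    mad    = mad(x)  # default constant 1.4826 rescales the raw MAD
  )))
}

stats_table <- describe_numeric(toy)
print(round(stats_table, 3))
```

A single table makes the differences in scale between variables easier to compare at a glance than seven separate lists.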
It appears that the variable values are on different scales.
boxplot(input_data1[,num.names])
input_data1[,num.names] %>%
gather() %>%
ggplot(aes(value)) +
facet_wrap(~ key, scales = "free") +
geom_histogram()
Sorting the data is useful for examining the data values. In sorted order, missing or corrupted values are easier to spot.
# Sort by the house identifier; order() expects a vector, not a one-column tibble
input_data1 <- input_data1[order(input_data1$HouseNum),]
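Once the rows are sorted by an identifier, gaps and duplicates can also be checked programmatically rather than by eye. The sketch below uses a made-up ID vector in place of `HouseNum` and assumes the identifiers are expected to run from 1 to the largest ID without gaps, which is an assumption about how the data were numbered.

```r
# Hypothetical ID column with one duplicate and two gaps, standing in for HouseNum.
house_num <- c(3, 1, 2, 5, 5, 7)

sorted_ids <- sort(house_num)

# IDs that occur more than once.
duplicated_ids <- unique(sorted_ids[duplicated(sorted_ids)])

# IDs missing from the assumed sequence 1..max(ID).
missing_ids <- setdiff(seq_len(max(sorted_ids)), sorted_ids)

print(duplicated_ids)
print(missing_ids)
```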
Missing values can cause inaccuracies or errors when calculating ranges, central tendency, dispersion, correlation, multicollinearity, p-values, z-scores, variance inflation factors, and so on. Because cluster analysis relies on Euclidean distances, records with missing values should be removed before clustering. Replacing a missing value with 0 would instead place that record at an artificial position and distort the cluster analysis result. In this dataset na.omit() drops nothing: all 104 rows remain.
# input_data1 <- input_data1%>% mutate_all(funs(replace_na(.,0)))
input_data1 <- na.omit(input_data1)
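The effect of the two strategies can be illustrated on a toy example. The values below are made up and are not taken from the dataset; the sketch counts missing values per column and then compares the largest Euclidean distance after dropping incomplete rows with the largest distance after zero-filling.

```r
# Toy data with one missing price (hypothetical values, in thousands of dollars).
toy <- data.frame(
  Price2014  = c(210, 276, NA, 339),
  SquareFeet = c(0.97, 1.73, 1.58, 1.72)
)

# Count missing values per column before deciding how to handle them.
na_counts <- colSums(is.na(toy))
print(na_counts)

# Option 1: drop incomplete rows with na.omit().
complete_rows <- na.omit(toy)

# Option 2 (shown only for contrast): replace NA with 0.
zero_filled <- toy
zero_filled[is.na(zero_filled)] <- 0

# The zero-filled home ends up far away from every real home.
max_dist_complete <- max(dist(complete_rows))
max_dist_zero     <- max(dist(zero_filled))
print(c(complete = max_dist_complete, zero_filled = max_dist_zero))
```

In this toy case the zero-filled record produces the largest distance in the data, so a distance-based method would likely treat it as an outlier or as its own cluster.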
Standardizing the numeric variables is necessary for cluster analysis methods such as K-Means: otherwise variables measured on large scales dominate the Euclidean distances between data points.
# m <- apply(input_data1[,num.names],2,mean)
# s <- apply(input_data1[,num.names],2,sd)
# input_data1_std <- as.data.frame(scale(input_data1[,num.names],m,s))
# Standardize numeric columns to z-scores; leave non-numeric columns unchanged
input_data1_std <- as.data.frame(lapply(input_data1, function(x) {
  if (is.numeric(x)) (x - mean(x)) / sd(x) else x
}))
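As a sanity check, the manual z-score can be compared with `scale()`, which by default centers each column and divides by its sample standard deviation. The sketch below uses made-up values on two very different scales and also shows how the Euclidean distances change once both columns are standardized.

```r
# Toy data on very different scales (hypothetical values).
toy <- data.frame(
  Price2014 = c(210, 276, 339, 169),  # thousands of dollars
  Acre      = c(0.28, 0.26, 0.36, 0.31)
)

# Manual z-score, the same formula used in the chunk above.
z_manual <- as.data.frame(lapply(toy, function(x) (x - mean(x)) / sd(x)))

# scale() centers and divides by the sample standard deviation by default.
z_scale <- as.data.frame(scale(toy))

# The two approaches should agree up to floating-point error.
max_diff <- max(abs(as.matrix(z_manual) - as.matrix(z_scale)))
print(max_diff)

# Before scaling, Price2014 dominates the distances; afterwards both
# variables contribute on a comparable footing.
print(round(dist(toy), 2))
print(round(dist(z_manual), 2))
```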
It appears that now all variable values are on the same scale.
boxplot(input_data1_std[,num.names])
input_data1_std[,num.names] %>%
gather() %>%
ggplot(aes(value)) +
facet_wrap(~ key, scales = "free") +
geom_histogram()
dim(input_data1_std)
## [1] 104 30
str(input_data1_std)
## 'data.frame': 104 obs. of 30 variables:
## $ HouseNum : num -1.71 -1.67 -1.64 -1.61 -1.57 ...
## $ Acre : num 0.1859 0.2681 0.8439 0.0214 0.4326 ...
## $ AcreGroup : Factor w/ 2 levels "<= 1/4 acre",..: 2 2 2 2 2 2 1 1 2 2 ...
## $ Adj1998 : num -0.906 -1.104 0.722 0.348 0.953 ...
## $ Adj2007 : num -0.893 -0.63 0.7 -0.214 -0.274 ...
## $ Adj2011 : num -0.994 -0.831 0.68 -0.29 -0.513 ...
## $ BedGroup : Factor w/ 3 levels "1-2 beds","3 beds",..: 2 2 2 2 3 2 2 3 3 2 ...
## $ Bedrooms : num -0.274 -0.274 -0.274 -0.274 0.823 ...
## $ BikeScore : num -0.983 -0.586 0.385 0.164 -0.189 ...
## $ Diff2014 : num -0.2894 -0.2033 -0.0313 -0.5218 -2.4452 ...
## $ Distance : num 1.368 0.91 -1.14 -0.595 -0.551 ...
## $ DistGroup : Factor w/ 2 levels "Closer","Farther Away": 2 2 1 2 2 2 1 1 2 2 ...
## $ GarageSpaces: num 1.435 0.278 1.435 0.278 -0.879 ...
## $ GarageGroup : Factor w/ 2 levels "no","yes": 2 2 2 2 1 2 1 1 1 1 ...
## $ Latitude : num -0.839 -2.132 1.353 1.405 1.256 ...
## $ Longitude : num -1.3389 -0.526 -0.7582 -0.4188 -0.0875 ...
## $ NumFullBaths: num -0.727 -0.727 0.882 -0.727 -0.727 ...
## $ NumHalfBaths: num -0.53 -0.53 1.87 1.87 -0.53 ...
## $ NumRooms : num -0.968 -0.968 0.231 -0.369 -0.369 ...
## $ PctChange : num -0.00546 0.29154 -0.33722 -0.76038 -2.64826 ...
## $ Price1998 : num -0.906 -1.104 0.722 0.348 0.953 ...
## $ Price2007 : num -0.893 -0.63 0.7 -0.214 -0.274 ...
## $ Price2011 : num -0.994 -0.831 0.68 -0.29 -0.513 ...
## $ Price2014 : num -0.743 -0.802 0.411 -0.152 -1.118 ...
## $ SFGroup : Factor w/ 2 levels "<= 1500 sf","> 1500 sf": 1 1 2 2 2 1 1 2 2 1 ...
## $ SquareFeet : num -1.075 -1.0858 0.284 0.2875 0.0172 ...
## $ StreetName : Factor w/ 73 levels "Acrebrook Drive",..: 1 2 3 3 3 4 5 6 7 7 ...
## $ StreetNum : num 1.328 -0.4 -0.529 0.308 1.273 ...
## $ WalkScore : num -1.1383 -1.2907 0.2715 0.0429 -0.2619 ...
## $ Zip : num 0.836 0.836 0.836 -1.185 0.836 ...
summary(input_data1_std)
## HouseNum Acre AcreGroup Adj1998
## Min. :-1.7072 Min. :-1.7060 <= 1/4 acre:54 Min. :-2.2254
## 1st Qu.:-0.8536 1st Qu.:-0.7395 > 1/4 acre :50 1st Qu.:-0.6254
## Median : 0.0000 Median :-0.0609 Median :-0.1196
## Mean : 0.0000 Mean : 0.0000 Mean : 0.0000
## 3rd Qu.: 0.8536 3rd Qu.: 0.5971 3rd Qu.: 0.2983
## Max. : 1.7072 Max. : 2.4890 Max. : 3.9436
##
## Adj2007 Adj2011 BedGroup Bedrooms
## Min. :-1.5713 Min. :-1.5308 1-2 beds:16 Min. :-2.4696
## 1st Qu.:-0.6382 1st Qu.:-0.7366 3 beds :52 1st Qu.:-0.2744
## Median :-0.2277 Median :-0.2750 4+ beds :36 Median :-0.2744
## Mean : 0.0000 Mean : 0.0000 Mean : 0.0000
## 3rd Qu.: 0.2088 3rd Qu.: 0.4349 3rd Qu.: 0.8232
## Max. : 4.4871 Max. : 4.4381 Max. : 3.0183
##
## BikeScore Diff2014 Distance DistGroup
## Min. :-1.7323 Min. :-3.7142 Min. :-1.1444 Closer :40
## 1st Qu.:-0.9385 1st Qu.:-0.5249 1st Qu.:-0.8359 Farther Away:64
## Median :-0.1226 Median :-0.1716 Median :-0.3766
## Mean : 0.0000 Mean : 0.0000 Mean : 0.0000
## 3rd Qu.: 0.8808 3rd Qu.: 0.2918 3rd Qu.: 0.8315
## Max. : 1.7518 Max. : 5.3976 Max. : 3.0458
##
## GarageSpaces GarageGroup Latitude Longitude
## Min. :-0.8786 no :51 Min. :-2.1317 Min. :-2.8112
## 1st Qu.:-0.8786 yes:53 1st Qu.:-0.6669 1st Qu.:-0.7140
## Median : 0.2780 Median :-0.1498 Median :-0.0775
## Mean : 0.0000 Mean : 0.0000 Mean : 0.0000
## 3rd Qu.: 0.2780 3rd Qu.: 0.5816 3rd Qu.: 0.8384
## Max. : 3.7480 Max. : 2.1718 Max. : 2.0240
##
## NumFullBaths NumHalfBaths NumRooms PctChange
## Min. :-0.7274 Min. :-0.5303 Min. :-1.56785 Min. :-2.9453
## 1st Qu.:-0.7274 1st Qu.:-0.5303 1st Qu.:-0.96838 1st Qu.:-0.5185
## Median :-0.7274 Median :-0.5303 Median :-0.06917 Median :-0.1522
## Mean : 0.0000 Mean : 0.0000 Mean : 0.00000 Mean : 0.0000
## 3rd Qu.: 0.8821 3rd Qu.:-0.5303 3rd Qu.: 0.38043 3rd Qu.: 0.2969
## Max. : 4.1012 Max. : 1.8676 Max. : 4.42687 Max. : 2.9235
##
## Price1998 Price2007 Price2011 Price2014
## Min. :-2.2254 Min. :-1.5713 Min. :-1.5308 Min. :-1.4515
## 1st Qu.:-0.6254 1st Qu.:-0.6382 1st Qu.:-0.7366 1st Qu.:-0.7228
## Median :-0.1196 Median :-0.2277 Median :-0.2750 Median :-0.1819
## Mean : 0.0000 Mean : 0.0000 Mean : 0.0000 Mean : 0.0000
## 3rd Qu.: 0.2983 3rd Qu.: 0.2088 3rd Qu.: 0.4349 3rd Qu.: 0.3704
## Max. : 3.9436 Max. : 4.4871 Max. : 4.4381 Max. : 5.2866
##
## SFGroup SquareFeet StreetName StreetNum
## <= 1500 sf:51 Min. :-1.86641 Laurel Park : 8 Min. :-0.67755
## > 1500 sf :53 1st Qu.:-0.64440 Ryan Road : 6 1st Qu.:-0.54881
## Median :-0.09114 Bridge Road : 3 Median :-0.36808
## Mean : 0.00000 Longview Drive : 3 Mean : 0.00000
## 3rd Qu.: 0.47510 North Maple Street: 3 3rd Qu.: 0.08499
## Max. : 4.41104 Burts Pit Rd : 2 Max. : 4.69487
## (Other) :79
## WalkScore Zip
## Min. :-1.4050 Min. :-1.1853
## 1st Qu.:-0.9192 1st Qu.:-1.1853
## Median :-0.1095 Median : 0.8355
## Mean : 0.0000 Mean : 0.0000
## 3rd Qu.: 0.8335 3rd Qu.: 0.8355
## Max. : 2.1003 Max. : 0.8355
##
glimpse(input_data1_std)
## Rows: 104
## Columns: 30
## $ HouseNum <dbl> -1.7072084, -1.6740587, -1.6409090, -1.6077593, -1.574...
## $ Acre <dbl> 0.18586126, 0.26811476, 0.84388923, 0.02135427, 0.4326...
## $ AcreGroup <fct> > 1/4 acre, > 1/4 acre, > 1/4 acre, > 1/4 acre, > 1/4 ...
## $ Adj1998 <dbl> -0.90582353, -1.10375765, 0.72163483, 0.34775926, 0.95...
## $ Adj2007 <dbl> -0.89264454, -0.62995035, 0.69993898, -0.21401788, -0....
## $ Adj2011 <dbl> -0.99377393, -0.83140748, 0.67984945, -0.28980751, -0....
## $ BedGroup <fct> 3 beds, 3 beds, 3 beds, 3 beds, 4+ beds, 3 beds, 3 bed...
## $ Bedrooms <dbl> -0.2743946, -0.2743946, -0.2743946, -0.2743946, 0.8231...
## $ BikeScore <dbl> -0.9825765, -0.5856444, 0.3846340, 0.1641161, -0.18871...
## $ Diff2014 <dbl> -0.28942475, -0.20326433, -0.03126606, -0.52183837, -2...
## $ Distance <dbl> 1.3680343, 0.9104893, -1.1395555, -0.5954349, -0.55089...
## $ DistGroup <fct> Farther Away, Farther Away, Closer, Farther Away, Fart...
## $ GarageSpaces <dbl> 1.4346856, 0.2780398, 1.4346856, 0.2780398, -0.8786059...
## $ GarageGroup <fct> yes, yes, yes, yes, no, yes, no, no, no, no, no, yes, ...
## $ Latitude <dbl> -0.839007667, -2.131724363, 1.353455876, 1.405389538, ...
## $ Longitude <dbl> -1.33892696, -0.52601946, -0.75815192, -0.41880986, -0...
## $ NumFullBaths <dbl> -0.7273812, -0.7273812, 0.8821431, -0.7273812, -0.7273...
## $ NumHalfBaths <dbl> -0.5303021, -0.5303021, 1.8675857, 1.8675857, -0.53030...
## $ NumRooms <dbl> -0.9683778, -0.9683778, 0.2305661, -0.3689058, -0.3689...
## $ PctChange <dbl> -0.005463079, 0.291539045, -0.337223680, -0.760382299,...
## $ Price1998 <dbl> -0.90582353, -1.10375765, 0.72163483, 0.34775926, 0.95...
## $ Price2007 <dbl> -0.89264454, -0.62995035, 0.69993898, -0.21401788, -0....
## $ Price2011 <dbl> -0.99377393, -0.83140748, 0.67984945, -0.28980751, -0....
## $ Price2014 <dbl> -0.74276372, -0.80190345, 0.41092996, -0.15189847, -1....
## $ SFGroup <fct> <= 1500 sf, <= 1500 sf, > 1500 sf, > 1500 sf, > 1500 s...
## $ SquareFeet <dbl> -1.07501499, -1.08575791, 0.28396428, 0.28754525, 0.01...
## $ StreetName <fct> Acrebrook Drive, Autumn Dr, Bridge Road, Bridge Road, ...
## $ StreetNum <dbl> 1.327823067, -0.400265643, -0.529005777, 0.307805089, ...
## $ WalkScore <dbl> -1.13827217, -1.29067681, 0.27147077, 0.04286381, -0.2...
## $ Zip <dbl> 0.8355477, 0.8355477, 0.8355477, -1.1853119, 0.8355477...
head(input_data1_std)
## HouseNum Acre AcreGroup Adj1998 Adj2007 Adj2011 BedGroup
## 1 -1.707208 0.18586126 > 1/4 acre -0.9058235 -0.8926445 -0.9937739 3 beds
## 2 -1.674059 0.26811476 > 1/4 acre -1.1037577 -0.6299503 -0.8314075 3 beds
## 3 -1.640909 0.84388923 > 1/4 acre 0.7216348 0.6999390 0.6798494 3 beds
## 4 -1.607759 0.02135427 > 1/4 acre 0.3477593 -0.2140179 -0.2898075 3 beds
## 5 -1.574610 0.43262175 > 1/4 acre 0.9525580 -0.2742186 -0.5134872 4+ beds
## 6 -1.541460 0.43262175 > 1/4 acre -0.2350468 -0.4931305 -0.4419551 3 beds
## Bedrooms BikeScore Diff2014 Distance DistGroup GarageSpaces
## 1 -0.2743946 -0.9825765 -0.28942475 1.3680343 Farther Away 1.4346856
## 2 -0.2743946 -0.5856444 -0.20326433 0.9104893 Farther Away 0.2780398
## 3 -0.2743946 0.3846340 -0.03126606 -1.1395555 Closer 1.4346856
## 4 -0.2743946 0.1641161 -0.52183837 -0.5954349 Farther Away 0.2780398
## 5 0.8231838 -0.1887124 -2.44522656 -0.5508976 Farther Away -0.8786059
## 6 -0.2743946 -0.9384729 -0.86176216 0.8147241 Farther Away 0.2780398
## GarageGroup Latitude Longitude NumFullBaths NumHalfBaths NumRooms
## 1 yes -0.8390077 -1.33892696 -0.7273812 -0.5303021 -0.9683778
## 2 yes -2.1317244 -0.52601946 -0.7273812 -0.5303021 -0.9683778
## 3 yes 1.3534559 -0.75815192 0.8821431 1.8675857 0.2305661
## 4 yes 1.4053895 -0.41880986 -0.7273812 1.8675857 -0.3689058
## 5 no 1.2563692 -0.08754234 -0.7273812 -0.5303021 -0.3689058
## 6 yes -0.5767966 -1.21176353 -0.7273812 1.8675857 -0.3689058
## PctChange Price1998 Price2007 Price2011 Price2014 SFGroup
## 1 -0.005463079 -0.9058235 -0.8926445 -0.9937739 -0.7427637 <= 1500 sf
## 2 0.291539045 -1.1037577 -0.6299503 -0.8314075 -0.8019034 <= 1500 sf
## 3 -0.337223680 0.7216348 0.6999390 0.6798494 0.4109300 > 1500 sf
## 4 -0.760382299 0.3477593 -0.2140179 -0.2898075 -0.1518985 > 1500 sf
## 5 -2.648260307 0.9525580 -0.2742186 -0.5134872 -1.1175137 > 1500 sf
## 6 -1.079180970 -0.2350468 -0.4931305 -0.4419551 -0.7359281 <= 1500 sf
## SquareFeet StreetName StreetNum WalkScore Zip
## 1 -1.07501499 Acrebrook Drive 1.3278231 -1.13827217 0.8355477
## 2 -1.08575791 Autumn Dr -0.4002656 -1.29067681 0.8355477
## 3 0.28396428 Bridge Road -0.5290058 0.27147077 0.8355477
## 4 0.28754525 Bridge Road 0.3078051 0.04286381 -1.1853119
## 5 0.01718178 Bridge Road 1.2733561 -0.26194548 0.8355477
## 6 -0.44118276 Brierwood Drive -0.5686181 -1.02396869 0.8355477
tail(input_data1_std)
## HouseNum Acre AcreGroup Adj1998 Adj2007 Adj2011 BedGroup
## 99 1.541460 -0.5544202 <= 1/4 acre 3.2947784 1.77260692 1.3066066 3 beds
## 100 1.574610 -1.0479412 <= 1/4 acre 0.1498251 -0.03341563 0.5492750 3 beds
## 101 1.607759 1.6664242 > 1/4 acre 1.7332981 1.79997090 2.3784523 4+ beds
## 102 1.640909 1.1729032 > 1/4 acre 0.2048068 0.01583953 -0.1331182 4+ beds
## 103 1.674059 -0.4721667 <= 1/4 acre 0.1498251 0.02678512 -0.2125983 4+ beds
## 104 1.707208 0.4326217 > 1/4 acre -0.4989589 -0.52049444 -0.5089454 1-2 beds
## Bedrooms BikeScore Diff2014 Distance DistGroup GarageSpaces
## 99 -0.2743946 1.4431195 -0.57743197 -0.46766734 Farther Away -0.8786059
## 100 -0.2743946 0.6933589 -0.01452059 -0.17424528 Farther Away -0.8786059
## 101 0.8231838 0.1200126 1.65300811 -0.37254700 Farther Away 1.4346856
## 102 0.8231838 0.9138767 0.32824766 -1.05249620 Closer -0.8786059
## 103 0.8231838 1.0020839 0.23185542 -0.04688075 Farther Away -0.8786059
## 104 -1.3719730 -0.4533337 -1.08958781 -0.41607664 Farther Away -0.8786059
## GarageGroup Latitude Longitude NumFullBaths NumHalfBaths NumRooms
## 99 no -0.24084592 1.4542721 0.8821431 1.8675857 1.4295100
## 100 no -0.90488791 1.7001439 0.8821431 -0.5303021 0.8300381
## 101 yes -0.26234368 0.5264609 0.8821431 1.8675857 1.4295100
## 102 no 1.05556932 -0.4909734 0.8821431 -0.5303021 0.8300381
## 103 no -0.64491138 1.6585030 -0.7273812 -0.5303021 0.8300381
## 104 no 0.05904402 0.1423073 -0.7273812 -0.5303021 -0.9683778
## PctChange Price1998 Price2007 Price2011 Price2014 SFGroup
## 99 -1.0851829 3.2947784 1.77260692 1.3066066 1.57604036 > 1500 sf
## 100 -0.1334153 0.1498251 -0.03341563 0.5492750 0.07977272 > 1500 sf
## 101 0.7615691 1.7332981 1.79997090 2.3784523 2.18027922 > 1500 sf
## 102 0.2369527 0.2048068 0.01583953 -0.1331182 0.34940949 > 1500 sf
## 103 0.1524436 0.1498251 0.02678512 -0.2125983 0.24989639 > 1500 sf
## 104 -1.3766769 -0.4989589 -0.52049444 -0.5089454 -1.05142116 <= 1500 sf
## SquareFeet StreetName StreetNum WalkScore Zip
## 99 0.9589777 Union Street -0.4200718 1.90982067 -1.1853119
## 100 0.5238895 Valley Street -0.4794903 0.91919050 -1.1853119
## 101 1.7217249 Vernon Street -0.4745388 0.23336961 -1.1853119
## 102 0.7047286 Warren Street -0.5983274 0.99539282 0.8355477
## 103 0.6707094 Williams Street -0.4943450 1.56691023 -1.1853119
## 104 -0.6614126 Winslow Ave -0.6775521 -0.03333852 0.8355477
sapply(input_data1_std,mode)
## HouseNum Acre AcreGroup Adj1998 Adj2007 Adj2011
## "numeric" "numeric" "numeric" "numeric" "numeric" "numeric"
## BedGroup Bedrooms BikeScore Diff2014 Distance DistGroup
## "numeric" "numeric" "numeric" "numeric" "numeric" "numeric"
## GarageSpaces GarageGroup Latitude Longitude NumFullBaths NumHalfBaths
## "numeric" "numeric" "numeric" "numeric" "numeric" "numeric"
## NumRooms PctChange Price1998 Price2007 Price2011 Price2014
## "numeric" "numeric" "numeric" "numeric" "numeric" "numeric"
## SFGroup SquareFeet StreetName StreetNum WalkScore Zip
## "numeric" "numeric" "numeric" "numeric" "numeric" "numeric"
lapply(input_data1_std[,num.names],mean)
## $HouseNum
## [1] 0
##
## $Acre
## [1] 9.22656e-17
##
## $Adj1998
## [1] 1.210022e-16
##
## $Adj2007
## [1] 2.232273e-16
##
## $Adj2011
## [1] 9.771841e-17
##
## $Bedrooms
## [1] 1.810305e-17
##
## $BikeScore
## [1] 2.091572e-17
##
## $Diff2014
## [1] 4.066279e-17
##
## $Distance
## [1] -1.13321e-16
##
## $GarageSpaces
## [1] 7.512896e-17
##
## $Latitude
## [1] -1.052803e-13
##
## $Longitude
## [1] -2.715005e-13
##
## $NumFullBaths
## [1] 9.598525e-17
##
## $NumHalfBaths
## [1] -1.70814e-17
##
## $NumRooms
## [1] 2.433575e-16
##
## $PctChange
## [1] -1.196469e-16
##
## $Price1998
## [1] -4.682138e-18
##
## $Price2007
## [1] -2.813041e-16
##
## $Price2011
## [1] -1.470246e-16
##
## $Price2014
## [1] 9.333938e-17
##
## $SquareFeet
## [1] 1.158512e-16
##
## $StreetNum
## [1] 4.123617e-17
##
## $WalkScore
## [1] 2.891917e-17
##
## $Zip
## [1] 1.761201e-14
lapply(input_data1_std[,num.names],median)
## $HouseNum
## [1] 0
##
## $Acre
## [1] -0.06089922
##
## $Adj1998
## [1] -0.1195852
##
## $Adj2007
## [1] -0.2276999
##
## $Adj2011
## [1] -0.2750469
##
## $Bedrooms
## [1] -0.2743946
##
## $BikeScore
## [1] -0.122557
##
## $Diff2014
## [1] -0.1715939
##
## $Distance
## [1] -0.3765775
##
## $GarageSpaces
## [1] 0.2780398
##
## $Latitude
## [1] -0.1498079
##
## $Longitude
## [1] -0.07750201
##
## $NumFullBaths
## [1] -0.7273812
##
## $NumHalfBaths
## [1] -0.5303021
##
## $NumRooms
## [1] -0.06916984
##
## $PctChange
## [1] -0.1521968
##
## $Price1998
## [1] -0.1195852
##
## $Price2007
## [1] -0.2276999
##
## $Price2011
## [1] -0.2750469
##
## $Price2014
## [1] -0.1819283
##
## $SquareFeet
## [1] -0.09114265
##
## $StreetNum
## [1] -0.3680806
##
## $WalkScore
## [1] -0.1095408
##
## $Zip
## [1] 0.8355477
lapply(input_data1_std[,num.names],min)
## $HouseNum
## [1] -1.707208
##
## $Acre
## [1] -1.705969
##
## $Adj1998
## [1] -2.225384
##
## $Adj2007
## [1] -1.571271
##
## $Adj2011
## [1] -1.530832
##
## $Bedrooms
## [1] -2.469551
##
## $BikeScore
## [1] -1.732337
##
## $Diff2014
## [1] -3.714171
##
## $Distance
## [1] -1.144392
##
## $GarageSpaces
## [1] -0.8786059
##
## $Latitude
## [1] -2.131724
##
## $Longitude
## [1] -2.811241
##
## $NumFullBaths
## [1] -0.7273812
##
## $NumHalfBaths
## [1] -0.5303021
##
## $NumRooms
## [1] -1.56785
##
## $PctChange
## [1] -2.945329
##
## $Price1998
## [1] -2.225384
##
## $Price2007
## [1] -1.571271
##
## $Price2011
## [1] -1.530832
##
## $Price2014
## [1] -1.451521
##
## $SquareFeet
## [1] -1.86641
##
## $StreetNum
## [1] -0.6775521
##
## $WalkScore
## [1] -1.40498
##
## $Zip
## [1] -1.185312
lapply(input_data1_std[,num.names],max)
## $HouseNum
## [1] 1.707208
##
## $Acre
## [1] 2.488959
##
## $Adj1998
## [1] 3.943563
##
## $Adj2007
## [1] 4.487114
##
## $Adj2011
## [1] 4.438122
##
## $Bedrooms
## [1] 3.018341
##
## $BikeScore
## [1] 1.751844
##
## $Diff2014
## [1] 5.397634
##
## $Distance
## [1] 3.04582
##
## $GarageSpaces
## [1] 3.747977
##
## $Latitude
## [1] 2.171758
##
## $Longitude
## [1] 2.023971
##
## $NumFullBaths
## [1] 4.101192
##
## $NumHalfBaths
## [1] 1.867586
##
## $NumRooms
## [1] 4.42687
##
## $PctChange
## [1] 2.92347
##
## $Price1998
## [1] 3.943563
##
## $Price2007
## [1] 4.487114
##
## $Price2011
## [1] 4.438122
##
## $Price2014
## [1] 5.28663
##
## $SquareFeet
## [1] 4.411036
##
## $StreetNum
## [1] 4.694873
##
## $WalkScore
## [1] 2.100326
##
## $Zip
## [1] 0.8355477
lapply(input_data1_std[,num.names],range)
## $HouseNum
## [1] -1.707208 1.707208
##
## $Acre
## [1] -1.705969 2.488959
##
## $Adj1998
## [1] -2.225384 3.943563
##
## $Adj2007
## [1] -1.571271 4.487114
##
## $Adj2011
## [1] -1.530832 4.438122
##
## $Bedrooms
## [1] -2.469551 3.018341
##
## $BikeScore
## [1] -1.732337 1.751844
##
## $Diff2014
## [1] -3.714171 5.397634
##
## $Distance
## [1] -1.144392 3.045820
##
## $GarageSpaces
## [1] -0.8786059 3.7479771
##
## $Latitude
## [1] -2.131724 2.171758
##
## $Longitude
## [1] -2.811241 2.023971
##
## $NumFullBaths
## [1] -0.7273812 4.1011916
##
## $NumHalfBaths
## [1] -0.5303021 1.8675857
##
## $NumRooms
## [1] -1.56785 4.42687
##
## $PctChange
## [1] -2.945329 2.923470
##
## $Price1998
## [1] -2.225384 3.943563
##
## $Price2007
## [1] -1.571271 4.487114
##
## $Price2011
## [1] -1.530832 4.438122
##
## $Price2014
## [1] -1.451521 5.286630
##
## $SquareFeet
## [1] -1.866410 4.411036
##
## $StreetNum
## [1] -0.6775521 4.6948727
##
## $WalkScore
## [1] -1.404980 2.100326
##
## $Zip
## [1] -1.1853119 0.8355477
lapply(input_data1_std[,num.names],var)
## $HouseNum
## [1] 1
##
## $Acre
## [1] 1
##
## $Adj1998
## [1] 1
##
## $Adj2007
## [1] 1
##
## $Adj2011
## [1] 1
##
## $Bedrooms
## [1] 1
##
## $BikeScore
## [1] 1
##
## $Diff2014
## [1] 1
##
## $Distance
## [1] 1
##
## $GarageSpaces
## [1] 1
##
## $Latitude
## [1] 1
##
## $Longitude
## [1] 1
##
## $NumFullBaths
## [1] 1
##
## $NumHalfBaths
## [1] 1
##
## $NumRooms
## [1] 1
##
## $PctChange
## [1] 1
##
## $Price1998
## [1] 1
##
## $Price2007
## [1] 1
##
## $Price2011
## [1] 1
##
## $Price2014
## [1] 1
##
## $SquareFeet
## [1] 1
##
## $StreetNum
## [1] 1
##
## $WalkScore
## [1] 1
##
## $Zip
## [1] 1
lapply(input_data1_std[,num.names],sd)
## $HouseNum
## [1] 1
##
## $Acre
## [1] 1
##
## $Adj1998
## [1] 1
##
## $Adj2007
## [1] 1
##
## $Adj2011
## [1] 1
##
## $Bedrooms
## [1] 1
##
## $BikeScore
## [1] 1
##
## $Diff2014
## [1] 1
##
## $Distance
## [1] 1
##
## $GarageSpaces
## [1] 1
##
## $Latitude
## [1] 1
##
## $Longitude
## [1] 1
##
## $NumFullBaths
## [1] 1
##
## $NumHalfBaths
## [1] 1
##
## $NumRooms
## [1] 1
##
## $PctChange
## [1] 1
##
## $Price1998
## [1] 1
##
## $Price2007
## [1] 1
##
## $Price2011
## [1] 1
##
## $Price2014
## [1] 1
##
## $SquareFeet
## [1] 1
##
## $StreetNum
## [1] 1
##
## $WalkScore
## [1] 1
##
## $Zip
## [1] 1
lapply(input_data1_std[,num.names],mad)
## $HouseNum
## [1] 1.27784
##
## $Acre
## [1] 0.9755923
##
## $Adj1998
## [1] 0.7254912
##
## $Adj2007
## [1] 0.6410034
##
## $Adj2011
## [1] 0.7347989
##
## $Bedrooms
## [1] 0.8136349
##
## $BikeScore
## [1] 1.340453
##
## $Diff2014
## [1] 0.5899137
##
## $Distance
## [1] 0.947141
##
## $GarageSpaces
## [1] 1.714843
##
## $Latitude
## [1] 1.016381
##
## $Longitude
## [1] 1.353166
##
## $NumFullBaths
## [1] 0
##
## $NumHalfBaths
## [1] 0
##
## $NumRooms
## [1] 1.333166
##
## $PctChange
## [1] 0.5914746
##
## $Price1998
## [1] 0.7254912
##
## $Price2007
## [1] 0.6410034
##
## $Price2011
## [1] 0.7347989
##
## $Price2014
## [1] 0.8021274
##
## $SquareFeet
## [1] 0.8335366
##
## $StreetNum
## [1] 0.3303521
##
## $WalkScore
## [1] 1.299242
##
## $Zip
## [1] 0
Principal component analysis (PCA) is a technique for reducing the number of dimensions of a data set while preserving as much of its variation as possible. When performing cluster analysis on a multidimensional data set with many data fields, it helps to combine each group of highly correlated data fields into a new data field, called a principal component. This not only makes cluster analysis feasible on large data sets, but also helps to exclude data fields that contribute little to the variation in the data.
Of the 24 numeric variables, 19 have an absolute correlation coefficient larger than 0.5 with at least one other variable; the exceptions are HouseNum, GarageSpaces, Latitude, NumHalfBaths, and StreetNum. The first principal component captures about 44.0% of the variance in the data and the second about 13.8%, so together they capture about 57.8%. Capturing about 99% of the variance requires a minimum of 16 principal components, which is still a reduction from the original 24 variables.
This is to check whether it is necessary to carry out principal component analysis.
Since 19 of the 24 numeric variables have an absolute correlation coefficient larger than 0.5 with at least one other variable (the exceptions are HouseNum, GarageSpaces, Latitude, NumHalfBaths, and StreetNum), it is worthwhile to carry out the principal component analysis. With a cut-off point of 0.5, there is clear evidence that many of the variables are highly correlated, and these should be combined to form principal components.
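The cut-off check described above can be sketched in a few lines of R. The helper `count_high_cor` is a hypothetical name introduced only for this illustration and is not part of the original analysis. The three coefficients in the small matrix are copied from the correlation matrix printed later in this report.

```r
# Hypothetical helper: count the variables whose largest absolute correlation
# with any OTHER variable exceeds the cut-off.
count_high_cor <- function(cor_mat, cutoff = 0.5) {
  off_diag <- abs(cor_mat)
  diag(off_diag) <- 0                      # ignore self-correlations of 1
  high <- apply(off_diag, 1, max) > cutoff
  list(n_high = sum(high), low_vars = names(high)[!high])
}

# Three variables taken from the printed correlation matrix.
vars <- c("BikeScore", "WalkScore", "Latitude")
small_cor <- matrix(c(1.00000000, 0.92889448, 0.07905115,
                      0.92889448, 1.00000000, 0.09457122,
                      0.07905115, 0.09457122, 1.00000000),
                    nrow = 3, dimnames = list(vars, vars))

res <- count_high_cor(small_cor)
res$n_high     # BikeScore and WalkScore exceed the cut-off
res$low_vars   # Latitude does not
```

Assuming the objects used elsewhere in this report are available, the same helper could be applied to the full matrix with `count_high_cor(cor(input_data1_std[, num.names]))`.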
This is the cumulative variance captured by the principal components.
For this analysis, the calculations, the Cumulative Proportion of Variance plot, and the Scree plot below all show that each successive principal component captures less variance than the one before it. The first 16 principal components are required to reach a cumulative proportion of variance of 99%. Therefore, any principal component beyond the 16th can be excluded from the cluster analysis.
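As a rough check of the 99% figure, the proportions can be recomputed by hand from the component standard deviations. The `sdev` values here are copied (rounded as printed) from the `print(pcaobj)` output shown later in this report; the variable names `prop_var`, `cum_var`, and `n_needed` are introduced only for this sketch.

```r
# Component standard deviations as printed by print(pcaobj).
sdev <- c(3.232928, 1.811809, 1.422716, 1.153309, 1.067574, 1.054650,
          0.9459141, 0.8327146, 0.7980660, 0.7335635, 0.6737290, 0.5967567,
          0.4917841, 0.3990183, 0.3630865, 0.3077534, 0.2825305, 0.2523302,
          0.2271520, 0.1805307, 7.760413e-09, 0, 0, 0)

prop_var <- sdev^2 / sum(sdev^2)   # proportion of variance per component
cum_var  <- cumsum(prop_var)       # cumulative proportion of variance

# Smallest number of components whose cumulative proportion reaches 99%.
n_needed <- which(cum_var >= 0.99)[1]

round(cum_var[1:2], 4)   # first component, then first two together
n_needed
```

Plotting `prop_var` against the component index (for example with `plot(prop_var, type = "b")`) gives a scree plot of the same information.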
This is to validate the resulting principal component analysis.
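For this analysis, both the calculation and the plot of the correlations between the principal component scores below show that the principal components are uncorrelated with each other: among Comp.1 to Comp.20, every pairwise correlation is zero to within floating-point precision. The only correlations that differ noticeably from zero involve Comp.21 to Comp.24, whose standard deviations are effectively zero, so those values are numerical noise rather than real structure. Overall, about 97% of the pairwise correlation coefficients (268 of 276) are smaller in absolute value than the cut-off point of 0.5.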
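The behaviour of the near-zero components can be reproduced on a small synthetic example. This is only a sketch under stated assumptions: the data are simulated, the column `a_copy` is an exact duplicate of `a` (similar to how Price1998 duplicates Adj1998 in the printed correlation matrix), and `prcomp()` is used here instead of `princomp()` because it is based on a singular value decomposition and handles exactly collinear columns without complaint.

```r
set.seed(42)
a <- rnorm(50)
b <- rnorm(50)
toy <- scale(cbind(a = a, b = b, a_copy = a))   # a_copy duplicates a exactly

toy_pca <- prcomp(toy)

# Keep only components with non-negligible spread; the rest carry no
# information and their scores are floating-point noise.
keep <- toy_pca$sdev > 1e-6
score_cor <- cor(toy_pca$x[, keep])

# Largest absolute correlation between two different retained components.
max_off_diag <- max(abs(score_cor[upper.tri(score_cor)]))

sum(keep)
max_off_diag
```

Restricting the validation to components with a meaningful standard deviation avoids reading structure into correlations that are computed from noise.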
This is a plot of the first two principal components. The purpose of this plot is to reveal three properties. The first property is the degree of correlation between variables. The second property is the direction of the correlation between variables. The last property is the group of data points that contribute to the correlation between the two particular variables.
If one carefully observes the Biplot, one sees that the variables Adj1998, Adj2007, Adj2011, Price1998, Price2007, Price2011, Price2014, SquareFeet, HouseNum, Bedrooms, and NumRooms point in similar directions, which indicates that they are highly correlated with each other. The direction of variable StreetNum is roughly perpendicular to all of these variables, which means that StreetNum has very weak correlations with them.
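The link between arrow directions and correlations can be illustrated on simulated data. This is a minimal sketch, assuming one common biplot scaling in which each variable's arrow is its loading vector multiplied by the component standard deviations; the default scaling of R's `biplot()` differs, and the approximation is only good when the first two components capture most of the variance. All names below (`toy`, `arrows2d`, `arrow_cos`) are made up for the example.

```r
set.seed(7)
n  <- 300
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)   # strongly correlated with x1
x3 <- rnorm(n)                  # unrelated to x1 and x2
toy <- scale(cbind(x1, x2, x3))
toy_pca <- prcomp(toy)

# Arrow coordinates on the first two components: loadings scaled by the
# component standard deviations.
arrows2d <- toy_pca$rotation[, 1:2] %*% diag(toy_pca$sdev[1:2])

# Cosine of the angle between the arrows of variables i and j.
arrow_cos <- function(i, j) {
  sum(arrows2d[i, ] * arrows2d[j, ]) /
    sqrt(sum(arrows2d[i, ]^2) * sum(arrows2d[j, ]^2))
}

cos_12 <- arrow_cos("x1", "x2")   # arrows nearly parallel
cos_13 <- arrow_cos("x1", "x3")   # arrows nearly perpendicular

c(cos_12 = cos_12, cor_12 = cor(x1, x2))
c(cos_13 = cos_13, cor_13 = cor(x1, x3))
```

In this toy case the cosine of the angle between two arrows is close to the correlation between the two variables, which is the reading applied to the Biplot above.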
pairs.panels(input_data1_std[,num.names],gap=0,bg=c("green","red","yellow","blue","pink","purple"),pch=21)
oldw <- getOption("warn")
options(warn = -1)
cor(input_data1_std[,num.names])
## HouseNum Acre Adj1998 Adj2007 Adj2011
## HouseNum 1.000000000 0.07336878 0.19248873 0.18032495 0.18416442
## Acre 0.073368784 1.00000000 0.10107537 0.00551488 -0.03391574
## Adj1998 0.192488733 0.10107537 1.00000000 0.86717492 0.78736808
## Adj2007 0.180324953 0.00551488 0.86717492 1.00000000 0.93829759
## Adj2011 0.184164416 -0.03391574 0.78736808 0.93829759 1.00000000
## Bedrooms 0.002119478 0.05675348 0.59939534 0.54576063 0.56610895
## BikeScore 0.192127736 -0.26596371 0.46062119 0.52293803 0.52643817
## Diff2014 0.154133178 -0.18395645 0.19809934 0.54403362 0.70485942
## Distance -0.124392026 0.17498754 -0.44455151 -0.48722784 -0.47000388
## GarageSpaces 0.213116923 0.17966297 0.28603566 0.35415565 0.33126795
## Latitude -0.023278745 -0.24923633 -0.11263415 -0.06399535 -0.09638797
## Longitude 0.111582123 -0.47870730 0.26347048 0.39607791 0.39718795
## NumFullBaths 0.197103468 0.02853680 0.50697400 0.55930563 0.57901110
## NumHalfBaths 0.023538066 0.13207283 0.29091189 0.29311349 0.22887223
## NumRooms 0.183288217 0.10130555 0.73785572 0.76309302 0.73376688
## PctChange 0.027700561 -0.33540669 -0.17558573 0.19012993 0.35527952
## Price1998 0.192488733 0.10107537 1.00000000 0.86717492 0.78736808
## Price2007 0.180324953 0.00551488 0.86717492 1.00000000 0.93829759
## Price2011 0.184164416 -0.03391574 0.78736808 0.93829759 1.00000000
## Price2014 0.221799722 -0.06644213 0.73614877 0.89540777 0.95862560
## SquareFeet 0.191231258 0.05255086 0.80756328 0.86463690 0.82095325
## StreetNum 0.040891296 0.24189606 -0.02145054 -0.14495988 -0.19798526
## WalkScore 0.188996331 -0.23627500 0.45541129 0.50095050 0.51908938
## Zip -0.073169610 0.53390839 -0.20773850 -0.37403992 -0.37594128
## Bedrooms BikeScore Diff2014 Distance GarageSpaces
## HouseNum 0.002119478 0.19212774 0.15413318 -0.12439203 0.213116923
## Acre 0.056753476 -0.26596371 -0.18395645 0.17498754 0.179662972
## Adj1998 0.599395342 0.46062119 0.19809934 -0.44455151 0.286035661
## Adj2007 0.545760627 0.52293803 0.54403362 -0.48722784 0.354155654
## Adj2011 0.566108952 0.52643817 0.70485942 -0.47000388 0.331267947
## Bedrooms 1.000000000 0.23580847 0.29051333 -0.22846934 0.237262680
## BikeScore 0.235808466 1.00000000 0.33876338 -0.83567831 0.177785516
## Diff2014 0.290513326 0.33876338 1.00000000 -0.26475678 0.251829614
## Distance -0.228469345 -0.83567831 -0.26475678 1.00000000 -0.194933559
## GarageSpaces 0.237262680 0.17778552 0.25182961 -0.19493356 1.000000000
## Latitude -0.228257754 0.07905115 0.02485013 -0.32113258 -0.053388067
## Longitude 0.062023009 0.63782184 0.32329321 -0.56926915 -0.120406744
## NumFullBaths 0.484522900 0.21701946 0.35314835 -0.19122431 0.186130181
## NumHalfBaths -0.019164101 0.11970539 0.05902457 -0.17158509 0.175803871
## NumRooms 0.741011901 0.45412031 0.40171804 -0.37041186 0.258398031
## PctChange 0.019829923 0.11974807 0.83393325 -0.04082699 0.095239855
## Price1998 0.599395342 0.46062119 0.19809934 -0.44455151 0.286035661
## Price2007 0.545760627 0.52293803 0.54403362 -0.48722784 0.354155654
## Price2011 0.566108952 0.52643817 0.70485942 -0.47000388 0.331267947
## Price2014 0.559854474 0.50999562 0.80923709 -0.44926222 0.345327841
## SquareFeet 0.703035755 0.45082496 0.46002301 -0.42342170 0.394147584
## StreetNum -0.082720994 -0.22639775 -0.26571409 0.34578751 0.009781441
## WalkScore 0.199452323 0.92889448 0.35060372 -0.76072319 0.109478592
## Zip -0.026918105 -0.48371758 -0.32438208 0.30701353 0.128523113
## Latitude Longitude NumFullBaths NumHalfBaths NumRooms
## HouseNum -0.02327875 0.11158212 0.19710347 0.023538066 0.18328822
## Acre -0.24923633 -0.47870730 0.02853680 0.132072830 0.10130555
## Adj1998 -0.11263415 0.26347048 0.50697400 0.290911889 0.73785572
## Adj2007 -0.06399535 0.39607791 0.55930563 0.293113487 0.76309302
## Adj2011 -0.09638797 0.39718795 0.57901110 0.228872226 0.73376688
## Bedrooms -0.22825775 0.06202301 0.48452290 -0.019164101 0.74101190
## BikeScore 0.07905115 0.63782184 0.21701946 0.119705394 0.45412031
## Diff2014 0.02485013 0.32329321 0.35314835 0.059024571 0.40171804
## Distance -0.32113258 -0.56926915 -0.19122431 -0.171585094 -0.37041186
## GarageSpaces -0.05338807 -0.12040674 0.18613018 0.175803871 0.25839803
## Latitude 1.00000000 0.23539200 -0.05362645 0.025661444 -0.16304912
## Longitude 0.23539200 1.00000000 0.13460933 -0.024741303 0.23273064
## NumFullBaths -0.05362645 0.13460933 1.00000000 -0.164653898 0.48783671
## NumHalfBaths 0.02566144 -0.02474130 -0.16465390 1.000000000 0.19323671
## NumRooms -0.16304912 0.23273064 0.48783671 0.193236715 1.00000000
## PctChange 0.14492862 0.31729342 0.16777199 -0.049495866 0.09750167
## Price1998 -0.11263415 0.26347048 0.50697400 0.290911889 0.73785572
## Price2007 -0.06399535 0.39607791 0.55930563 0.293113487 0.76309302
## Price2011 -0.09638797 0.39718795 0.57901110 0.228872226 0.73376688
## Price2014 -0.05034933 0.38114908 0.54771054 0.215117766 0.71962945
## SquareFeet -0.12599754 0.28037440 0.56366439 0.272513762 0.85579218
## StreetNum -0.31431361 -0.35401400 -0.13566300 -0.004523405 -0.13295553
## WalkScore 0.09457122 0.59410185 0.16901516 0.115200773 0.46191186
## Zip -0.11963301 -0.82468131 -0.08107265 -0.023070904 -0.19451938
## PctChange Price1998 Price2007 Price2011 Price2014
## HouseNum 0.02770056 0.19248873 0.18032495 0.18416442 0.22179972
## Acre -0.33540669 0.10107537 0.00551488 -0.03391574 -0.06644213
## Adj1998 -0.17558573 1.00000000 0.86717492 0.78736808 0.73614877
## Adj2007 0.19012993 0.86717492 1.00000000 0.93829759 0.89540777
## Adj2011 0.35527952 0.78736808 0.93829759 1.00000000 0.95862560
## Bedrooms 0.01982992 0.59939534 0.54576063 0.56610895 0.55985447
## BikeScore 0.11974807 0.46062119 0.52293803 0.52643817 0.50999562
## Diff2014 0.83393325 0.19809934 0.54403362 0.70485942 0.80923709
## Distance -0.04082699 -0.44455151 -0.48722784 -0.47000388 -0.44926222
## GarageSpaces 0.09523986 0.28603566 0.35415565 0.33126795 0.34532784
## Latitude 0.14492862 -0.11263415 -0.06399535 -0.09638797 -0.05034933
## Longitude 0.31729342 0.26347048 0.39607791 0.39718795 0.38114908
## NumFullBaths 0.16777199 0.50697400 0.55930563 0.57901110 0.54771054
## NumHalfBaths -0.04949587 0.29091189 0.29311349 0.22887223 0.21511777
## NumRooms 0.09750167 0.73785572 0.76309302 0.73376688 0.71962945
## PctChange 1.00000000 -0.17558573 0.19012993 0.35527952 0.47059529
## Price1998 -0.17558573 1.00000000 0.86717492 0.78736808 0.73614877
## Price2007 0.19012993 0.86717492 1.00000000 0.93829759 0.89540777
## Price2011 0.35527952 0.78736808 0.93829759 1.00000000 0.95862560
## Price2014 0.47059529 0.73614877 0.89540777 0.95862560 1.00000000
## SquareFeet 0.14466382 0.80756328 0.86463690 0.82095325 0.80166923
## StreetNum -0.25836727 -0.02145054 -0.14495988 -0.19798526 -0.19633330
## WalkScore 0.11671121 0.45541129 0.50095050 0.51908938 0.51504881
## Zip -0.34790929 -0.20773850 -0.37403992 -0.37594128 -0.34849741
## SquareFeet StreetNum WalkScore Zip
## HouseNum 0.19123126 0.040891296 0.18899633 -0.07316961
## Acre 0.05255086 0.241896055 -0.23627500 0.53390839
## Adj1998 0.80756328 -0.021450537 0.45541129 -0.20773850
## Adj2007 0.86463690 -0.144959878 0.50095050 -0.37403992
## Adj2011 0.82095325 -0.197985256 0.51908938 -0.37594128
## Bedrooms 0.70303576 -0.082720994 0.19945232 -0.02691810
## BikeScore 0.45082496 -0.226397748 0.92889448 -0.48371758
## Diff2014 0.46002301 -0.265714093 0.35060372 -0.32438208
## Distance -0.42342170 0.345787506 -0.76072319 0.30701353
## GarageSpaces 0.39414758 0.009781441 0.10947859 0.12852311
## Latitude -0.12599754 -0.314313614 0.09457122 -0.11963301
## Longitude 0.28037440 -0.354014003 0.59410185 -0.82468131
## NumFullBaths 0.56366439 -0.135663003 0.16901516 -0.08107265
## NumHalfBaths 0.27251376 -0.004523405 0.11520077 -0.02307090
## NumRooms 0.85579218 -0.132955525 0.46191186 -0.19451938
## PctChange 0.14466382 -0.258367268 0.11671121 -0.34790929
## Price1998 0.80756328 -0.021450537 0.45541129 -0.20773850
## Price2007 0.86463690 -0.144959878 0.50095050 -0.37403992
## Price2011 0.82095325 -0.197985256 0.51908938 -0.37594128
## Price2014 0.80166923 -0.196333299 0.51504881 -0.34849741
## SquareFeet 1.00000000 -0.114888448 0.42862202 -0.25225094
## StreetNum -0.11488845 1.000000000 -0.27798184 0.21168545
## WalkScore 0.42862202 -0.277981844 1.00000000 -0.41890531
## Zip -0.25225094 0.211685446 -0.41890531 1.00000000
options(warn = oldw)
oldw <- getOption("warn")
options(warn = -1)
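pcaobj <- princomp(input_data1_std[,num.names], center=TRUE, scale.=TRUE) # The data have already been standardized, so no further scaling is needed. Note that center and scale. are prcomp() arguments; princomp() does not use them, so they have no effect here.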
options(warn = oldw)
oldw <- getOption("warn")
options(warn = -1)
attributes(pcaobj)
## $names
## [1] "sdev" "loadings" "center" "scale" "n.obs" "scores" "call"
##
## $class
## [1] "princomp"
options(warn = oldw)
oldw <- getOption("warn")
options(warn = -1)
print(pcaobj)
## Call:
## princomp(x = input_data1_std[, num.names], center = TRUE, scale. = TRUE)
##
## Standard deviations:
## Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6
## 3.232928e+00 1.811809e+00 1.422716e+00 1.153309e+00 1.067574e+00 1.054650e+00
## Comp.7 Comp.8 Comp.9 Comp.10 Comp.11 Comp.12
## 9.459141e-01 8.327146e-01 7.980660e-01 7.335635e-01 6.737290e-01 5.967567e-01
## Comp.13 Comp.14 Comp.15 Comp.16 Comp.17 Comp.18
## 4.917841e-01 3.990183e-01 3.630865e-01 3.077534e-01 2.825305e-01 2.523302e-01
## Comp.19 Comp.20 Comp.21 Comp.22 Comp.23 Comp.24
## 2.271520e-01 1.805307e-01 7.760413e-09 0.000000e+00 0.000000e+00 0.000000e+00
##
## 24 variables and 104 observations.
summary(pcaobj)
## Importance of components:
## Comp.1 Comp.2 Comp.3 Comp.4 Comp.5
## Standard deviation 3.2329279 1.8118087 1.42271572 1.15330884 1.06757401
## Proportion of Variance 0.4397207 0.1381050 0.08515715 0.05595979 0.04794914
## Cumulative Proportion 0.4397207 0.5778257 0.66298290 0.71894269 0.76689184
## Comp.6 Comp.7 Comp.8 Comp.9 Comp.10
## Standard deviation 1.0546497 0.94591409 0.83271459 0.79806602 0.73356348
## Proportion of Variance 0.0467952 0.03764335 0.02917274 0.02679554 0.02263916
## Cumulative Proportion 0.8136870 0.85133039 0.88050313 0.90729867 0.92993782
## Comp.11 Comp.12 Comp.13 Comp.14 Comp.15
## Standard deviation 0.67372901 0.59675672 0.49178414 0.399018333 0.363086463
## Proportion of Variance 0.01909657 0.01498233 0.01017499 0.006698392 0.005546321
## Cumulative Proportion 0.94903439 0.96401673 0.97419172 0.980890110 0.986436431
## Comp.16 Comp.17 Comp.18 Comp.19
## Standard deviation 0.307753441 0.28253054 0.252330179 0.227152046
## Proportion of Variance 0.003984655 0.00335827 0.002678695 0.002170792
## Cumulative Proportion 0.990421086 0.99377936 0.996458051 0.998628843
## Comp.20 Comp.21 Comp.22 Comp.23 Comp.24
## Standard deviation 0.180530725 7.760413e-09 0 0 0
## Proportion of Variance 0.001371157 2.533696e-18 0 0 0
## Cumulative Proportion 1.000000000 1.000000e+00 1 1 1
head(pcaobj$scores)
## Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6
## [1,] -3.6677181 1.4532443 -1.1679153 0.1884363 -0.4415176 -0.1453377
## [2,] -3.2233429 0.6249662 -1.2500736 -0.4668716 -0.3732941 -0.4139261
## [3,] 1.8075818 0.9888193 0.8141262 2.1073792 0.5476904 -2.3057125
## [4,] -0.2600085 -0.3387117 1.6001479 0.9244271 -1.1776550 -2.0028545
## [5,] -1.5245964 1.6160933 3.4805473 -1.4802967 0.4779060 -1.5290237
## [6,] -2.2875418 1.7060820 0.4564077 0.7699515 -0.9030274 -2.0447949
## Comp.7 Comp.8 Comp.9 Comp.10 Comp.11 Comp.12
## [1,] -1.35406455 -1.79859111 -0.1967463 -0.6210496 -0.07516876 0.064074609
## [2,] -1.90222604 0.01626572 0.3920651 -1.5241403 0.21964807 0.007796853
## [3,] -0.07777814 -0.71096017 -0.9949215 -0.6051153 0.49784449 -0.834957242
## [4,] 0.15789439 -1.01320348 -0.2139452 0.2249570 -0.36933090 -0.616562556
## [5,] 0.30477027 -1.15629432 -1.0738907 0.6309555 -0.63205422 0.617776376
## [6,] -0.50365025 0.18179976 0.7681598 -0.6749029 0.60219646 -0.295296090
## Comp.13 Comp.14 Comp.15 Comp.16 Comp.17 Comp.18
## [1,] -0.20125072 -0.29307706 -0.09472695 -0.1997637 -0.01744789 0.05698116
## [2,] 0.42135876 0.41714658 -0.20085743 -0.3385085 0.04616254 0.46220683
## [3,] -0.19812970 -0.28887292 0.14002850 -0.3036917 -0.14621640 -0.17689040
## [4,] -0.08800209 -0.66700110 0.58083675 0.5331028 -0.67741049 -0.33959915
## [5,] 0.45069334 -0.01822462 -0.11446789 0.2431517 0.27335244 -0.13261024
## [6,] -0.01044616 -0.21174287 0.18066580 -0.1458738 0.05415001 -0.26353362
## Comp.19 Comp.20 Comp.21 Comp.22 Comp.23
## [1,] -0.247447091 -0.001243464 4.769372e-16 7.766572e-16 -8.700296e-16
## [2,] -0.176166271 -0.290130988 -6.781830e-17 2.119216e-15 6.601222e-16
## [3,] 0.071808333 0.046755621 -6.098944e-16 -5.236594e-16 -3.096167e-16
## [4,] -0.243240887 0.120640998 -1.589266e-15 1.448638e-15 -1.794964e-15
## [5,] 0.388183841 -0.144746795 5.473973e-16 -1.439164e-15 2.753225e-15
## [6,] 0.007483009 -0.143857001 -9.857269e-16 -1.587027e-15 2.139928e-15
## Comp.24
## [1,] 4.685124e-16
## [2,] 3.255400e-16
## [3,] -3.576545e-16
## [4,] -1.889961e-16
## [5,] -2.350924e-15
## [6,] -2.195283e-15
options(warn = oldw)
oldw <- getOption("warn")
options(warn = -1)
cor(pcaobj$scores)
## Comp.1 Comp.2 Comp.3 Comp.4 Comp.5
## Comp.1 1.000000e+00 -1.580051e-16 -1.395031e-16 -5.467266e-16 -1.883876e-16
## Comp.2 -1.580051e-16 1.000000e+00 4.055168e-17 -8.896307e-17 2.912284e-17
## Comp.3 -1.395031e-16 4.055168e-17 1.000000e+00 -1.267168e-16 1.746643e-16
## Comp.4 -5.467266e-16 -8.896307e-17 -1.267168e-16 1.000000e+00 7.730438e-18
## Comp.5 -1.883876e-16 2.912284e-17 1.746643e-16 7.730438e-18 1.000000e+00
## Comp.6 -1.222468e-15 -1.867001e-16 4.154121e-18 -2.312300e-16 1.723675e-15
## Comp.7 -3.250962e-16 3.957709e-16 -2.174745e-16 -2.010321e-16 -8.022707e-17
## Comp.8 6.664927e-16 1.215623e-16 2.786075e-16 4.949045e-16 -2.817185e-16
## Comp.9 6.188892e-16 -1.211573e-16 3.482079e-16 1.586384e-16 -2.162732e-16
## Comp.10 9.731154e-16 -1.679738e-16 4.083560e-16 -9.161361e-17 2.613200e-17
## Comp.11 7.211421e-16 -3.169926e-16 3.025853e-16 -6.582751e-16 -7.429708e-16
## Comp.12 9.446501e-16 -3.264599e-16 5.540094e-16 -6.454917e-16 -1.378628e-15
## Comp.13 1.837625e-15 -1.373976e-16 -2.710350e-16 -6.470112e-16 -3.823640e-16
## Comp.14 5.450171e-16 -2.449674e-16 -2.143550e-16 -3.840538e-16 -9.873178e-16
## Comp.15 3.908771e-16 -5.586375e-16 -3.517766e-16 -9.793049e-16 1.840103e-16
## Comp.16 1.696264e-16 1.905954e-16 -5.593340e-17 5.924718e-16 6.446849e-16
## Comp.17 8.384544e-16 -9.237489e-16 -4.606962e-16 -6.349307e-16 1.568606e-16
## Comp.18 2.002688e-15 6.857871e-16 2.835739e-16 -1.930175e-15 5.884643e-16
## Comp.19 1.508817e-15 -2.032157e-16 -8.743513e-16 -7.644958e-16 -9.528696e-16
## Comp.20 1.579654e-16 1.081969e-15 -1.624796e-15 3.518967e-16 -1.323764e-15
## Comp.21 -4.156225e-02 1.112226e-01 9.702701e-02 5.025419e-02 4.301070e-02
## Comp.22 1.193163e-01 1.317005e-01 1.823238e-01 -3.518933e-02 6.258250e-02
## Comp.23 -1.004774e-01 2.542828e-02 3.361096e-03 1.407656e-02 -9.923853e-03
## Comp.24 1.417533e-01 3.072999e-02 -1.563205e-02 -1.465899e-01 2.313722e-02
## Comp.6 Comp.7 Comp.8 Comp.9 Comp.10
## Comp.1 -1.222468e-15 -3.250962e-16 6.664927e-16 6.188892e-16 9.731154e-16
## Comp.2 -1.867001e-16 3.957709e-16 1.215623e-16 -1.211573e-16 -1.679738e-16
## Comp.3 4.154121e-18 -2.174745e-16 2.786075e-16 3.482079e-16 4.083560e-16
## Comp.4 -2.312300e-16 -2.010321e-16 4.949045e-16 1.586384e-16 -9.161361e-17
## Comp.5 1.723675e-15 -8.022707e-17 -2.817185e-16 -2.162732e-16 2.613200e-17
## Comp.6 1.000000e+00 -1.743199e-16 3.490881e-16 -1.112087e-16 7.931291e-17
## Comp.7 -1.743199e-16 1.000000e+00 -8.740894e-16 3.373954e-16 6.178948e-17
## Comp.8 3.490881e-16 -8.740894e-16 1.000000e+00 -3.418759e-16 -1.974562e-16
## Comp.9 -1.112087e-16 3.373954e-16 -3.418759e-16 1.000000e+00 4.638567e-16
## Comp.10 7.931291e-17 6.178948e-17 -1.974562e-16 4.638567e-16 1.000000e+00
## Comp.11 -3.196747e-16 7.823402e-16 -2.032360e-16 1.261417e-15 2.618556e-16
## Comp.12 -4.304184e-16 8.624448e-16 -1.202871e-15 9.967707e-16 1.470154e-15
## Comp.13 -4.163868e-16 6.415369e-16 -3.351177e-16 9.358732e-16 -1.757242e-16
## Comp.14 -7.778673e-19 -2.407173e-16 9.877165e-16 1.258548e-15 1.324940e-15
## Comp.15 2.031605e-16 -4.950581e-16 2.445775e-15 -3.286640e-16 -2.757026e-17
## Comp.16 8.680878e-17 -1.024863e-15 8.003062e-16 -1.599559e-15 -7.890523e-16
## Comp.17 1.210716e-16 4.689345e-16 -2.631077e-16 1.651898e-16 4.875531e-16
## Comp.18 -7.222821e-16 7.888098e-16 -3.210576e-16 1.224203e-15 1.022465e-15
## Comp.19 2.906465e-16 6.800372e-16 -1.693469e-16 -5.163458e-16 1.584544e-16
## Comp.20 3.517859e-16 -1.025324e-15 1.893397e-15 -9.421336e-16 1.803581e-15
## Comp.21 1.019073e-01 -2.279398e-02 -2.077435e-01 -4.393539e-03 -7.335382e-02
## Comp.22 -3.921590e-02 3.805255e-02 3.843789e-02 -4.437388e-02 2.353334e-02
## Comp.23 3.509504e-02 -6.770753e-02 1.779734e-02 -1.496132e-02 2.548912e-02
## Comp.24 1.201719e-01 1.861835e-02 8.710037e-02 -1.363320e-01 -6.943967e-02
## Comp.11 Comp.12 Comp.13 Comp.14 Comp.15
## Comp.1 7.211421e-16 9.446501e-16 1.837625e-15 5.450171e-16 3.908771e-16
## Comp.2 -3.169926e-16 -3.264599e-16 -1.373976e-16 -2.449674e-16 -5.586375e-16
## Comp.3 3.025853e-16 5.540094e-16 -2.710350e-16 -2.143550e-16 -3.517766e-16
## Comp.4 -6.582751e-16 -6.454917e-16 -6.470112e-16 -3.840538e-16 -9.793049e-16
## Comp.5 -7.429708e-16 -1.378628e-15 -3.823640e-16 -9.873178e-16 1.840103e-16
## Comp.6 -3.196747e-16 -4.304184e-16 -4.163868e-16 -7.778673e-19 2.031605e-16
## Comp.7 7.823402e-16 8.624448e-16 6.415369e-16 -2.407173e-16 -4.950581e-16
## Comp.8 -2.032360e-16 -1.202871e-15 -3.351177e-16 9.877165e-16 2.445775e-15
## Comp.9 1.261417e-15 9.967707e-16 9.358732e-16 1.258548e-15 -3.286640e-16
## Comp.10 2.618556e-16 1.470154e-15 -1.757242e-16 1.324940e-15 -2.757026e-17
## Comp.11 1.000000e+00 1.225829e-15 -4.719202e-16 2.634925e-16 7.437129e-16
## Comp.12 1.225829e-15 1.000000e+00 -4.355110e-16 -1.299380e-16 -3.893957e-16
## Comp.13 -4.719202e-16 -4.355110e-16 1.000000e+00 -1.366225e-15 4.838017e-16
## Comp.14 2.634925e-16 -1.299380e-16 -1.366225e-15 1.000000e+00 1.761264e-15
## Comp.15 7.437129e-16 -3.893957e-16 4.838017e-16 1.761264e-15 1.000000e+00
## Comp.16 -4.630255e-16 -8.494333e-16 1.852018e-15 6.638944e-16 8.541588e-16
## Comp.17 -1.118369e-15 -2.383811e-15 -9.664191e-17 -1.946162e-15 4.013444e-15
## Comp.18 4.686807e-16 4.818353e-16 9.795217e-16 -3.904184e-15 -2.583601e-15
## Comp.19 -1.185900e-15 -2.211004e-16 -9.818855e-16 -1.122310e-15 2.238004e-15
## Comp.20 1.972055e-16 5.032361e-16 -2.380021e-15 -2.099046e-15 -8.457771e-15
## Comp.21 3.966345e-02 -1.900281e-02 -3.736642e-03 -6.396278e-02 9.896368e-02
## Comp.22 -4.660411e-02 1.578913e-02 -1.055270e-01 -1.586519e-01 -1.172079e-01
## Comp.23 6.627937e-02 2.277461e-02 3.589016e-02 3.784641e-02 1.698165e-01
## Comp.24 -1.663969e-01 -3.761654e-01 -2.938639e-01 -5.471513e-02 1.291964e-01
## Comp.16 Comp.17 Comp.18 Comp.19 Comp.20
## Comp.1 1.696264e-16 8.384544e-16 2.002688e-15 1.508817e-15 1.579654e-16
## Comp.2 1.905954e-16 -9.237489e-16 6.857871e-16 -2.032157e-16 1.081969e-15
## Comp.3 -5.593340e-17 -4.606962e-16 2.835739e-16 -8.743513e-16 -1.624796e-15
## Comp.4 5.924718e-16 -6.349307e-16 -1.930175e-15 -7.644958e-16 3.518967e-16
## Comp.5 6.446849e-16 1.568606e-16 5.884643e-16 -9.528696e-16 -1.323764e-15
## Comp.6 8.680878e-17 1.210716e-16 -7.222821e-16 2.906465e-16 3.517859e-16
## Comp.7 -1.024863e-15 4.689345e-16 7.888098e-16 6.800372e-16 -1.025324e-15
## Comp.8 8.003062e-16 -2.631077e-16 -3.210576e-16 -1.693469e-16 1.893397e-15
## Comp.9 -1.599559e-15 1.651898e-16 1.224203e-15 -5.163458e-16 -9.421336e-16
## Comp.10 -7.890523e-16 4.875531e-16 1.022465e-15 1.584544e-16 1.803581e-15
## Comp.11 -4.630255e-16 -1.118369e-15 4.686807e-16 -1.185900e-15 1.972055e-16
## Comp.12 -8.494333e-16 -2.383811e-15 4.818353e-16 -2.211004e-16 5.032361e-16
## Comp.13 1.852018e-15 -9.664191e-17 9.795217e-16 -9.818855e-16 -2.380021e-15
## Comp.14 6.638944e-16 -1.946162e-15 -3.904184e-15 -1.122310e-15 -2.099046e-15
## Comp.15 8.541588e-16 4.013444e-15 -2.583601e-15 2.238004e-15 -8.457771e-15
## Comp.16 1.000000e+00 7.059548e-15 2.328300e-15 1.191216e-15 -1.944070e-15
## Comp.17 7.059548e-15 1.000000e+00 -4.616353e-15 9.500629e-16 3.824335e-16
## Comp.18 2.328300e-15 -4.616353e-15 1.000000e+00 1.244276e-15 -9.280984e-15
## Comp.19 1.191216e-15 9.500629e-16 1.244276e-15 1.000000e+00 4.483573e-15
## Comp.20 -1.944070e-15 3.824335e-16 -9.280984e-15 4.483573e-15 1.000000e+00
## Comp.21 2.025343e-01 5.108472e-01 6.007043e-01 1.884813e-01 4.278068e-01
## Comp.22 1.252883e-01 -3.063308e-01 7.083177e-01 -4.863623e-01 -1.252973e-01
## Comp.23 2.739237e-02 3.806890e-01 -3.805075e-01 3.490994e-01 -7.226192e-01
## Comp.24 1.232140e-01 2.695339e-01 5.086775e-01 -3.813036e-01 3.710316e-01
## Comp.21 Comp.22 Comp.23 Comp.24
## Comp.1 -0.041562254 0.11931627 -0.100477416 0.14175331
## Comp.2 0.111222641 0.13170054 0.025428284 0.03072999
## Comp.3 0.097027006 0.18232381 0.003361096 -0.01563205
## Comp.4 0.050254193 -0.03518933 0.014076564 -0.14658995
## Comp.5 0.043010703 0.06258250 -0.009923853 0.02313722
## Comp.6 0.101907284 -0.03921590 0.035095036 0.12017193
## Comp.7 -0.022793975 0.03805255 -0.067707533 0.01861835
## Comp.8 -0.207743484 0.03843789 0.017797340 0.08710037
## Comp.9 -0.004393539 -0.04437388 -0.014961324 -0.13633203
## Comp.10 -0.073353821 0.02353334 0.025489124 -0.06943967
## Comp.11 0.039663455 -0.04660411 0.066279369 -0.16639690
## Comp.12 -0.019002812 0.01578913 0.022774612 -0.37616542
## Comp.13 -0.003736642 -0.10552699 0.035890161 -0.29386386
## Comp.14 -0.063962785 -0.15865185 0.037846406 -0.05471513
## Comp.15 0.098963682 -0.11720790 0.169816487 0.12919642
## Comp.16 0.202534279 0.12528828 0.027392372 0.12321398
## Comp.17 0.510847244 -0.30633084 0.380689048 0.26953394
## Comp.18 0.600704251 0.70831770 -0.380507538 0.50867748
## Comp.19 0.188481268 -0.48636227 0.349099428 -0.38130358
## Comp.20 0.427806836 -0.12529729 -0.722619204 0.37103162
## Comp.21 1.000000000 0.15910117 -0.243195596 0.55928336
## Comp.22 0.159101174 1.00000000 -0.504555717 0.48854829
## Comp.23 -0.243195596 -0.50455572 1.000000000 -0.51009099
## Comp.24 0.559283357 0.48854829 -0.510090990 1.00000000
pairs.panels(pcaobj$scores,gap=0,bg=c("green","red","yellow","blue","pink","purple"),pch=21)