How do you predict Airbnb prices in Seattle?

 

(Off-grid Airbnb Earthship House in Ironbank, South Australia, Australia)


Introduction

Since 2008, guests and hosts have used Airbnb to travel in a more personalised way. Airbnb has kept growing and now competes with large hotel chains. Unique homestay experiences excite guests, but when it comes to price, how can individual hosts set prices that position them well in the market?

In this post, I describe a data-driven approach to predicting the listing price of Airbnb houses in Seattle.


Airbnb Seattle Dataset

The Airbnb Seattle dataset covers many kinds of homestay activity. It has over 90 features, including:
  • Quality: Review Scores, Number of Reviews, Property Type, Room Type, and Amenities
  • Location: Market, Neighbourhood, Zipcode
  • Price: Price, Cleaning Fee, Extra Person, Cancellation Fee
  • Bookability: Host Response Time, Host Response Rate, Review Scores, Number of Reviews
The dataset aims to capture multiple aspects of homestay activity, covering both house features and host services.

Part 1: Which Airbnb houses are the most popular in the market?

The questions I was really interested in were:

  • What variety has Airbnb brought to the market?
  • What types of houses are the most popular in the market?

Here we can see that more than 15 property types are available on Airbnb. Besides common types such as houses, apartments, townhouses, and condominiums, there are also lofts, cabins, tree houses, boats, and more. The clear leaders are still houses and apartments.

Over 60% of listings are for entire-home rental, around 30% are for private rooms, and fewer than 5% are for shared rooms.

The data here is aggregated separately by property type and by room type. It would be interesting to see, for each property type, whether listings offer the entire house, a private room, or a shared room.
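That per-type breakdown can be sketched with a cross-tabulation. This is a minimal example on made-up rows, not the real dataset; the column names `property_type` and `room_type` are my assumption about how the listings table is labelled.

```python
import pandas as pd

# Toy rows standing in for the listings table (hypothetical values;
# the column names are assumed, not taken from the real file).
listings = pd.DataFrame({
    "property_type": ["House", "House", "Apartment",
                      "Apartment", "Apartment", "Loft"],
    "room_type": ["Entire home/apt", "Private room", "Entire home/apt",
                  "Entire home/apt", "Shared room", "Entire home/apt"],
})

# Share of listings per room type (the aggregate view).
room_share = listings["room_type"].value_counts(normalize=True)

# Number of listings for each (property type, room type) pair.
breakdown = pd.crosstab(listings["property_type"], listings["room_type"])

print(room_share)
print(breakdown)
```

Each row of `breakdown` is one property type and each column one room type, so reading across a row shows how that kind of property is usually offered.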




Part 2: Which factors are most strongly related to the Airbnb price?

In the chart below, I was interested in finding out which features relate most strongly to the Airbnb price.

The highest correlation, indicated by light pink, is 1. It means the two variables in that row and column are perfectly correlated. The correlation decreases through orange, red, purple, and black. A correlation of 0 means the two variables have no linear relationship.

With that in mind, I had two big takeaways from the plot:
  1. House features such as accommodates, bathrooms, bedrooms, and beds are highly correlated with price. With a score of 0.65, accommodates shows the highest correlation with price.
  2. In contrast, number of reviews and review score rating show barely any correlation with price.
These findings help us understand which factors may affect Airbnb listing prices. The next step is to investigate how to predict the price of Airbnb houses.
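A ranking like the one read off the heatmap can be computed directly. The sketch below uses a handful of invented listings, so the numbers it prints are not the 0.65 reported above; the column names are assumed for illustration.

```python
import pandas as pd

# Invented listings for illustration only; column names are assumptions.
listings = pd.DataFrame({
    "price": [60, 85, 120, 150, 200, 250],
    "accommodates": [1, 2, 3, 4, 6, 8],
    "bedrooms": [1, 1, 1, 2, 3, 4],
    "number_of_reviews": [40, 3, 25, 10, 30, 12],
})

# Pearson correlation of every numeric column with price,
# sorted from the most positive to the most negative.
price_corr = (
    listings.corr()["price"]
    .drop("price")
    .sort_values(ascending=False)
)

print(price_corr)
```

`DataFrame.corr()` returns the full correlation matrix, which is what a heatmap draws; selecting the `price` column keeps only the row of interest.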

Part 3: Can we predict the price of Airbnb houses?

Finally, I wondered whether we can actually predict the prices of Airbnb houses. To decide which house features to include, we need to weigh the trade-off between model complexity and model accuracy.

A linear regression model is used, and a simulation process is run to reduce model complexity while reaching the best accuracy. In the chart below, orange stands for the training dataset and blue stands for the testing dataset. As more features are included in the model, accuracy on the training set keeps increasing, while accuracy on the testing set reaches its maximum when the model expands to 32 features. That is our optimal model.
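The idea of growing the model and watching both scores can be sketched as follows. This is an illustration on synthetic data, assuming features are added one at a time in a fixed order and scored with R squared; the actual search in my notebook may choose and order features differently, and the helper names here are hypothetical.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def fit_predict(X_fit, y_fit, X_eval):
    """Ordinary least squares with an intercept, then predict on X_eval."""
    A_fit = np.column_stack([np.ones(len(X_fit)), X_fit])
    A_eval = np.column_stack([np.ones(len(X_eval)), X_eval])
    coef, *_ = np.linalg.lstsq(A_fit, y_fit, rcond=None)
    return A_eval @ coef

# Synthetic data: 40 candidate features, only the first 5 carry signal.
rng = np.random.default_rng(0)
n_samples, n_features = 200, 40
X = rng.normal(size=(n_samples, n_features))
signal = X[:, :5] @ np.array([30.0, 20.0, 15.0, 10.0, 5.0])
y = signal + rng.normal(scale=25.0, size=n_samples)

# Simple 70/30 train/test split.
split = 140
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

# Fit nested models with 1, 2, ..., n_features columns and score each.
train_scores, test_scores = [], []
for k in range(1, n_features + 1):
    cols_train, cols_test = X_train[:, :k], X_test[:, :k]
    train_scores.append(
        r_squared(y_train, fit_predict(cols_train, y_train, cols_train)))
    test_scores.append(
        r_squared(y_test, fit_predict(cols_train, y_train, cols_test)))

# The "optimal" size is the one with the best score on unseen data.
best_k = int(np.argmax(test_scores)) + 1
print("best number of features:", best_k)
```

The training score can only go up as columns are added, so it cannot be used to pick the model size; the testing score is what reveals where extra features stop helping.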


In the chart below, we can look at the size of the coefficients in the model as an indication of the impact of each variable on the price. The larger the absolute value of the coefficient, the larger the impact on price.

The top negative features are the shared room type and zipcodes 98118 and 98108. The top positive features are bedrooms, bathrooms, and zipcode 98104.
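Ranking features by coefficient can be sketched like this. The data and the coefficient values (40, 25, -30) are made up so the fit is exact; they are not the coefficients of the model above, and the feature names are assumed.

```python
import numpy as np

# Hypothetical feature names; the last one is a one-hot dummy column.
feature_names = ["bedrooms", "bathrooms", "room_type_Shared room"]

# Invented listings: bedrooms, bathrooms, shared-room flag.
X = np.array([
    [1, 1, 0],
    [2, 1, 0],
    [3, 2, 0],
    [1, 1, 1],
    [2, 2, 0],
    [1, 2, 1],
    [4, 3, 0],
], dtype=float)

# Made-up "true" effects used to generate noise-free prices.
true_coefs = np.array([40.0, 25.0, -30.0])
y = 50.0 + X @ true_coefs

# Ordinary least squares with an intercept column.
A = np.column_stack([np.ones(len(X)), X])
solution, *_ = np.linalg.lstsq(A, y, rcond=None)
coefs = solution[1:]  # drop the intercept

# Sort from most positive to most negative coefficient.
ranked = sorted(zip(feature_names, coefs),
                key=lambda pair: pair[1], reverse=True)
for name, value in ranked:
    print(f"{name:>24s} {value:+.1f}")
```

Coefficients are only directly comparable when the features are on similar scales, so a count such as bedrooms and a 0/1 dummy such as a zipcode flag should be compared with some care.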


Conclusion

In this article, we took a close look at Airbnb house prices and how we could predict them. Here are the major findings:

1. Airbnb brought a large variety into the market. More than 15 property types are available, including lofts, cabins, tree houses, and boats. However, the clear leaders are still houses and apartments.

2. House features such as accommodates, bathrooms, bedrooms, and beds are highly correlated with price. In contrast, number of reviews and review score rating show barely any correlation.

3. An optimal model was built to predict the Airbnb price. The three most positive features are bedrooms, bathrooms, and zipcode 98104. The three most negative features are the shared room type and zipcodes 98118 and 98108.

For more about this analysis, please see my GitHub repository, available here.
