What type of Airbnb host are you?
HEC Data Minds zooms in on host types in large European cities
Have you got your next holiday booked? Are you planning on finding that ultimate bargain on Airbnb? Before you do, we have a question for you: have you ever thought about who it is that is hosting you? No worries if you haven’t, we’ve got you covered.
In a nutshell
Since Airbnb’s creation in 2008, many studies have focused on the users of the platform: who they are, how they behave, and whether they can be categorized into different user types. In this blog post, we’ll dive into the types of Airbnb hosts and their distribution across European cities by focusing on 254,160 listings across 10 cities in 4 European countries in 2018. To do this, we apply a fancy machine learning technique called K-means clustering. Interestingly, our findings indicate that a segment of single-listing hosts (“small professionals”) shows behavior more similar to multiple-listing hosts (“large professionals”) than to other single-listing hosts (“small amateurs”), and that the proportions of these three groups vary significantly across cities. We’re proud we could contribute to the growing literature on the users of sharing economy platforms.
Table 1: Distribution of listings by Airbnb host types in 10 major European cities
As you know all too well, Airbnb has become very popular over the last few years. As of March 2020, the platform had over 150m users worldwide, with over 2m people staying in an Airbnb on any given night across 7m listings in over 220 countries and regions. Not bad at all.
A very brief history of existing literature
Like professional researchers, we started off with a quick review of the existing literature. There is no question that the literature is methodologically diverse (Dann, Teubner, & Weinhardt 2019). Still, most articles fall within the areas of Tourism/Travel/Hospitality (48) or Information and Management (42). Unsurprisingly, studies on Airbnb’s wider implications (e.g., for housing prices) are the most widely discussed in the media, but a growing number have focused on the decision-making of the participating agents: hosts, platforms, and users. Important examples are the factors that affect the prices set by hosts and the factors that affect how guests choose between listings.
Interestingly, the effect of these factors has been considered to vary across hosts and clients, who are themselves diverse groups. Nevertheless, “professional” hosts that have more than one listing are commonly distinguished from hosts that only have a single listing (as defined by Airbnb). More recently, Tussyadiah (2016) identified five clusters of hosts when analyzing Airbnb in New York, relying on 12,785 listings. Tussyadiah and Park (2018) further clustered hosts based on the self-descriptions in their profile section, finding that different clusters are perceived as more or less trustworthy by clients.
Our hypothesis and contribution to the literature
Straightaway, we saw limitations in clustering hosts by the self-descriptions in their profile section. We also noticed that, partly due to a past lack of available Airbnb data, numerous studies fail to take into account intercity and inter-country differences, especially studies that focused on identifying host types. To address those drawbacks in existing studies on the decision-making of Airbnb hosts, we focused our attention on identifying behavioral differences among hosts across multiple large European cities. Our hypotheses then became:
Hypothesis 1: Among the single-listing hosts, a significant share shows behavior more similar to multiple-listing hosts than to other single-listing hosts.
Hypothesis 2: The proportion of host types varies significantly across cities.
A short comment on the methodology
We used data from Airbnb datasets stored in the HEC Data Factory, which were provided by AirDNA. In particular, this study uses Airbnb listings data from 2018 and 2019 for 10 large European cities (see Table 1). Our dataset has 56 variables, which consist of 55 features and 1 id column. For the purpose of our analysis, we kept 8 variables, from which the following 6 emerged.
Table 2: Description of the relevant independent variables
For parsing and analyzing the data, we used the Python programming language in Jupyter Notebook, with JupyterLab as our IDE. A crucial first step was to filter out observations with missing and/or impossible values, as well as listings that had a 0% response rate. In the end, we kept an average of only 25.8% of the original listings overall. Why so low? We’ll only say two words: fake accounts.
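To make the filtering step concrete, here is a minimal sketch of what such a filter could look like. The field names, the 0 to 100% validity range, and the toy rows are our own assumptions for illustration; they are not the actual AirDNA schema or the exact rules used in the study.

```python
from typing import Optional

# Toy rows; field names and values are hypothetical, not the AirDNA schema.
listings = [
    {"id": 1, "response_rate": 97.0, "blocked_rate": 14.8},
    {"id": 2, "response_rate": None, "blocked_rate": 20.0},   # missing value
    {"id": 3, "response_rate": 0.0, "blocked_rate": 50.0},    # 0% response rate
    {"id": 4, "response_rate": 120.0, "blocked_rate": 10.0},  # impossible value
    {"id": 5, "response_rate": 80.0, "blocked_rate": 35.0},
]


def is_valid_rate(value: Optional[float]) -> bool:
    """A rate is usable if it is present and lies within 0-100% (assumed range)."""
    return value is not None and 0.0 <= value <= 100.0


def keep_listing(row: dict) -> bool:
    """Drop rows with missing or impossible rates, and rows with a 0% response rate."""
    rates_ok = is_valid_rate(row["response_rate"]) and is_valid_rate(row["blocked_rate"])
    return rates_ok and row["response_rate"] > 0.0


cleaned = [row for row in listings if keep_listing(row)]
share_kept = len(cleaned) / len(listings)
print([row["id"] for row in cleaned], f"{share_kept:.0%} kept")
```

Keeping the three rules in one small predicate makes it easy to report how many listings survive the filter for each city.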
Cluster analysis was our best friend in this effort to segment the listings into professional and non-professional hosts. It’s used in unsupervised learning to group similar data points together and discover underlying patterns. To find the different clusters given the chosen variables, we used K-means, a machine learning algorithm that looks for a fixed number of clusters in a dataset. To make our lives easier, we analyzed the relationships between the different variables through the correlation matrix. Combining this with theoretical insights, we determined which variables to keep in our analysis. The matrix below shows this step for Parisian listings.
Figure 1: Correlation matrix for single-listing hosts in Paris
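For readers who want to see what goes into such a matrix, the sketch below computes pairwise Pearson correlations by hand. The values are hypothetical toy numbers, and `Number_of_photos` is an assumed variable name; whether the study used Pearson or another correlation measure is not stated, so treat this as an illustration only.

```python
from math import sqrt

# Hypothetical toy values for three variables; not taken from the study's data.
data = {
    "Response_rate": [97.0, 60.0, 88.0, 45.0, 92.0],
    "Blocked_rate": [15.0, 70.0, 30.0, 85.0, 20.0],
    "Number_of_photos": [20.0, 8.0, 15.0, 12.0, 25.0],  # assumed variable name
}


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equally long samples."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


names = list(data)
matrix = {a: {b: pearson(data[a], data[b]) for b in names} for a in names}

# Print one row per variable, one column per variable.
for a in names:
    print(a.ljust(18), " ".join(f"{matrix[a][b]:+.2f}" for b in names))
```

Pairs with a correlation close to +1 or -1 carry largely redundant information, which is one reason to drop one variable of such a pair before clustering.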
Ironically, when writing this paper, one of us almost accidentally paid for a weekend in a non-existent listing in Paris.
To be absolutely sure about the robustness of our findings, we performed our clustering analyses with three different methods that were applied to all 10 cities. Figure 2 is an example of the clustering done here, specifically method 1 on Parisian data.
Table 3: The three studied clustering methods
Now take a look at Figure 2: you can see how the different clusters are completely separated when plotting Blocked_rate against Response_rate. The K-means algorithm was able to separate Paris listings from single-listing hosts into 3 different clusters. The “professional amateur” cluster has a Response_rate that ranges between 75% and 100% and a Blocked_rate from 0% to 35%, values that are very similar to those of professional hosts in Paris, as can be seen in Table 5.
Figure 2: Clustering on Parisian Airbnb hosts under Methodology 1
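To give a feel for how K-means arrives at such clusters, here is a small self-contained sketch of the standard assign-then-update loop on (Response_rate, Blocked_rate) pairs. The points are hypothetical, and the deterministic farthest-point initialization is our own simplification so the example is reproducible; it is not necessarily what was used for the figure, and library implementations typically use a randomized initialization instead.

```python
from math import dist

Point = tuple[float, float]  # (Response_rate, Blocked_rate), both in percent


def farthest_point_init(points: list[Point], k: int) -> list[Point]:
    """Deterministic start: first point, then the point farthest from all chosen."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(dist(p, c) for c in centroids)))
    return centroids


def kmeans(points: list[Point], k: int, max_iter: int = 100):
    """Lloyd's algorithm: assign each point to its nearest centroid, then re-average."""
    centroids = farthest_point_init(points, k)
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if not members:  # leave an empty cluster's centroid where it was
                new_centroids.append(centroids[c])
                continue
            new_centroids.append((
                sum(m[0] for m in members) / len(members),
                sum(m[1] for m in members) / len(members),
            ))
        if new_centroids == centroids:  # nothing moved, so we have converged
            break
        centroids = new_centroids
    return centroids, labels


# Hypothetical listings, not the Paris data.
points = [
    (97.0, 12.0), (95.0, 18.0), (99.0, 10.0),  # responsive, rarely blocked
    (60.0, 55.0), (55.0, 60.0), (65.0, 50.0),  # middling
    (20.0, 90.0), (15.0, 85.0), (25.0, 95.0),  # unresponsive, mostly blocked
]
centroids, labels = kmeans(points, k=3)
for centroid in centroids:
    print(tuple(round(v, 1) for v in centroid))
```

Because the number of clusters is fixed in advance, the choice of `k` matters: the same data can look quite different when split into 3 or 4 groups.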
Let’s dive into the analysis
For the sake of simplicity, we describe the findings for Paris only. Similar results are found for all studied cities, however. The first method (Table 5), which clusters the single-listing hosts, shows that for Paris, a very large portion of the single-listing hosts is more similar to the average of the multiple-listing hosts than to other single-listing hosts. Over 59% of listings from single-listing hosts were owned by hosts that were as responsive to messages from potential clients (97% vs. 94%) and had their listing blocked as rarely (14.8% vs. 18.5%) as multiple-listing hosts.
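One simple way to express “more similar to” is to measure the distance between a cluster’s averages and the multiple-listing averages. The sketch below does this with the two Paris pairs quoted above, reading the first number in each pair as the single-listing cluster (our interpretation of the sentence). The other two cluster averages are hypothetical placeholders, since the text does not quote them, and the Euclidean distance is our own choice of measure.

```python
from math import dist

# (Response_rate, Blocked_rate) averages in percent.
multi_listing_avg = (94.0, 18.5)  # Paris figures quoted in the text

cluster_avgs = {
    "cluster_a": (97.0, 14.8),  # Paris figures quoted in the text
    "cluster_b": (70.0, 60.0),  # hypothetical, for illustration only
    "cluster_c": (30.0, 90.0),  # hypothetical, for illustration only
}

# Euclidean distance from each cluster's averages to the multiple-listing averages.
distances = {name: dist(avg, multi_listing_avg) for name, avg in cluster_avgs.items()}
closest = min(distances, key=distances.get)

for name, d in sorted(distances.items(), key=lambda item: item[1]):
    print(f"{name}: {d:.2f}")
print("closest to multiple-listing hosts:", closest)
```

Both variables are percentages here, so they are on the same scale; with variables on different scales, standardizing them first would be a sensible extra step.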
The second method (Table 6), which clusters all hosts, shows that clustering on all Parisian hosts results in a set of clusters where multiple-listing hosts (9.6% of the total number of hosts) never form a majority of the cluster members. Over 75% of the multiple-listing hosts are in a cluster of hosts that has a very high response rate (97.9%) and a very low blocked rate (14.7%) (i.e., “professionals”).
The third method (Table 7), which clusters all listings, shows results aligned with those of methods 1 and 2. You can see that close to two-thirds of the listings from multiple-listing hosts and roughly 40% of listings from single-listing hosts fall under the professional cluster. The fact that this analysis created 4 clusters, instead of the 3 found with methods 1 and 2, explains the slight changes in proportions.
Finally, when we compared our results from the three methods across the studied cities, we found something interesting. In method 1 (Table 8), while the “professional” single-listing owners represent 59% of single-listing owners in Paris, this share varies considerably across cities: it is 58.9% in Munich but goes up to 76.71% amongst single-listing owners (“small hosts”) in Rome. Despite changes in the number of clusters, similar results are found with methods 2 and 3. In method 2, 65.60% of single-listing hosts fall into the most “professional” category in Marseille, against 62.57% in Lyon. For Rome, it is a high 75.47%, whilst in Munich it is 58.49%. In method 3, listings of “small professional hosts” represent 58.61% of listings of single-listing owners in Munich, versus 77.93% in Rome.
And finally… we conclude
It’s time to celebrate: the results from our analyses support both our hypotheses. First, the results suggest that, after clustering, a significant percentage of the hosts that own a single listing act more like hosts with multiple listings than like other hosts with a single listing. You might think this is straightforward, but, to the best of our knowledge, this has not been taken into account in past studies. Also, the results of the analyses with method 3 suggest that beyond diversity in the types of single-listing hosts, there are surprising differences among multiple-listing hosts that cannot be fully attributed to the exact number of listings held. In Paris, for example, almost 40% of multiple-listing hosts were more similar to “less professional” single-listing hosts than to the high-response, low-blocked-rate baseline that one would expect.
Second, we observe significant intercity differences in the proportion of the host types. Of the total number of listings, the percentage belonging to single-listing hosts with high responsiveness and a low blocked rate (“small professionals”) varies from 27.12% in Rome to 46.10% in Lyon, the percentage belonging to other single-listing hosts from 3.88% in London to 30.32% in Munich, and the percentage belonging to multiple-listing hosts (“large professionals”) from 26.26% in Munich to 64.65% in Rome.
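Percentages like these come from counting the listings of each host type per city and dividing by the city total. Here is a minimal sketch of that bookkeeping, using made-up cities and counts purely for illustration; none of the numbers below come from the study.

```python
from collections import Counter, defaultdict

# Hypothetical (city, host_type) pairs, one per listing; not the study's data.
labelled_listings = (
    [("City A", "small professional")] * 4
    + [("City A", "small amateur")] * 1
    + [("City A", "large professional")] * 5
    + [("City B", "small professional")] * 3
    + [("City B", "small amateur")] * 3
    + [("City B", "large professional")] * 4
)

# Count listings per host type within each city.
counts: dict[str, Counter] = defaultdict(Counter)
for city, host_type in labelled_listings:
    counts[city][host_type] += 1

# Convert counts to percentages of each city's total listings.
shares = {
    city: {t: 100.0 * n / sum(c.values()) for t, n in c.items()}
    for city, c in counts.items()
}

for city, row in shares.items():
    print(city, {t: f"{pct:.1f}%" for t, pct in row.items()})
```

Comparing the rows side by side is what reveals intercity differences; judging whether they are statistically significant would need a formal test on top of this.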
We think our findings are most relevant for future research relying on models that distinguish between types of Airbnb hosts. In any case, they help paint a more realistic picture of the industry. Yet, we admit several limitations to our study. First, the nature of the accessible datasets limited our focus to 10 European cities. Second, we didn’t consider whether the listing and the host had matured. Finally, our reliance on average values for listings does not allow us to observe recent changes in behavior.
That’s why we think further research should continue to study the diversity of Airbnb hosts. Researchers could take a closer look at how often hosts change certain elements on a listing’s page relative to other hosts, and study how hosts respond to cyclical and noncyclical changes in demand for Airbnb listings, in particular by changing their prices.
After some analysis of why the number of clusters in Rome and Naples is 4 instead of 3 as in the rest of the cities, we found that the supplementary cluster falls inside the “professional amateur” label but has a much higher number of photos (Rome: 42.99, Naples: 46.66). This might be caused by a cultural difference between those 2 cities and the rest, since the number of photos in Milan is standard (Table 10).
*For additional tables and access to further data preprocessing, clusterings, and methodology, please get in touch with us via the Comments section below.