About Data:

The Airbnb Worldwide collection, covering 2020 to 2023, gives a broad picture of the global short-term rental market. It covers several aspects of Airbnb listings, including property type, host information, price trends, booking data, and guest reviews. The range of attributes supports in-depth analysis and exploration, which makes the dataset useful for projects involving market research, customer behavior, pricing strategies, and hospitality industry trends.

The dataset is made up of several files, each covering a different part of how Airbnb works. It provides information about how often properties are used, the types of people who host, guest preferences, and where listings are located. This organization makes it easier to examine key metrics closely, which helps stakeholders understand how the short-term rental market works and which trends are emerging.

The dataset comes from Kaggle (link: "Data"). It focuses on data from 2020 and 2023, which makes it well suited to analysis projects. Because of its broad coverage, it can support many types of research, such as analyzing market trends, gathering information about competitors, and making strategic decisions in the tourism and hospitality industries.

Data Preprocessing:

Several important steps were taken in the exploratory data analysis (EDA) of the Airbnb dataset in order to understand and study the patterns and characteristics of the data.

First, the data was loaded and processed using Python's pandas library. It contains 458,177 entries and 18 columns. Each column describes a different aspect of an Airbnb listing, such as the ID, host information, location, room type, price, reviews, and availability.
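The loading step described above could look roughly like the sketch below. The file name is not given in the text, so a tiny inline CSV stands in for the real file, and every column name other than "last_review" and "reviews_per_month" is an assumption made for illustration.

```python
import io

import pandas as pd

# Tiny inline stand-in for the real CSV. The actual file name is not given
# in the text, and the column names here are assumed for illustration.
SAMPLE_CSV = """id,host_id,neighbourhood,room_type,price,last_review,reviews_per_month
1,10,Mission,Private room,80,2023-01-15,1.2
2,11,Downtown,Entire home/apt,150,,
3,10,Mission,Shared room,40,2020-06-01,0.3
"""


def load_listings(source):
    """Read listings from a file path or file-like object into a DataFrame."""
    return pd.read_csv(source)


listings = load_listings(io.StringIO(SAMPLE_CSV))

# shape is (number of entries, number of columns).
n_rows, n_cols = listings.shape
print(f"{n_rows} entries, {n_cols} columns")
print(listings.dtypes)
```

With the real file, the same function would be called with the path to the downloaded CSV, and the shape would be checked against the 458,177 entries and 18 columns reported above.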

Some columns in the dataset were found to contain missing values. Handling missing data is important for keeping later analyses accurate and trustworthy. Descriptive statistics were used to learn more about the spread and features of the data.

As part of cleaning and preprocessing the data, the "last_review" field was converted to a datetime format to make it consistent and easier to analyze. Missing values were also found in the "last_review" and "reviews_per_month" fields. To keep the data correct, these values need to be filled in or otherwise handled explicitly.
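A minimal sketch of the datetime conversion and missing-value check, using a toy DataFrame in place of the real data. Filling "reviews_per_month" with 0 is only one possible choice and is an assumption here, not necessarily what was done in this project.

```python
import pandas as pd

# Toy data standing in for the real listings table.
listings = pd.DataFrame({
    "id": [1, 2, 3],
    "last_review": ["2023-01-15", None, "2020-06-01"],
    "reviews_per_month": [1.2, None, 0.3],
})

# Parse the text dates; missing or unparseable entries become NaT.
listings["last_review"] = pd.to_datetime(listings["last_review"], errors="coerce")

# Count missing values per column before deciding how to handle them.
missing_counts = listings.isna().sum()
print(missing_counts)

# Assumed handling: a listing with no reviews gets 0 reviews per month.
listings["reviews_per_month"] = listings["reviews_per_month"].fillna(0.0)
```

Missing "last_review" dates are left as NaT here, since there is no natural default date for a listing that has never been reviewed.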

Data Cleaning:

Because this is open data, a lot of cleaning had to be done first. The raw data contained many null values that needed to be removed.

RAW DATA

Here you can observe that there are a lot of null values and unnecessary columns.

CLEANED DATA

All of the unnecessary columns have been removed here.

As the figures above show, several steps were taken during data cleaning to improve the quality and integrity of the data for later analysis. These included handling missing values in different columns, making data types consistent, and converting the "last_review" column to a datetime type so that it could be used for time-based analysis. Where outliers were found, they were handled with appropriate methods. Consistency checks were run on the categorical columns, and duplicates were removed to preserve the integrity of the data. Careful cleaning was important so that the data could be used for exploratory analysis, modeling, and insights that support later project decisions.
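The cleaning steps above can be sketched as a single function. The text does not say which columns were dropped or which outlier method was used, so the "license" column, the "price" column name, and the 1.5 x IQR rule below are assumptions chosen for illustration.

```python
import pandas as pd


def clean_listings(df, drop_cols=(), price_col="price"):
    """Drop unneeded columns, duplicate rows, rows without a price,
    and price outliers outside 1.5 * IQR (an assumed outlier rule)."""
    out = df.drop(columns=[c for c in drop_cols if c in df.columns])
    out = out.drop_duplicates()
    out = out.dropna(subset=[price_col])
    q1, q3 = out[price_col].quantile([0.25, 0.75])
    iqr = q3 - q1
    in_range = out[price_col].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    return out[in_range].reset_index(drop=True)


# Toy raw data: one duplicate row, one missing price, one extreme price.
raw = pd.DataFrame({
    "id": [1, 2, 2, 3, 4, 5, 6],
    "price": [80, 100, 100, 90, 110, None, 10000],
    "license": [None] * 7,
})

cleaned = clean_listings(raw, drop_cols=["license"])
print(cleaned)
```

Dropping rows is the simplest option; depending on the analysis, capping extreme prices or imputing missing ones may be preferable.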

Data Visualization:

Seven different data visualizations were produced to compare different parameters. For some plots, random sampling was used because the dataset is very large.
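Random sampling before plotting can be done directly in pandas. The sample size of 100 and the seed below are arbitrary values for illustration; the text does not state the sizes actually used.

```python
import pandas as pd

# Toy table standing in for the full listings data.
listings = pd.DataFrame({
    "id": range(1000),
    "price": [50 + (i % 200) for i in range(1000)],
})

# Draw a reproducible random sample without replacement for plotting.
plot_sample = listings.sample(n=100, random_state=42)
print(len(plot_sample))
```

Fixing random_state keeps the sampled figures reproducible between runs.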

HISTOGRAM OF MINIMUM NIGHTS

The histogram with overlaid density plots suggests that the most common minimum booking requirement across various room types is for a single night, as evidenced by the tallest histogram bars and sharp peaks in the density plots at the 1-night mark. There is a significant drop in frequency for bookings requiring a minimum of 2 nights, and even more so for 3 or more nights. The density plots indicate that while the preference for a 1-night minimum is quite strong across all room types, there is some variation among them, with one room type showing a slightly broader distribution, hinting at more flexibility in minimum night requirements. Overall, shorter stays are clearly favored in the booking patterns, with longer minimum stays being comparatively rare.
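The counts behind such a histogram can be tabulated as follows. This is a sketch on toy data; the column names "room_type" and "minimum_nights" are assumed.

```python
import pandas as pd

# Toy data; "room_type" and "minimum_nights" are assumed column names.
listings = pd.DataFrame({
    "room_type": [
        "Private room", "Private room", "Entire home/apt",
        "Entire home/apt", "Entire home/apt", "Shared room",
    ],
    "minimum_nights": [1, 1, 1, 2, 30, 1],
})

# Rows: room types. Columns: minimum-night values. Cells: listing counts.
night_counts = pd.crosstab(listings["room_type"], listings["minimum_nights"])

# The most frequent minimum-night requirement across all listings.
most_common = listings["minimum_nights"].mode().iloc[0]

print(night_counts)
print("Most common minimum nights:", most_common)
```

Each row of the table corresponds to one of the per-room-type histograms, so the peak at one night can be confirmed numerically.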

Avg Price By neighbourhood

A bar plot was made to show the average number of nights guests stayed in each neighborhood in the dataset. The data was grouped by neighborhood and the mean length of stay was computed for each group. The plot makes it possible to compare guest behavior and accommodation preferences across neighborhoods. The heights of the bars indicate how popular and appealing different areas are as places to stay, and they make it easy to find neighborhoods with longer or shorter average stays. This could help property owners, hospitality managers, and urban planners understand demand patterns, choose pricing strategies, and improve the guest experience in specific areas.
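The grouping and averaging described above can be sketched as below. The helper name mean_by_neighbourhood is hypothetical, and "minimum_nights" is used as an assumed stand-in for the length-of-stay column.

```python
import pandas as pd


def mean_by_neighbourhood(df, value_col, group_col="neighbourhood"):
    """Return the mean of value_col per neighbourhood, largest first."""
    return df.groupby(group_col)[value_col].mean().sort_values(ascending=False)


# Toy data; the column names are assumptions.
listings = pd.DataFrame({
    "neighbourhood": ["Mission", "Mission", "Downtown", "Downtown"],
    "minimum_nights": [2, 4, 1, 1],
})

avg_nights = mean_by_neighbourhood(listings, "minimum_nights")
print(avg_nights)
```

The resulting Series holds one value per neighborhood, which is the quantity drawn as bar heights. Passing "price" as value_col would give average prices instead.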

Price Between 2020 and 2023

This histogram presents a comparative analysis of price distributions over two distinct years: 2020 and 2023. The visual representation, with the x-axis delineating the price range and the y-axis indicating the frequency of occurrences, reveals noticeable shifts in pricing trends. In 2020, the prices are widely dispersed with a significant concentration in the mid-price range, suggesting a broader variation in pricing. Conversely, the year 2023 exhibits a denser clustering of prices at the lower end of the spectrum, indicating a trend towards more affordable pricing or perhaps a narrower range of prices within the market. This comparison not only highlights changes in pricing structures over the three-year period but also potentially reflects economic shifts, market adjustments, or changes in consumer behavior.
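For a comparison like this to be fair, both years should be counted against the same price bins. A minimal sketch on toy prices follows; the "year" column and the bin edges are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Toy prices; the "year" column and the values are illustrative only.
prices = pd.DataFrame({
    "year": [2020] * 5 + [2023] * 5,
    "price": [120, 150, 180, 200, 90, 60, 70, 80, 95, 160],
})

# Shared bin edges so the two years are directly comparable.
bin_edges = np.linspace(50, 200, 4)

# Map each year to its histogram counts over the shared bins.
counts_by_year = {
    year: np.histogram(group["price"], bins=bin_edges)[0]
    for year, group in prices.groupby("year")
}

for year, counts in counts_by_year.items():
    print(year, counts)
```

Comparing the two count arrays bin by bin shows where the distribution has shifted between the years.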

Avg prices for different rooms in neighbourhood

The bar graph shows average rental prices by area for two types of accommodation: private rooms and whole homes or apartments. Each bar shows the average price in one area: orange bars for whole homes or apartments, blue bars for private rooms. The graph makes price differences within and between neighborhoods easy to see. In most areas, whole homes or apartments cost more than private rooms. Potential renters can use this information to compare costs, and homeowners can use it to set the right price for their rentals. It also reflects the different value propositions of different areas, which may relate to amenities, accessibility, and overall desirability.
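The values behind a grouped bar graph of this kind can be computed with a pivot table. This is a sketch on toy data with assumed column names and room type labels.

```python
import pandas as pd

# Toy data; column names and room type labels are assumptions.
listings = pd.DataFrame({
    "neighbourhood": ["Mission", "Mission", "Mission", "Mission", "Downtown", "Downtown"],
    "room_type": [
        "Entire home/apt", "Entire home/apt", "Private room",
        "Private room", "Entire home/apt", "Private room",
    ],
    "price": [200, 220, 80, 100, 300, 120],
})

# One row per neighbourhood, one column per room type, mean price in each cell.
price_table = listings.pivot_table(
    index="neighbourhood", columns="room_type", values="price", aggfunc="mean"
)

# How much more a whole home costs than a private room in each neighbourhood.
price_gap = price_table["Entire home/apt"] - price_table["Private room"]

print(price_table)
print(price_gap)
```

A positive gap in every row corresponds to the observation that whole homes or apartments are priced above private rooms.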

Neighborhood Accommodation Pricing by Room Type

The scatter plot compares accommodation prices across neighborhoods, broken down by room type. The colored data points represent four groups: private rooms (orange), hotel rooms (green), shared rooms (red), and whole homes or flats (blue). Prices are shown on the y-axis and neighborhoods on the x-axis, which makes it easy to see how prices change between areas. Whole homes or apartments usually have the highest prices, followed by hotel rooms. Shared rooms are the least expensive choice, while private rooms tend to be more expensive. The spread of the data points within each room type shows how much prices vary within that group, and the spread across neighborhoods shows the price range in each area, which is useful for tourists or renters comparing areas and lodging options.

Reviews per month compared with type of room

The violin plot shows how monthly reviews are distributed for each room type, including their spread and density. Compared with hotel rooms and shared rooms, private rooms and whole homes or flats tend to have more reviews per month and a wider spread.
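The spread that a violin plot displays can also be summarized numerically per room type. The toy values below are invented purely to make the sketch runnable and do not reproduce the results of this project; "room_type" is an assumed column name.

```python
import pandas as pd

# Toy values for illustration only; they are not results from the dataset.
listings = pd.DataFrame({
    "room_type": ["Private room", "Private room", "Private room", "Shared room", "Shared room"],
    "reviews_per_month": [0.5, 2.0, 4.5, 0.25, 0.75],
})

# Count, median, and standard deviation of monthly reviews per room type.
review_summary = listings.groupby("room_type")["reviews_per_month"].agg(
    ["count", "median", "std"]
)
print(review_summary)
```

A larger standard deviation corresponds to a taller, more stretched violin, while the median marks its center.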

Comparison between room type and reviews

The scatter plot illustrates the monthly review count for various room categories, revealing a consistent number of reviews for private rooms, complete homes/apartments, and hotel rooms. In contrast, shared rooms exhibit a broader variety of monthly reviews.

Host Activity Analysis

The scatter plot shows the relationship between the number of listings managed by each host and their average monthly reviews. Logarithmic scales on both axes make it possible to examine a wide range of values in a single view. The data suggests that hosts who manage more listings do not generally earn a higher average number of monthly reviews. In other words, guest engagement or satisfaction, as measured by reviews, does not increase proportionally with the number of listings handled. Instead, hosts with few listings can reach a level of guest engagement similar to that of hosts managing many properties. The green data points are spread throughout the plot, indicating an even dispersion of reviews and a market in which both large and small hosts coexist and prosper.
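The per-host quantities plotted here, and a simple check of their relationship on the log scale, can be sketched as follows. The "host_id" column name and the toy values are assumptions; the correlation on log-transformed values is one possible way to quantify the pattern, not a method stated in the text.

```python
import numpy as np
import pandas as pd

# Toy data; "host_id" is an assumed column name.
listings = pd.DataFrame({
    "host_id": [1, 1, 1, 1, 2, 3, 3],
    "reviews_per_month": [1.0, 2.0, 1.5, 1.5, 1.4, 2.0, 1.0],
})

# One row per host: number of listings and mean monthly reviews.
host_stats = listings.groupby("host_id").agg(
    n_listings=("reviews_per_month", "count"),
    mean_reviews=("reviews_per_month", "mean"),
)

# Log-transform both axes, matching the log-log scatter plot.
host_stats["log_listings"] = np.log10(host_stats["n_listings"])
host_stats["log_reviews"] = np.log10(host_stats["mean_reviews"])

# Pearson correlation on the log scale; values near 0 mean no clear trend.
log_corr = host_stats["log_listings"].corr(host_stats["log_reviews"])
print(host_stats)
print("log-log correlation:", log_corr)
```

Hosts with zero mean reviews would need to be filtered out or offset before taking logarithms.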

Geospatial Heatmaps

The picture to the right is a screenshot of a geospatial heat map, representing data points across a geographical region that stretches from San Francisco to San Jose, and extends towards Monterey. The heat map colors indicate different levels or densities of the data points, commonly used to depict metrics like population density, prices, or in the case of rental listings, the concentration of available properties.

The heat map shows significant clusters of data points along the San Francisco Peninsula, spreading southwards across Silicon Valley to the San Jose area. The regions around San Francisco, Palo Alto, Mountain View, and San Jose show notably higher intensity, indicating a larger quantity or higher magnitude of the measure being depicted, such as higher rental prices or a greater number of listings. The color progression from blue to green to yellow to red represents a gradual increase in intensity, with red areas marking the highest concentrations or values.

The heat map extends southwards, encompassing Santa Cruz and the Monterey Bay region, where it exhibits reduced intensity, indicating lower values of the observed parameter. If this heat map represents rental availability or pricing, it can be deduced that the places with more vivid colors indicate higher demand or cost. This trend corresponds to the well-established strong demand for housing and rentals in the San Francisco Bay Area and Silicon Valley. As one proceeds farther away from these technology and commercial centers towards the coastline and rural regions, there is a noticeable decline in demand.
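The intensity of a heat map like this comes from counting points per map cell. The sketch below bins a few approximate, illustrative coordinates into a coarse grid with NumPy. The grid size and bounding box are arbitrary assumptions, and this is not the mapping tool used for the screenshot.

```python
import numpy as np

# Approximate, illustrative coordinates: three points near San Francisco,
# one near Palo Alto, one near San Jose, one near Monterey.
lat = np.array([37.77, 37.78, 37.76, 37.44, 37.34, 36.60])
lon = np.array([-122.42, -122.41, -122.43, -122.14, -121.89, -121.89])

# Count points in a 4 x 4 grid over an assumed bounding box.
density, lat_edges, lon_edges = np.histogram2d(
    lat, lon, bins=[4, 4], range=[[36.5, 38.0], [-122.5, -121.5]]
)

# Grid cell (latitude bin, longitude bin) with the most points.
hottest_cell = np.unravel_index(density.argmax(), density.shape)
print(density)
print("Densest cell:", hottest_cell)
```

Weighting each point by price instead of counting it once would turn the same grid into a price heat map.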

Price Difference Bar plot

The bar plot shows the change in price between 2020 and 2023 for a sample of properties, using a color gradient. Each bar corresponds to one property, identified by its unique ID. The height and color of each bar show the size and direction of the price change over the three-year period. Warmer colors represent positive values, indicating a price increase, whereas cooler colors represent negative values, indicating a price decline. The wide range of colors in the figure indicates a varied market response: some properties saw substantial price increases, while others saw declines or stayed relatively stable. This variation can be attributed to several factors, such as changes in local demand, changes in property condition or offerings, and broader economic trends. The color gradient makes these patterns quick to recognize.
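The per-property price difference can be computed by matching listings across the two years on their ID. This is a sketch on toy tables; the column names and the idea of one table per year are assumptions about how the data is organized.

```python
import pandas as pd

# Toy tables, one per year; the structure and names are assumed.
prices_2020 = pd.DataFrame({"id": [1, 2, 3], "price": [100, 150, 80]})
prices_2023 = pd.DataFrame({"id": [1, 2, 4], "price": [120, 130, 90]})

# Keep only properties present in both years.
merged = prices_2020.merge(prices_2023, on="id", suffixes=("_2020", "_2023"))

# Positive values are price increases, negative values are declines.
merged["price_diff"] = merged["price_2023"] - merged["price_2020"]


def direction(diff):
    """Label a price change as an increase, a decrease, or unchanged."""
    if diff > 0:
        return "increase"
    if diff < 0:
        return "decrease"
    return "unchanged"


merged["direction"] = merged["price_diff"].apply(direction)
print(merged)
```

The inner merge drops properties listed in only one of the two years, since no difference can be computed for them.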