Building My First Python Webscraper

Here I program a webscraper using Python to extract hotel information from a portfolio website. I then turn the scraped data into a worksheet.

Introduction

Greetings,

Step into my digital domain. I’m Christian Pena, a marketing enthusiast and data aficionado, and I warmly welcome you on this exploration into the world of data analytics, data science, and marketing. Thanks for dropping by.

Embarking on the Code Adventure: The Task at Hand

In pursuing a comprehensive hotel list from a hospitality management company’s portfolio webpage, I needed an accurate approach that wasn’t manual. I was tasked with gathering basic hotel information such as hotel name, address, and phone number, among other details.

This led me to develop a web scraper using Python, a tool that proves instrumental in data analytics.

Naturally, I thought, “How can I develop a tool to do this for me rather than doing it by hand?”

Fueled by a hunger for efficiency and spot-on data, I decided to make a cup of coffee, turn on my favorite playlist, and build a web scraper.

In this blog post, I will guide you through the technical aspects of my coding journey and showcase the effectiveness of the web scraping tool.

Setting the Digital Stage

PYTHON
# Loading the Tools
from bs4 import BeautifulSoup
import requests

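# Loading the URL
url = 'https://xeniareit.com/portfolio/'
page = requests.get(url)
page.raise_for_status()  # stop early if the request failed

# Crafting a Digital Map with BeautifulSoup
# ('html.parser' names Python's built-in parser explicitly)
soup = BeautifulSoup(page.text, 'html.parser')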

# Mining through Table Data
# (the first seven <th> cells are used as the column headings)
hotel_headers = soup.find_all('th')[0:7]

# Stripping HTML Tags and Surrounding Whitespace from the Data
headers = [header.text.strip() for header in hotel_headers]
PYTHON

To kickstart this web scraping process, I summoned the BeautifulSoup library to parse HTML and enlisted the help of requests to fetch the page from my notebook. The portfolio page was none other than that of the eminent hospitality management company Xenia REIT.

Utilizing BeautifulSoup, I efficiently extracted table headings, ensuring a clean dataset by eliminating unnecessary HTML tags.
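
To make the header step concrete, here is a minimal sketch that runs the same `find_all('th')` pattern against a small inline HTML string instead of the live page. The sample table and its column names are made up for illustration and are not the real Xenia markup. It uses `html.parser`, Python's built-in parser, so nothing beyond BeautifulSoup needs to be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the portfolio page.
sample_html = """
<table>
  <tr><th> Hotel </th><th>City/State</th><th>Phone</th></tr>
  <tr><td>Sample Inn</td><td>Austin, TX</td><td>555-0100</td></tr>
</table>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# get_text(strip=True) returns each cell's text without tags or padding.
sample_headers = [th.get_text(strip=True) for th in soup.find_all("th")]
print(sample_headers)
```

Stripping whitespace at this stage matters because the header text later becomes the DataFrame's column names, and a stray space would make a lookup like `df['City/State']` fail.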

Delving into Data Extraction

PYTHON
# Creating a Data Table
import pandas as pd
df = pd.DataFrame(columns=headers)

# Collecting and Merging Hotel Intel
# (rows 0 and 13 are skipped; they appear to be the header rows of the two tables)
hotel_data = soup.find_all('tr')
hotel_data_1 = hotel_data[1:13]
hotel_data_2 = hotel_data[14:]
all_hotel_data = hotel_data_1 + hotel_data_2

# Loading Data into the DataFrame
for row in all_hotel_data:
    row_data = row.find_all('td')
    individual_hotel_data = [data.text.strip() for data in row_data]
    df.loc[len(df)] = individual_hotel_data

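# Polishing and Amplifying the Data
# Split 'City/State' at the last space and keep the result as new columns
# (assumes every value ends with the state, e.g. "City, ST")
df[['City', 'State']] = df['City/State'].str.rsplit(' ', n=1, expand=True)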

# Creating a CSV from DataFrame
df.to_csv(r'/Users/christianpena/Desktop/Jupyter Webscraping/hotel_data.csv')
PYTHON

Taking it up a notch, I extracted the hotel rows from the two tables on the website and merged them into one comprehensive list of hotels. I then organized the data into a DataFrame using the pandas library, which makes it easier to manipulate and analyze than raw HTML rows.
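
The fixed slices in the code above (`[1:13]` and `[14:]`) depend on the page keeping exactly the same number of rows. As an alternative sketch, and not the approach I used in this project, header rows can be skipped by checking whether a row contains any `td` cells. The HTML below is invented sample data, and the names `sample_html`, `rows`, and `sample_df` are only for illustration.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical markup: two tables, each with its own header row.
sample_html = """
<table>
  <tr><th>Hotel</th><th>City/State</th></tr>
  <tr><td>Sample Inn</td><td>Austin, TX</td></tr>
  <tr><td>Demo Suites</td><td>Salt Lake City, UT</td></tr>
</table>
<table>
  <tr><th>Hotel</th><th>City/State</th></tr>
  <tr><td>Example Resort</td><td>Orlando, FL</td></tr>
</table>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# Column names come from the first header row only.
columns = [th.get_text(strip=True) for th in soup.find("tr").find_all("th")]

rows = []
for tr in soup.find_all("tr"):
    cells = [td.get_text(strip=True) for td in tr.find_all("td")]
    if cells:  # header rows hold only <th> cells, so their list is empty
        rows.append(cells)

# Build the DataFrame once from the collected rows.
sample_df = pd.DataFrame(rows, columns=columns)
print(sample_df)
```

Collecting the rows in a list and building the DataFrame in one call also avoids growing it one row at a time, which tends to matter more as the number of rows increases.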

Finally, the code exported the enriched dataset to a CSV file named ‘hotel_data.csv’, which was then used to update our CRM with the correct portfolio data.
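
Here is a small sketch of the polishing and export steps on made-up data. It assumes each `City/State` value looks like "City, ST"; the real page may format these differently. It writes to an in-memory buffer rather than a file so it can run anywhere, and passes `index=False` so the row index is left out of the CSV, which is an optional choice.

```python
import io
import pandas as pd

# Hypothetical rows standing in for the scraped hotel data.
sample_df = pd.DataFrame({
    "Hotel": ["Sample Inn", "Demo Suites"],
    "City/State": ["Austin, TX", "Salt Lake City, UT"],
})

# n=1 splits only at the last space, so multi-word city names stay together.
parts = sample_df["City/State"].str.rsplit(" ", n=1, expand=True)
sample_df["City"] = parts[0].str.rstrip(",")  # drop the trailing comma
sample_df["State"] = parts[1]

# Write to an in-memory buffer; passing a file path works the same way.
buffer = io.StringIO()
sample_df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

Splitting with `n=1` from the right is what keeps a name like "Salt Lake City" in one piece; a plain split on every space would scatter it across several columns.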

The Conclusion

This web scraping project significantly enhanced my coding skills and provided me with a robust tool for data analytics in the hospitality management sector. As I continue refining and expanding my skill set, I aim to leverage this project as a stepping stone toward securing a rewarding position in the data analytics field.

Stay tuned for more insights into my dynamic journey through the world of data analytics!