Learning How to Scrape a Dynamic Website with Python

Scraping Robot
October 4, 2023
Community

Before we get into how to scrape a dynamic website with Python, let’s establish some basic concepts. Web pages serve two types of content: static and dynamic. Dynamic content is the result of websites investing time and money into researching and gathering information about each visitor, which they use to provide each user with exactly what they’re looking for.

What is a Dynamic Web Page?

Google is a perfect example of a dynamic website that changes the content it provides to users according to multiple factors. These factors include the user’s demographic, precise location, the time of day and the season they’re visiting the web page, their local weather, and their default language settings.

The goal of a dynamic web page is for businesses to achieve the following:

  • Appeal to each user more effectively
  • Engage with their audience
  • Build interest in their products or services
  • Convert as many website visitors into customers as possible

Two Types of Web Content

  • Static content: Static content is web content that remains the same for all visitors to the web page regardless of who the user is, where they’re located, their language settings, demographics, time of day, or other personal details. For example, two different users viewing the same static web page would see identical content.
  • Dynamic content: Where static content is identical for every user regardless of location or any other personal details, dynamic content alters itself to present each user with a unique experience. In addition, the more a user visits the same dynamic website, the more the content they view adapts. The website stores the user’s data in order to provide an experience tailored to their personal details; this stored information also typically makes dynamic websites faster than static ones.
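The difference can be sketched in a few lines of Python. This is a toy illustration, not a real web framework: the page strings, user dictionaries, and function names below are all invented. The point is only that a static page ignores the visitor, while a dynamic page is assembled from stored details about them.

```python
# Toy illustration of static vs. dynamic content.

STATIC_PAGE = "<h1>Welcome!</h1><p>Same content for everyone.</p>"

def render_static_page(user):
    # Every visitor gets the same markup; the user argument is ignored.
    return STATIC_PAGE

def render_dynamic_page(user):
    # The response is assembled from stored details about this visitor.
    greeting = "<h1>Welcome back, {}!</h1>".format(user["name"])
    local = "<p>Deals near {}, shown in {}.</p>".format(user["city"], user["language"])
    return greeting + local

alice = {"name": "Alice", "city": "Berlin", "language": "German"}
bob = {"name": "Bob", "city": "Austin", "language": "English"}

# The static page is identical for both users; the dynamic page differs.
print(render_static_page(alice) == render_static_page(bob))   # True
print(render_dynamic_page(alice) == render_dynamic_page(bob)) # False
```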

The Benefits of a Dynamic Website

  • Personalized user experiences: Most users will appreciate the convenience of a dynamic website user experience and the speed the stored information on the website’s servers provides. Dynamic websites that store users’ information can provide personalized browsing and shopping experiences. According to a study by Epsilon, 80% of consumers are more likely to make a purchase when brands offer personalized experiences. In addition, 64% of consumers say they don’t mind retailers saving their preferences and purchase history when it allows the retailer to provide a more personalized user experience.
  • Professional brand image: Catering your website to individual users builds trust and interest, as it’s highly optimized and doesn’t give users the impression they’re visiting the low-grade website of a sketchy company. 89% of digital businesses say they’re now investing in personalization.
  • Valued customers: People like to be noticed and appreciated. Because a dynamic website offers a personalized experience, users will likely prefer it over a static one. People want to be treated like a person, not a number: according to research by Salesforce, 84% of consumers stated that personalization makes them feel like valued customers and is very important to winning their business.
  • Shorter buyer’s journey: It’s easier to check out when buying things online from a dynamic website that stores user information. 57% of the time prospective buyers leave a website mid-purchase, it’s because the checkout process is too lengthy. Online retailers have three seconds for the page to load before losing these customers. To put this in greater perspective, 80% of these customers never return to the website if they leave due to impatience.
  • Faster loading time: A dynamic website stores and updates its information each time a user visits. This results in faster loading times, and saving user data helps convert prospective customers into paying customers by shortening the buyer’s journey. One of the best ways to speed up dynamic websites is by using a content delivery network (CDN) that hosts images, media, and CSS files on remote servers to deliver content to users quickly. Additionally, using a CDN reduces the load on your organization’s servers, contributing to increased website speed.
  • Better search engine optimization (SEO) ranking: Researching your audience and catering your website content to their individual needs will have a noticeable, positive impact on your website ranking, because it makes your content more relevant to users than a static website would be. Dynamic websites also have higher conversion rates than static websites, which helps boost SEO credibility with search engines.

Challenges of Scraping Dynamic Web Pages

Scraping dynamic websites is a bit tricky due to the unique content provided for each user. While not a simple process, with a good guide and some technological know-how, scraping dynamic web pages with Python can be achieved with concentrated effort.

Here are some of the most common challenges you’ll face when scraping a dynamic website:

  • Finding the correct web elements for the information you want to scrape: Unlike static web pages, dynamic content often isn’t present in the page’s initial HTML. It’s generated on the server or loaded by JavaScript after the page loads, making it more difficult to acquire.
  • Catching all of the correct elements: Dynamic content adapts each time a user visits a website. It also adapts for each individual user. The dynamic content itself is always changing. This makes it difficult to be certain you’re scraping all the elements from the selected web pages.
  • Scraping dynamic websites isn’t beginner-friendly: Due to the complexities of learning how to scrape a dynamic website with Python, it’s not a beginner-friendly approach the way turnkey scraping API providers like Scraping Robot are. Scraping a web page with Python requires technical knowledge and experience to ensure an effective scrape.
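The first challenge above can be demonstrated offline with only Python’s standard library. In this sketch, both HTML strings are invented: initial_html stands for what a plain HTTP fetch might return for a JavaScript-rendered page, and rendered_html for what a real browser shows after scripts run. The quote class simply never appears until rendering happens, which is why a browser-driving tool like Selenium is needed.

```python
from html.parser import HTMLParser

class ClassCounter(HTMLParser):
    """Count elements that carry a given CSS class."""
    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.count = 0

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self.target_class in classes:
            self.count += 1

def count_class(html, target_class):
    parser = ClassCounter(target_class)
    parser.feed(html)
    return parser.count

# What a plain HTTP fetch might return for a JavaScript-rendered page
# (both strings below are invented for illustration):
initial_html = '<html><body><div id="app"></div><script src="app.js"></script></body></html>'

# What the browser displays after its JavaScript has run:
rendered_html = ('<html><body><div class="quote">...</div>'
                 '<div class="quote">...</div></body></html>')

print(count_class(initial_html, "quote"))   # 0 -- nothing to scrape yet
print(count_class(rendered_html, "quote"))  # 2 -- content appears only after rendering
```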

How to Scrape the Web with Python

Python is one of the most popular and accessible programming languages thanks to its versatility, the extensive library of additional tools available to Python users, and its ease of use compared with other programming languages.

What you’ll need to web scrape dynamic content with Python

  • Python: Of course, you’ll need Python installed on your computer. You can download the latest version of Python from the official Python website.
  • A guide: A great guide can walk you through the necessary steps to scrape the data you’re trying to collect. It can serve as a valuable tool for someone unfamiliar with manually scraping websites and how to scrape a dynamic website with Python.
  • Whitelisting your IP address: You’ll need to use the Selenium automation tool for scraping web pages with Python. To set Selenium up to run headless web scraping, you can whitelist your IP through residential proxy providers like Rayobyte.
  • Selenium: You’ll need to download the Selenium automation tool when learning how to scrape a dynamic website with Python. You’ll also need the pprint library and Selenium’s By method. Smartproxy has a thorough guide to downloading and setting up Selenium on their GitHub.

Scraping Dynamic Web Pages

Step One—Viewing the page source: Go to the target website you’re trying to scrape with Python, right-click the page, and select “View Page Source.” Alternatively, you can use the shortcut Ctrl+U on a Windows computer.

Step Two—Choosing elements to scrape: Now that you’ve opened the web page’s HTML, you need to find the elements you’d like to scrape and collect. For Smartproxy’s example, they’re searching for quotes from famous authors. To do this, you’ll target the elements with the following classes: quote (the whole quote entry) and, within each quote, text, author, and tag. The markup looks like the snippet below.

<div class="quote" itemscope itemtype="http://schema.org/CreativeWork">
  <span class="text" itemprop="text">“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”</span>
  <span>by <small class="author" itemprop="author">Albert Einstein</small>
    <a href="/author/Albert-Einstein">(about)</a>
  </span>
  <div class="tags">
    Tags:
    <meta class="keywords" itemprop="keywords" content="change, deep-thoughts, thinking, world" />
    <a class="tag" href="/tag/change/page/1/">change</a>
    <a class="tag" href="/tag/deep-thoughts/page/1/">deep-thoughts</a>
    <a class="tag" href="/tag/thinking/page/1/">thinking</a>
    <a class="tag" href="/tag/world/page/1/">world</a>
  </div>
</div>
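Before writing any Selenium code, it can help to confirm that these classes really yield the data you want. The sketch below uses only Python’s built-in html.parser on a trimmed version of the snippet above; the QuoteParser class is an invented helper for illustration, not part of any scraping library.

```python
from html.parser import HTMLParser

class QuoteParser(HTMLParser):
    """Pull text, author, and tags out of one quote entry."""
    def __init__(self):
        super().__init__()
        self.current = None   # class of the element we are currently inside
        self.text = ""
        self.author = ""
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.current = dict(attrs).get("class")

    def handle_endtag(self, tag):
        self.current = None

    def handle_data(self, data):
        if self.current == "text":
            self.text += data
        elif self.current == "author":
            self.author += data
        elif self.current == "tag":
            self.tags.append(data)

# A trimmed version of the markup shown above.
html = '''
<div class="quote" itemscope itemtype="http://schema.org/CreativeWork">
<span class="text" itemprop="text">"The world as we have created it is a process of our thinking."</span>
<span>by <small class="author" itemprop="author">Albert Einstein</small></span>
<div class="tags">
<a class="tag" href="/tag/change/page/1/">change</a>
<a class="tag" href="/tag/thinking/page/1/">thinking</a>
</div>
</div>
'''

parser = QuoteParser()
parser.feed(html)
print(parser.author)  # Albert Einstein
print(parser.tags)    # ['change', 'thinking']
```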

Step Three—Adding pprint and the By method: Now that Selenium is set up, the next step is to import the By method from the Selenium Python library, which simplifies element selection. Once you’ve added the By method, enter the code below:

from selenium import webdriver

from selenium_python import get_driver_settings, smartproxy

# using By to simplify selection

from selenium.webdriver.common.by import By

To import pprint, simply enter the following code:

#pprint to make console output look nice

import pprint

Step Four—Specifying Your Target

Be sure to specify which browser you’ll be using along with the target website to scrape. For Smartproxy’s example, they’ve chosen http://quotes.toscrape.com. Enter the code below.

def webdriver_example():
    driver = get_driver_settings()
    if driver['DRIVER'] == 'FIREFOX':
        browser = webdriver.Firefox(executable_path=r'{driver_path}'.format(driver_path=driver['DRIVER_PATH']), proxy=smartproxy())
    elif driver['DRIVER'] == 'CHROME':
        browser = webdriver.Chrome(executable_path=r'{driver_path}'.format(driver_path=driver['DRIVER_PATH']), desired_capabilities=smartproxy())

    browser.get('http://quotes.toscrape.com/')

Next, you need to identify the variables that will let Selenium know how many pages you want to scrape with Python. If you wanted to scrape five pages, you’d enter the code below.

browser.get('http://quotes.toscrape.com/')

pages = 5
quotes_list = []

Step Five—Choose a selector: There are a variety of selectors to choose from, but to follow the example we’ve been using, we’ll locate the correct elements with the Class selector. Once it’s up and running, enter the following code.

browser.get('http://quotes.toscrape.com/')

Step Six—Begin Scraping: Now it’s time to start scraping your target website using the code below.

for i in range(pages):
    quotes = browser.find_elements(By.CLASS_NAME, 'quote')
    for quote in quotes:
        tags = quote.find_elements(By.CLASS_NAME, 'tag')
        tag_list = []
        for tag in tags:
            tag_list.append(tag.text)
        quotes_list.append({
            'author': quote.find_element(By.CLASS_NAME, 'author').text,
            'text': quote.find_element(By.CLASS_NAME, 'text').text,
            'tags': tag_list,
        })
    next_page_btn = browser.find_element(By.PARTIAL_LINK_TEXT, 'Next').click()

Let’s break down what each part of this code does.

for i in range(pages):

This runs the loop five times, as we indicated in the pages variable.

quotes = browser.find_elements(By.CLASS_NAME, 'quote')

This collects every element on the page with the quote class (the same <div class="quote"> entries we examined in step two).

for quote in quotes:

This iterates through each quote entry. Selecting the tag class collects all the tags for a given quote; iterating through them builds a list with tag_list:

tags = quote.find_elements(By.CLASS_NAME, 'tag')
tag_list = []
for tag in tags:
    tag_list.append(tag.text)

Finally, quotes_list.append({...}) adds a dictionary for each quote, pulling the author and text from their respective classes:

'author': quote.find_element(By.CLASS_NAME, 'author').text,
'text': quote.find_element(By.CLASS_NAME, 'text').text,
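The extraction logic from step six can also be exercised without launching a browser. In the sketch below, FakeElement is an invented stand-in that mimics only the find_elements/find_element/.text interface the loop uses, and the plain string 'class name' takes the place of By.CLASS_NAME (the string value Selenium uses for it); no real Selenium import is needed.

```python
class FakeElement:
    """Invented stand-in for a Selenium WebElement, for offline testing only."""
    def __init__(self, text="", children=None):
        self.text = text
        self.children = children or {}   # maps a class name to child elements

    def find_elements(self, by, name):
        return self.children.get(name, [])

    def find_element(self, by, name):
        return self.children[name][0]

# One fake quote entry, shaped like the real page's markup.
quote = FakeElement(children={
    "author": [FakeElement("Albert Einstein")],
    "text": [FakeElement("The world as we have created it is a process of our thinking.")],
    "tag": [FakeElement("change"), FakeElement("thinking")],
})

# Same extraction logic as the scraping loop, minus the browser.
# 'class name' is the string value Selenium uses for By.CLASS_NAME.
quotes_list = []
for q in [quote]:
    tags = q.find_elements("class name", "tag")
    tag_list = []
    for tag in tags:
        tag_list.append(tag.text)
    quotes_list.append({
        "author": q.find_element("class name", "author").text,
        "text": q.find_element("class name", "text").text,
        "tags": tag_list,
    })

print(quotes_list[0]["author"])  # Albert Einstein
print(quotes_list[0]["tags"])    # ['change', 'thinking']
```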

Step Seven—Formatting results from scraping websites with Python

This final line of code will print out the results of the web scraping you’ve just done into the console window following the completion of the scrape.

pprint.pprint(quotes_list)
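If you’d like to preview what the console output will look like before running a full scrape, you can call pprint on a hand-built quotes_list; the two entries below are made up for illustration.

```python
import pprint

# Hypothetical results, shaped like the scraper's output.
quotes_list = [
    {"author": "Albert Einstein",
     "text": "The world as we have created it is a process of our thinking.",
     "tags": ["change", "deep-thoughts", "thinking", "world"]},
    {"author": "J.K. Rowling",
     "text": "It is our choices that show what we truly are.",
     "tags": ["abilities", "choices"]},
]

# pprint wraps and indents nested structures for readable console output.
pprint.pprint(quotes_list)
```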

Your Results Are In

Following the last line of code, you should have received your results. If you didn’t get what you expected, you may have made a mistake and need to double-check your code. Mistakes are part of the learning process!

If you feel like you don’t have time to make the mistakes it takes to learn the process, or if your list of pages to be scraped is longer than your work day, a scraping API provider like Scraping Robot is an affordable option offering multiple levels of service. They can save you valuable time and resources that could be better used on core business processes.

The information contained within this article, including information posted by official staff, guest-submitted material, message board postings, or other third-party material is presented solely for the purposes of education and furtherance of the knowledge of the reader. All trademarks used in this publication are hereby acknowledged as the property of their respective owners.