Whether you’ve been scraping for years or you’re just getting started with web scraping research, there’s no doubt that it’s a valuable tool. However, you may have noticed that your scraper doesn’t seem to be gathering the same amount of information as it used to. If that’s the case, you might need to start web scraping AJAX pages.
AJAX powers much of the modern web, but it also makes performing thorough scrapes difficult. In this guide, you’ll learn what it means to scrape websites with AJAX, why it’s essential, and how to do it properly so you get all the information you need.
What Is an AJAX Page?
Have you ever visited a page that automatically loads extra content as you scroll? Then you’ve seen AJAX pages in action. Social media sites with “infinite scroll” are the most common examples of AJAX pages. Still, AJAX can be found on any site that presents dynamic and constantly updating content.
AJAX is useful for sites that want to keep their information updated. With AJAX calls in place, users don’t need to refresh the page to get new information. Instead, the page and the server constantly exchange small packets of data back and forth.
When new information comes through, the page automatically updates its display. It provides a smoother user experience, but it also puts barriers in the way of anyone who wants to scrape the site.
What Is AJAX Web Scraping?
Scraping websites with AJAX is a little different from any other kind of scrape. That’s because sites that rely on AJAX calls don’t display the information you want to gather directly in their HTML. Instead, the place where the information would normally appear is filled with the AJAX script. If you just scrape the raw code, you’ll collect the AJAX code and nothing else.
So how do you scrape AJAX-loading websites? You need to think outside the box. Instead of just scraping the HTML, you have to trigger the AJAX call yourself and then scrape the result.
That’s why scraping AJAX websites takes more effort. There are additional steps between you and the data you need. However, you can scrape websites with AJAX just as effectively as any other site with the proper preparation.
How to Scrape AJAX Pages
The normal scrape process goes something like this:
- You collect the URLs of the pages you want to target.
- You have your scraper request the HTML from each webpage.
- The scraper looks through the HTML for the CSS selectors connected to the data you want to collect.
- The scraper extracts that data and consolidates it into an organized file for you to examine later.
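The normal process above can be sketched in a few lines of Python. To keep the sketch runnable without a network connection, the HTTP request and CSS-selector steps are stood in for by simple stubs (in a real scraper you’d use something like Requests and Beautiful Soup); the URL and markup here are made up:

```python
import json

def fetch_html(url):
    # Stand-in for a real HTTP request (e.g. requests.get(url).text);
    # hardcoded so the sketch runs without a network connection.
    return '<html><body><h2 class="title">Example headline</h2></body></html>'

def extract_titles(html):
    # Real scrapers use CSS selectors via a parser like Beautiful Soup;
    # a simple string scan stands in for that step here.
    titles = []
    marker = '<h2 class="title">'
    start = html.find(marker)
    while start != -1:
        start += len(marker)
        end = html.find("</h2>", start)
        titles.append(html[start:end])
        start = html.find(marker, end)
    return titles

# Step 1: collect target URLs; steps 2-4: request, extract, consolidate
urls = ["https://example.com/page1"]
results = [{"url": u, "titles": extract_titles(fetch_html(u))} for u in urls]
print(json.dumps(results))
```

The consolidated `results` structure is what you’d write out to a CSV or JSON file for later examination.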
Scraping AJAX websites takes a little more work. You’ll need to take the following steps after you request the HTML from the site and before the scraper looks for data.
Gather your AJAX scraping tool
If you’re building your own AJAX scraper, you can simplify things by using the right tools. Depending on the language you’re using, you can bring several AJAX-friendly modules into your program, such as the Requests and Beautiful Soup libraries used later in this guide, or a browser-automation tool like Selenium for pages that only load their data inside a real browser.
Python is the most commonly used language for scrapers, so most of these tools are designed for it. However, you can find alternatives for most programming languages used for scraping.
Find the AJAX request
Once you’ve gathered your tools, you can start your research. To guide your scraper to the actual AJAX call, you need to find where and how it’s stored on the page.
This information is found most easily with Chrome’s developer tools. To use Chrome to find AJAX calls:
- Go to the page you want to scrape
- Select the three-dot menu at the top right of the browser window (or the “View” menu)
- Click “More Tools,” then “Developer Tools”
- When the “Developer Tools” box appears on your screen, go to the “Network” tab
- Select the “XHR” filter, and refresh the page if the list is empty
- Explore the different results until you find the one you want, then go to the “Headers” tab
- Scroll to the “Form Data” field
This is where the AJAX request details can be found. Don’t worry: you can ignore the majority of the code. All you need are the request and endpoint parameters. These can take different forms, but they’re typically found at the beginning and end of the AJAX call. You may also discover that the site uses an API, which will significantly simplify the process.
Learn the format of the response
By finding the request and endpoint parameters, you’ve figured out how the AJAX call is performed. Now you need to figure out what the call actually returns to the webpage so you can tell your scraper what to look for.
In the “Developer Tools” box, go to the “Response” tab. The AJAX call’s response to the website will appear here. Often, but not always, the data will be presented in a JSON format. Once you know the format of the information you want to collect, you can start writing a scraping program that’s correctly configured to gather it.
Write your scraper
Now comes the tricky part. With the information you’ve gathered, you can write a web scraper that understands how to scrape AJAX pages effectively. This summary uses Python and the Requests library, but you can substitute your preferred tools and languages once you understand the basic process.
The fundamental program development process is simple:
- Open a new program
- Write a function that replicates the AJAX call you want to collect
- Write a function that parses the response
- Print that information to a file
- Repeat from the beginning
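The five steps above can be sketched as a small Python program. The endpoint, form data, and response body below are all made up, and the network call is stubbed out so the sketch runs offline; in a real scraper, `make_ajax_call` would use something like `requests.post(url, data=form_data).text`:

```python
import json

def make_ajax_call(url, form_data):
    # Stand-in for the real AJAX request you replicate from developer tools;
    # hardcoded JSON stands in for the server's response here.
    return '{"items": [{"headline": "Example story"}]}'

def parse_response(text):
    # AJAX endpoints usually return JSON, so json.loads() does the parsing
    return json.loads(text)

# Hypothetical endpoint + form data, copied from the "Headers" tab
targets = [("https://example.com/ajax/feed", {"page": 1})]

all_items = []
for url, form_data in targets:          # repeat from the beginning for each target
    data = parse_response(make_ajax_call(url, form_data))
    all_items.extend(data["items"])

with open("results.json", "w") as out:  # print that information to a file
    json.dump(all_items, out)
```

Each helper corresponds to one bullet above, which makes it easy to swap the stubs for real request and parsing logic later.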
However, there’s a little more detail to the process if you zoom in.
First, create your project and install and import the tools you choose to use. Open your new file in your preferred Python editor, then create a function that replicates the AJAX request parameter you found earlier. This may take the form of the following if the website has an API for viewing AJAX information:
response = requests.post(self.API_url, json=data)  # Requests has no "info" argument; use json= (or data=) to send the payload
If the site doesn’t have an AJAX API, then you’ll need to do something more complex, like:
r = requests.post(
    ajax_url,  # the endpoint you found under the "Headers" tab (placeholder name)
    headers={"X-Requested-With": "XMLHttpRequest"},  # header many sites use to mark AJAX calls
    data=form_data,  # the "Form Data" parameters copied from developer tools (placeholder name)
)
The first example sends the scraper to the site’s AJAX API, while the second replays the request parameters you found to initiate its own AJAX server request; the endpoint and form-data names above are placeholders for the values you copied. One such request, for instance, is the call the Journal Sentinel newspaper’s website makes to check whether a page has been scrolled.
Next, you need to capture and parse the response. If it’s JSON, as it often is, Python’s built-in json module handles it: json.loads() converts the response text into a Python dictionary, which you can then convert into any file format you want. If the endpoint returns HTML fragments instead, Beautiful Soup works well for parsing them.
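For instance, supposing the endpoint returned a JSON body like the made-up one below, json.loads() turns it straight into a dictionary you can index:

```python
import json

# A made-up JSON response of the kind an AJAX endpoint might return
response_text = '{"articles": [{"title": "City council meets", "updated": true}]}'

data = json.loads(response_text)     # now a Python dictionary
print(data["articles"][0]["title"])  # -> City council meets
```

From here, json.dump() (or the csv module) writes the dictionary out in whatever file format you prefer.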
Now you need to designate where this information should go. This action is as easy as setting up a target file at the beginning of the program, then printing the dictionary containing the JSON results into that file.
If you only want one URL scraped, you can stop here. Otherwise, you should set up a function that repeats this process for multiple URLs. The easiest way to do this is to load the URLs you want to scrape into a list at the start of the program, then cycle through each URL until the list is done.
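That loop is as simple as it sounds. The helper and URLs below are hypothetical; in a real program `scrape_one` would replicate the AJAX call and parse the response as described above:

```python
# Hypothetical helper that scrapes a single URL; in a real program this
# would make the AJAX request and parse its response.
def scrape_one(url):
    return {"url": url, "items": []}

# Load the target URLs into a list at the start of the program...
urls = [
    "https://example.com/ajax/feed?page=1",
    "https://example.com/ajax/feed?page=2",
]

# ...then cycle through each URL until the list is done
results = [scrape_one(url) for url in urls]
print(len(results))  # -> 2
```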
You can follow this process whether you’re working out how to scrape AJAX content on WordPress sites, social media platforms, news sites, or any other page you choose. The basic functionality is always the same.
The Best Way to Scrape AJAX Pages
You don’t need to put together an AJAX web page scraper from scratch, though. Instead, you can use Scraping Robot.
Scraping Robot is a prebuilt, free scraping API that already has AJAX functionality built in. With Scraping Robot, gathering AJAX data is as simple as specifying the details you want and letting the scraper collect them for you. All you have to do as the developer is to collect the URLs you wish to scrape and determine what information you want to study.
You can learn more about Scraping Robot, or sign up on its website to access your free monthly scrapes.
Start Collecting AJAX Data Today
As more and more websites prioritize responsive design, AJAX calls are becoming increasingly common. That means that more data is being hidden behind AJAX calls, making it invisible to ordinary web scrapers. The solution is to start scraping web pages with AJAX in mind.
You can write your own AJAX web scraper, but you don’t have to. Scraping Robot is already designed with full AJAX functionality built-in. With Scraping Robot, web scraping AJAX pages is as simple as any other type of scrape. You don’t need to waste time trying to write your program from scratch. Instead, work with Scraping Robot to save yourself time and effort and get the data you need today.
The information contained within this article, including information posted by official staff, guest-submitted material, message board postings, or other third-party material is presented solely for the purposes of education and furtherance of the knowledge of the reader. All trademarks used in this publication are hereby acknowledged as the property of their respective owners.