How To Pull Data From a Website into Excel and Google Sheets

Scraping Robot
October 8, 2021
Community

If you need to extract a lot of data from a website, consider collecting the data using a package for web scraping with Google Sheets and Excel compatibility.

Table of Contents

Excel and Google Sheets are excellent ways to organize data when you’re extracting many different types from various pages on the website. For instance, if you need to gather the data for more than 300 new products that span more than 20 pages on an eCommerce website, you should scrape data from websites to Excel and Google Sheets to save time and energy.

It’s possible to manually scrape data to Excel and Google Sheets but to boost your business agility, you should use an Application Programming Interface (API) while scraping data. APIs will allow you to engage in automated data scraping from websites into Excel without the need for manual input.

Read on to learn more about how to scrape data from a website into Excel. If you’re already familiar with some of the questions we answer below, feel free to use the Table of Contents to skip to the section(s) you’re most interested in. We’ve provided a step-by-step tutorial showing how you can use an API to pull data from a website into Excel and Google Sheets, using our free Scraping Robot API as an example.

Why Use an API to Scrape Data?

Why Use an API to Scrape Data?

Before we explain how to scrape data from a website into Google Sheets and Excel, let’s discuss how APIs work and why you should use an API to extract data.

APIs are software programs that bridge different web applications or software. Without APIs, these applications and software would not be able to communicate or exchange information or functionalities. In many ways, APIs function like your computer monitor—without your monitor, you wouldn’t be able to see what your computer is doing. Similarly, without an API, many applications—including Excel and Google Sheets—wouldn’t be able to process, store, or manipulate the data you’ve extracted from your website.

In the data scraping world, APIs are used to connect to any database or analytics software. Scraping Robot’s free web scraping API, for instance, can connect to any software and allow you to extract data from any website. APIs can also help you collect data in real-time and move data directly to your software or application without manual input.

Scraping websites without an API is possible, but extremely time-consuming. You would have to download the page yourself and go through all of the HTML elements by hand. Without thorough knowledge and experience of various coding languages and HTML, it’s difficult to scrape websites without an API. As such, you should use an API to pull data from a website into Excel and Google Sheets.

How to Scrape Data into Excel and Google Sheets

How to Scrape Data into Excel and Google Sheets

Step 1: Pick and download a web scraping API.

When you are web scraping Google Sheets or Excel data, start by downloading and installing an Excel web scraping API of your choice.

Before you choose your ideal scraper, keep in mind that not every scraper was made equal. To find the best one for you, ask the following questions:

  • What functions are a must for you? What do you want to do with the extracted data?
  • What is my budget?
  • Will I be extracting data from multiple websites at once? If yes, how many sites? 
    • Not every scraping tool lets you extract data from multiple sites at once. If you’re short on time and need to extract a lot of information from many sites at once, look for a program that allows you to do this.
  • Is the scraping tool secure?  
  • This is an important question to ask if you’re extracting sensitive or confidential data.

For this example, we’ll be using Scraping Robot’s free API. One of the best scraping APIs available, Scraping Robot API is browser-based and doesn’t require you to download or install anything. It allows you to scrape HTML content as well as individual DOM elements on web pages.

Compared to other APIs, this API also lets you extract more information at once. Most web scrapers use an extraction sequence for HTML elements on a page, which means you have to manually pick a category to extract after the first category (typically text) is done. On the other hand, Scraping Robot API automatically gives you every HTML category.

For more information, check out our documentation here.

Step 2: Copy the URL you want to scrape.

Go to the website page you want to scrape and copy the URL.

Step 3: Paste the target site’s URL into your scraper’s bar and click “Run.” 

This will start the scraping process. You should receive the final HTML output within a couple of seconds.

Step 4: Download the HTML output. 

You should see an option to download the extracted content in your scraper. In Scraping Robot, you’ll see an option to “Download Results” under the black box at the bottom. Once you click on this button, the HTML output will download to your computer.

How to Convert Output to Table-Data

How to Convert Output to Table-Data

You now have the HTML output, but you need to convert it into a tabular format. Converting the output to table-data will move the extracted information from your web scraper to Excel. Without converting, you will have a hard time gathering and manipulating data, particularly if you want to aggregate the data of many scrapes in one Google Sheet or Excel document.

API response data structure

Before we talk about how to convert the HTML output to table-data let’s take a look at how API responses typically look like. In this example, we’ll be scraping music.apple.com and our goal is to collect a list of track names using XPath.

The API’s JSON response should look like this:

   “status”: “SUCCESS”,

   “date”: “Tue, 14 Sep 2021 15:21:13 GMT”,

   “url”: “https://music.apple.com/us/artist/pearl-jam/467464/see-all?section=top-songs”,

   “httpCode”: 200,

   “result”: {

       “xpathElements”: [

           {

               “xpath”: “//div[@class=\”songs-list-row__song-name\”]”,

               “textNodes”: [

                   “Black”,

                   “Alive”,

                   “Even Flow”,

                   “Jeremy”,

                   “Yellow Ledbetter”,

                   “Better Man”,

                   “Last Kiss”,

                   “Daughter”,

                   “Better Man”,

                   “Just Breathe”,

                   “Just Breathe”,

                   “Elderly Woman Behind the Counter In a Small Town”,

                   “Even Flow”,

                   “Once”,

                   “Release”,

                   “Daughter”,

                   “Sirens”,

                   “Garden”,

                   “Elderly Woman Behind the Counter In a Small Town”,

                   “Jeremy”,

                   “Why Go”,

                   “Animal”,

                   “Yellow Ledbetter”,

                   “Porch”,

                   “Oceans”,

                   “Yellow Ledbetter”,

                   “Given to Fly”,

                   “Corduroy”,

                   “Go”,

                   “Rearviewmirror”,

                   “Dissident”,

                   “Who You Are”,

                   “Dance Of The Clairvoyants”,

                   “Last Kiss”,

                   “Alive (2004 Remix)”,

                   “Nothingman”,

                   “Rearviewmirror”,

                   “Superblood Wolfmoon”,

                   “State of Love and Trust”,

                   “World Wide Suicide”,

                   “Corduroy”,

                   “Dissident”,

                   “Deep”,

                   “Alive (2008 Brendan O’Brien Mix)”,

                   “Black (2004 Remix)”

               ],

               “htmlElements”: [

                   “<div tabindex=\”-1\” role=\”checkbox\” dir=\”auto\” aria-checked=\”false\” class=\”songs-list-row__song-name\”><!–%+b:32%–>Black<!–%-b:32%–></div>”,

                   “<div tabindex=\”-1\” role=\”checkbox\” dir=\”auto\” aria-checked=\”false\” class=\”songs-list-row__song-name\”><!–%+b:32%–>Alive<!–%-b:32%–></div>”,

                   “<div tabindex=\”-1\” role=\”checkbox\” dir=\”auto\” aria-checked=\”false\” class=\”songs-list-row__song-name\”><!–%+b:32%–>Even Flow<!–%-b:32%–></div>”,

                   “<div tabindex=\”-1\” role=\”checkbox\” dir=\”auto\” aria-checked=\”false\” class=\”songs-list-row__song-name\”><!–%+b:32%–>Jeremy<!–%-b:32%–></div>”,

                   “<div tabindex=\”-1\” role=\”checkbox\” dir=\”auto\” aria-checked=\”false\” class=\”songs-list-row__song-name\”><!–%+b:32%–>Yellow Ledbetter<!–%-b:32%–></div>”,

                   “<div tabindex=\”-1\” role=\”checkbox\” dir=\”auto\” aria-checked=\”false\” class=\”songs-list-row__song-name\”><!–%+b:32%–>Better Man<!–%-b:32%–></div>”,

                   “<div tabindex=\”-1\” role=\”checkbox\” dir=\”auto\” aria-checked=\”false\” class=\”songs-list-row__song-name\”><!–%+b:32%–>Last Kiss<!–%-b:32%–></div>”,

                   “<div tabindex=\”-1\” role=\”checkbox\” dir=\”auto\” aria-checked=\”false\” class=\”songs-list-row__song-name\”><!–%+b:32%–>Daughter<!–%-b:32%–></div>”,

                   “<div tabindex=\”-1\” role=\”checkbox\” dir=\”auto\” aria-checked=\”false\” class=\”songs-list-row__song-name\”><!–%+b:32%–>Better Man<!–%-b:32%–></div>”,

                   “<div tabindex=\”-1\” role=\”checkbox\” dir=\”auto\” aria-checked=\”false\” class=\”songs-list-row__song-name\”><!–%+b:32%–>Just Breathe<!–%-b:32%–></div>”,

                   “<div tabindex=\”-1\” role=\”checkbox\” dir=\”auto\” aria-checked=\”false\” class=\”songs-list-row__song-name\”><!–%+b:32%–>Just Breathe<!–%-b:32%–></div>”,

                   “<div tabindex=\”-1\” role=\”checkbox\” dir=\”auto\” aria-checked=\”false\” class=\”songs-list-row__song-name\”><!–%+b:32%–>Elderly Woman Behind the Counter In a Small Town<!–%-b:32%–></div>”,

                   “<div tabindex=\”-1\” role=\”checkbox\” dir=\”auto\” aria-checked=\”false\” class=\”songs-list-row__song-name\”><!–%+b:32%–>Even Flow<!–%-b:32%–></div>”,

                   “<div tabindex=\”-1\” role=\”checkbox\” dir=\”auto\” aria-checked=\”false\” class=\”songs-list-row__song-name\”><!–%+b:32%–>Once<!–%-b:32%–></div>”,

                   “<div tabindex=\”-1\” role=\”checkbox\” dir=\”auto\” aria-checked=\”false\” class=\”songs-list-row__song-name\”><!–%+b:32%–>Release<!–%-b:32%–></div>”,

                   “<div tabindex=\”-1\” role=\”checkbox\” dir=\”auto\” aria-checked=\”false\” class=\”songs-list-row__song-name\”><!–%+b:32%–>Daughter<!–%-b:32%–></div>”,

                   “<div tabindex=\”-1\” role=\”checkbox\” dir=\”auto\” aria-checked=\”false\” class=\”songs-list-row__song-name\”><!–%+b:32%–>Sirens<!–%-b:32%–></div>”,

                   “<div tabindex=\”-1\” role=\”checkbox\” dir=\”auto\” aria-checked=\”false\” class=\”songs-list-row__song-name\”><!–%+b:32%–>Garden<!–%-b:32%–></div>”,

                   “<div tabindex=\”-1\” role=\”checkbox\” dir=\”auto\” aria-checked=\”false\” class=\”songs-list-row__song-name\”><!–%+b:32%–>Elderly Woman Behind the Counter In a Small Town<!–%-b:32%–></div>”,

                   “<div tabindex=\”-1\” role=\”checkbox\” dir=\”auto\” aria-checked=\”false\” class=\”songs-list-row__song-name\”><!–%+b:32%–>Jeremy<!–%-b:32%–></div>”,

                   “<div tabindex=\”-1\” role=\”checkbox\” dir=\”auto\” aria-checked=\”false\” class=\”songs-list-row__song-name\”><!–%+b:32%–>Why Go<!–%-b:32%–></div>”,

                   “<div tabindex=\”-1\” role=\”checkbox\” dir=\”auto\” aria-checked=\”false\” class=\”songs-list-row__song-name\”><!–%+b:32%–>Animal<!–%-b:32%–></div>”,

                   “<div tabindex=\”-1\” role=\”checkbox\” dir=\”auto\” aria-checked=\”false\” class=\”songs-list-row__song-name\”><!–%+b:32%–>Yellow Ledbetter<!–%-b:32%–></div>”,

                   “<div tabindex=\”-1\” role=\”checkbox\” dir=\”auto\” aria-checked=\”false\” class=\”songs-list-row__song-name\”><!–%+b:32%–>Porch<!–%-b:32%–></div>”,

                   “<div tabindex=\”-1\” role=\”checkbox\” dir=\”auto\” aria-checked=\”false\” class=\”songs-list-row__song-name\”><!–%+b:32%–>Oceans<!–%-b:32%–></div>”,

                   “<div tabindex=\”-1\” role=\”checkbox\” dir=\”auto\” aria-checked=\”false\” class=\”songs-list-row__song-name\”><!–%+b:32%–>Yellow Ledbetter<!–%-b:32%–></div>”,

                   “<div tabindex=\”-1\” role=\”checkbox\” dir=\”auto\” aria-checked=\”false\” class=\”songs-list-row__song-name\”><!–%+b:32%–>Given to Fly<!–%-b:32%–></div>”,

                   “<div tabindex=\”-1\” role=\”checkbox\” dir=\”auto\” aria-checked=\”false\” class=\”songs-list-row__song-name\”><!–%+b:32%–>Corduroy<!–%-b:32%–></div>”,

                   “<div tabindex=\”-1\” role=\”checkbox\” dir=\”auto\” aria-checked=\”false\” class=\”songs-list-row__song-name\”><!–%+b:32%–>Go<!–%-b:32%–></div>”,

                   “<div tabindex=\”-1\” role=\”checkbox\” dir=\”auto\” aria-checked=\”false\” class=\”songs-list-row__song-name\”><!–%+b:32%–>Rearviewmirror<!–%-b:32%–></div>”,

                   “<div tabindex=\”-1\” role=\”checkbox\” dir=\”auto\” aria-checked=\”false\” class=\”songs-list-row__song-name\”><!–%+b:32%–>Dissident<!–%-b:32%–></div>”,

                   “<div tabindex=\”-1\” role=\”checkbox\” dir=\”auto\” aria-checked=\”false\” class=\”songs-list-row__song-name\”><!–%+b:32%–>Who You Are<!–%-b:32%–></div>”,

                   “<div tabindex=\”-1\” role=\”checkbox\” dir=\”auto\” aria-checked=\”false\” class=\”songs-list-row__song-name\”><!–%+b:32%–>Dance Of The Clairvoyants<!–%-b:32%–></div>”,

                   “<div tabindex=\”-1\” role=\”checkbox\” dir=\”auto\” aria-checked=\”false\” class=\”songs-list-row__song-name\”><!–%+b:32%–>Last Kiss<!–%-b:32%–></div>”,

                   “<div tabindex=\”-1\” role=\”checkbox\” dir=\”auto\” aria-checked=\”false\” class=\”songs-list-row__song-name\”><!–%+b:32%–>Alive (2004 Remix)<!–%-b:32%–></div>”,

                   “<div tabindex=\”-1\” role=\”checkbox\” dir=\”auto\” aria-checked=\”false\” class=\”songs-list-row__song-name\”><!–%+b:32%–>Nothingman<!–%-b:32%–></div>”,

                   “<div tabindex=\”-1\” role=\”checkbox\” dir=\”auto\” aria-checked=\”false\” class=\”songs-list-row__song-name\”><!–%+b:32%–>Rearviewmirror<!–%-b:32%–></div>”,

                   “<div tabindex=\”-1\” role=\”checkbox\” dir=\”auto\” aria-checked=\”false\” class=\”songs-list-row__song-name\”><!–%+b:32%–>Superblood Wolfmoon<!–%-b:32%–></div>”,

                   “<div tabindex=\”-1\” role=\”checkbox\” dir=\”auto\” aria-checked=\”false\” class=\”songs-list-row__song-name\”><!–%+b:32%–>State of Love and Trust<!–%-b:32%–></div>”,

                   “<div tabindex=\”-1\” role=\”checkbox\” dir=\”auto\” aria-checked=\”false\” class=\”songs-list-row__song-name\”><!–%+b:32%–>World Wide Suicide<!–%-b:32%–></div>”,

                   “<div tabindex=\”-1\” role=\”checkbox\” dir=\”auto\” aria-checked=\”false\” class=\”songs-list-row__song-name\”><!–%+b:32%–>Corduroy<!–%-b:32%–></div>”,

                   “<div tabindex=\”-1\” role=\”checkbox\” dir=\”auto\” aria-checked=\”false\” class=\”songs-list-row__song-name\”><!–%+b:32%–>Dissident<!–%-b:32%–></div>”,

                   “<div tabindex=\”-1\” role=\”checkbox\” dir=\”auto\” aria-checked=\”false\” class=\”songs-list-row__song-name\”><!–%+b:32%–>Deep<!–%-b:32%–></div>”,

                   “<div tabindex=\”-1\” role=\”checkbox\” dir=\”auto\” aria-checked=\”false\” class=\”songs-list-row__song-name\”><!–%+b:32%–>Alive (2008 Brendan O’Brien Mix)<!–%-b:32%–></div>”,

                   “<div tabindex=\”-1\” role=\”checkbox\” dir=\”auto\” aria-checked=\”false\” class=\”songs-list-row__song-name\”><!–%+b:32%–>Black (2004 Remix)<!–%-b:32%–></div>”

               ]

           }

       ],

       “html”: “<!DOCTYPE html><html prefix=\”og: http://ogp.me/ns#\” dir=\”ltr\” lang=\”en-US\” hydrated=\”\”><head>\n    <meta charset=\”utf-8\”><style data-styles=\”\”>apple-podcast-player,apple-tv-plus-player,apple-music-uploaded-content,apple-music-video-player,apple-tv-plus-preview,amp-chrome-player,amp-footer-player,apple-music-card-player,amp-lyrics,amp-episode-list-control,amp-next-episode-control,apple-music-artwork-lockup,apple-music-progress,apple-music-volume,amp-playback-controls-autoplay,apple-music-radio-column-player,amp-ambient-video,amp-background-video,amp-footer-player-metadata,amp-footer-player-progress,amp-launch-button,amp-mediakit-root,amp-playback-controls-pip,amp-text-multiline,amp-transition,state-holder,\”></div>\n …”

   }

Notice that this API response contains the following main fields for the root level:

  • date: date of request
  • httpCode: the HTTP response code
  • status: the status of the task being processed
  • url: the page URL
  • result: This field has the most useful scraped data, such as nested JSON-objects

The result-field contains three main child fields:

  • xPathElements, which contains the data gathered for each provided XPath
  • selectorElements, which has the data for each provided CSS selector
  • Html, which is the full HTML code of the page

Since our goal is to scrape a list of track names using XPath from music.apple.com, we will be focusing on xPathElements and selectorElements. Each of these elements is associated with a single XPath/selector.

For instance, the segment below is a single XPath/selector:

   “xpath”: “//div[@class=\”songs-list-row__song-name\”]”,

   “textNodes”: [

       “Black”,

       “Alive”,

       “Even Flow”,

       “Jeremy”,

       “Yellow Ledbetter”

   ],

   “htmlElements”: [

       “<div tabindex=\”-1\” role=\”checkbox\” dir=\”auto\” aria-checked=\”false\” class=\”songs-list-row__song-name\”><!–%+b:32%–>Black<!–%-b:32%–></div>”,

       “<div tabindex=\”-1\” role=\”checkbox\” dir=\”auto\” aria-checked=\”false\” class=\”songs-list-row__song-name\”><!–%+b:32%–>Alive<!–%-b:32%–></div>”,

       “<div tabindex=\”-1\” role=\”checkbox\” dir=\”auto\” aria-checked=\”false\” class=\”songs-list-row__song-name\”><!–%+b:32%–>Even Flow<!–%-b:32%–></div>”,

       “<div tabindex=\”-1\” role=\”checkbox\” dir=\”auto\” aria-checked=\”false\” class=\”songs-list-row__song-name\”><!–%+b:32%–>Jeremy<!–%-b:32%–></div>”,

       “<div tabindex=\”-1\” role=\”checkbox\” dir=\”auto\” aria-checked=\”false\” class=\”songs-list-row__song-name\”><!–%+b:32%–>Yellow Ledbetter<!–%-b:32%–></div>”

    ]

}

These segments contain three child fields:

  • Xpath, which is a string containing the XPath/selector
  • textNodes, which contains the text entries found within all the elements the API has found for this XPath/selector
  • htmlElements, which is the HTML code for all of the elements found within this particular XPath/selector

Transforming XPath/Selectors to tabular forms for Excel or Google Sheets 

Now, let’s transform this JSON response into a tabular form in Excel or Google Sheets. 

For Excel, follow these steps:

  1. Open a new spreadsheet.
  2. Click on the Data tab.
  3. Click “Get Data” on the far right and select “From File.” Then, click the applicable option. For instance, if scraping API’s results are in .JSON format, select “From JSON.”
  4. Your Excel sheet should now be populated.
  5. If you have other data associated with other XPath or selectors, add them to the table using the same method.

Follow these steps for Google Sheets:

  1. Open a new spreadsheet.
  2. Go to the “File” and select “Import.”
  3. You will now see a window with four tabs: “My Drive,” “Shared with me,” “Recent,” and “Upload.” Click on “Upload” and pick the file you want to convert into a table. Make sure the checkbox “Convert text to numbers, dates, and formulas” is unselected. 
  4. Pick your import location and separator type and click on the “Import data” button. 

If you want to add metadata related to this project to the table, add them to the table. At this point, you’re almost done. All you have to do is add in the table headers so you know what’s what.

Conclusion

Conclusion

Learning how to pull data from a website into Excel or Google Sheets seems difficult, but with powerful scraping APIs like Scraping Robot’s free API, you’ll be able to extract, aggregate, and manipulate data with just a few clicks of your mouse.

Scraping Robot API can be used with any scraping program, but if you’re new to scraping or you want to try out a new scraping tool, consider getting Scraping Robot.

Scraping Robot allows you to scrape websites into JSON—no browser scaling, proxy management, blocks, or captchas required. Our system was built for developers and allows you to use our API to extract and aggregate data within minutes.

We provide the following features and more:

  • Automatic metadata parsing
  • Javascript rendering
  • No proxies required—we will handle this for you
  • Stats and usage
  • Guaranteed successful results
  • Session management (coming soon)
  • Webhook callbacks (coming soon)
  • POST requests (coming soon)
  • Output to Sheets or Zapier (coming soon)
  • Screenshots of browser (coming soon)

What’s more, there’s no monthly commitment. You can start with us for free—we offer 5000 free scrapes with all features included. If you need more scrapes, you can then move to the Business and Enterprise tiers, which offer hundreds of thousands of scrapes at extremely affordable prices.

Interested or have questions about Scraping Robot? Contact us today to learn more. Our support team is available 24/7.

The information contained within this article, including information posted by official staff, guest-submitted material, message board postings, or other third-party material is presented solely for the purposes of education and furtherance of the knowledge of the reader. All trademarks used in this publication are hereby acknowledged as the property of their respective owners.