Data Driven Journalism: How To Scrape Journalism Data

Hannah Benson
December 15, 2020

Growing up, I dreamed of being a writer. During college, that dream became more focused on cultural journalism and criticism. Now with around 10 pieces of film criticism published, I finally feel like a real journalist. As much as I love the writing aspect of journalism, the research can be taxing at times. No matter what kind of stories you are reporting on, people depend on journalists to gather and synthesize tons of information into a single story. Thankfully, the internet gives journalists more information and data to work with than ever before. While this is advantageous in many ways, it also makes for an overwhelming amount of work. With web scraping, journalists can engage in data driven journalism to help strengthen their stories and credibility.

What Is Data Driven Journalism?

Before diving into data driven journalism, we must first understand journalism itself. While there are many different types of journalism (political, cultural, opinion, etc.), most journalism is the process of gathering information on a specific topic or story and presenting it in an organized fashion for public consumption. Data journalism, or data driven journalism, is the process of gathering, organizing, and interpreting large data sets to either explain an existing news story or find a story within the data itself. This can range from gathering data on housing prices to evaluate the state of the market to analyzing merch sales for a K-Pop group to understand its fan base.

How To Do Data Journalism

Extract data

When starting any journalistic project, it can be hard to come up with ideas. Once you decide on your data journalism project, it is time to gather data, and the data journalism tool you need right now is web scraping. Web scraping is the automated process of extracting data from webpages and organizing it into a structured format. Within journalism, this can range from tracking commodity prices to analyzing official census and government data or following the chart success of your favorite singer. Once a web scraper extracts this data, it organizes the information into a spreadsheet that is easy to download and share. When it comes to extracting data for stories, web scraping is often the most efficient way forward.
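To make the extraction step concrete, here is a minimal sketch in Python that turns the rows of an HTML table into CSV text you could open as a spreadsheet. It uses only the standard library (html.parser and csv) and assumes the data sits in a reasonably well-formed table built from tr, th, and td tags. The names TableParser and table_to_csv are hypothetical, and this is not how Scraping Robot's own scrapers work internally.

```python
import csv
import io
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of each <th>/<td> cell, grouped into rows by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # the row currently being read, if any
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("th", "td") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("th", "td"):
            # Finish the current cell and add its text to the current row.
            if self._cell is not None and self._row is not None:
                self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr":
            # Finish the current row.
            if self._row is not None:
                self.rows.append(self._row)
            self._row = None
            self._cell = None

    def handle_data(self, data):
        # Only keep text that sits inside a table cell.
        if self._cell is not None:
            self._cell.append(data)


def table_to_csv(html):
    """Return the rows of every table in `html` as CSV text."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out).writerows(parser.rows)
    return out.getvalue()


if __name__ == "__main__":
    # Hypothetical snippet of a chart page, for illustration only.
    page = (
        "<table><tr><th>Week</th><th>Chart position</th></tr>"
        "<tr><td>1</td><td>12</td></tr>"
        "<tr><td>2</td><td>7</td></tr></table>"
    )
    print(table_to_csv(page))
```

To use it on a live page, you would first download the HTML (for instance with urllib.request.urlopen) and confirm that the site permits scraping. Pages that build their tables with JavaScript would need a different approach.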

Organize and interpret

Again, a web scraper organizes unstructured data into structured data for you during the scraping process. This leaves you with the fun task of interpreting the data. Because the data arrives already structured, it becomes easier to spot trends and patterns within it. Some data journalists use data to strengthen an existing story or opinion, while others treat the trends in significant data sets as the story itself. For example, if housing data is trending toward a crisis, that could be your story. But if you are trying to speak to the success of a film, box office data can help strengthen your argument. Data sets can therefore be incorporated into stories of all shapes and sizes. Read this interview with economics reporter Ben Casselman to learn how he uses data sets to develop deeply human stories.
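As a rough illustration of spotting a trend once the data is structured, the sketch below computes period-over-period percent changes and an ordinary least-squares slope for a series of values, such as monthly median home prices. The function names and the sample prices are hypothetical and only demonstrate the arithmetic; they are not real market figures.

```python
def percent_changes(values):
    """Percent change from each period to the next (e.g. month over month).

    Assumes no period has a value of zero.
    """
    return [(curr - prev) / prev * 100 for prev, curr in zip(values, values[1:])]


def trend_slope(values):
    """Least-squares slope of `values` against their positions 0, 1, 2, ...

    A positive slope means the series rises on average each period.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two data points")
    mean_x = (n - 1) / 2          # mean of 0, 1, ..., n - 1
    mean_y = sum(values) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    var = sum((x - mean_x) ** 2 for x in range(n))
    return cov / var


if __name__ == "__main__":
    # Hypothetical monthly median prices, for illustration only.
    prices = [250_000, 252_000, 255_500, 259_000]
    print(percent_changes(prices))
    print(trend_slope(prices))
```

A single slope is only a starting point; a steady rise and a sudden jump can produce similar slopes, which is why looking at the individual percent changes alongside it is useful.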

Write and publish story

Once you have gathered this crucial data and found patterns within it, you can begin to write and publish your story. However, your use of web scrapers doesn't end here. As a journalist, you want to build a readership that trusts and supports your work. There are many ways to use web scraping to better understand your audience and what stories they might be interested in. Scraping social media profiles in particular can help journalists better understand their readers.

Track story and audience response

In addition to understanding your audience, web scraping can be used to measure the success of a story. Scraping comment sections and other engagement data can help you gauge whether your piece landed with readers. While it may seem that scraping is only useful in the early stages of a data journalism project, it can help you from the initial concept all the way past publication day.
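One simple way to start reading audience response is to count which words come up most often in scraped comments. This minimal sketch assumes you have already collected the comments as a list of strings; the stopword list and the top_words function are hypothetical, and raw word counts are only a crude signal compared with actually reading the comments.

```python
import re
from collections import Counter

# Hypothetical list of filler words to skip; extend it for your own beat.
STOPWORDS = {"a", "an", "and", "i", "in", "is", "it", "of", "or", "the", "this", "to", "was"}


def top_words(comments, n=5):
    """Return the n most frequent non-stopword words across the comments."""
    counts = Counter()
    for comment in comments:
        # Lowercase and split into runs of letters (apostrophes allowed).
        words = re.findall(r"[a-z']+", comment.lower())
        counts.update(word for word in words if word not in STOPWORDS)
    return counts.most_common(n)


if __name__ == "__main__":
    # Hypothetical comments on a film review.
    comments = [
        "Great review of the film",
        "The film was too long",
        "Great film!",
    ]
    print(top_words(comments, 2))  # [('film', 3), ('great', 2)]
```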

Why Is Journalism Data Useful?

Gain credibility

A journalist is only as strong as their stories, and picking and pitching relevant topics is the key to success. Data journalism tools like web scraping can help you find good ideas and data sets to work with. Adding data analysis to your stories builds credibility and trust with readers. Whether you use data to strengthen a story or to find one, it shows readers your dedication to journalism and your commitment to backing up opinion pieces and political analysis with facts. In a time of constant punditry, giving your readers a well-researched, data-backed story will set you up for success.

Find insights and patterns

As mentioned above, analyzing large data sets gives you insights and helps you identify patterns that are easily lost when you parse through data by hand. Web scraping is advantageous here because organizing the data is part of the scraping process. Data driven journalism is time-consuming without the proper scraping support. By using a data journalism tool like a scraper, you save time and energy and can focus on using the data in creative ways.

Increase transparency and access

The sheer amount of data in the world can be overwhelming for the average person. In general, journalists are supposed to help everyday citizens understand the political and cultural issues present in our daily lives. Most of my fellow journalism students want to work in journalism because of the power accurate information gives citizens in times of confusion. By using journalism to understand and share big data sets, you can help people make sense of data that is often confusing for the average person. By helping raise data literacy, you expand transparency and access to information among citizens.

Discover past research

While much of the focus is on how data and web scraping are moving us into the future, scraping is also an essential tool for digging into the past. Web scraping makes it easier to find material in archives, whether it's census data or old newspaper articles. Finding and including historical data helps you better understand how current data sets developed. To learn more about how web scraping can help you scan for old newspaper articles, read here.
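Once archived articles have been scraped, a quick first pass is to count how often a topic appears each year. This sketch assumes the articles are already stored as (publication date, text) pairs; the mentions_per_year function and the sample headlines are hypothetical, and a plain substring match will miss synonyms and alternate spellings.

```python
from collections import Counter
from datetime import date


def mentions_per_year(articles, keyword):
    """Count archived articles per year whose text mentions `keyword`.

    `articles` is a list of (publication_date, text) pairs; matching is a
    case-insensitive substring check. Years with no matches are omitted.
    """
    keyword = keyword.lower()
    counts = Counter(
        published.year for published, text in articles if keyword in text.lower()
    )
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    # Hypothetical headlines standing in for scraped archive records.
    archive = [
        (date(1998, 5, 1), "Rent control debate heats up"),
        (date(1998, 9, 3), "City budget passes"),
        (date(2004, 2, 14), "Council schedules new rent control vote"),
    ]
    print(mentions_per_year(archive, "rent control"))  # {1998: 1, 2004: 1}
```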

Save time and energy

As someone who constantly pitches stories to editors, I know the toll of freelancing and finding stories. With all that preexisting stress, you do not need the additional stress of manually parsing through big data sets. Web scraping large amounts of data helps you save time during the research phase and gives you more freedom to interpret the data and find creative ways of presenting it.

The Best Way To Collect Data For Your Next Story

Gather custom data for unique stories

While Scraping Robot’s modules are great, they have their limitations. If you need to do large-scale scraping or gather more specific information than a standard scraper allows, a custom scraping solution is your best way forward. In addition to being tailored to your unique scraping needs, a custom scraping solution comes with a team at Scraping Robot working to help you. Instead of spending time hiring developers and managing proxies, you let the Scraping Robot team handle this for you and help you scrape at a larger scale than the modules allow. If you need to do a very high volume of scraping, a custom solution is a better deal than buying many individual scrapes. If the custom approach sounds like what you need, contact us here.

Find your audience

Like many journalists I know, I tend to get story ideas from trending topics on social media sites. However, it can be hard to tell whether the trends you see are confined to your own bubble or reflect a larger societal trend. Scraping social media sites can help you gauge the reach of your data journalism ideas by showing whether the people who follow you are actively interested in a topic. By extracting data on followers, content, and more, you can draw inspiration from social media that helps you think about larger trends and which data sets to look for.
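One rough way to test whether a trend is just your bubble is to compare how often a hashtag appears in posts from your followers against a broader sample of posts. The sketch below computes that share for one list of posts; it assumes the posts have already been scraped as plain strings, and the hashtag_share function and sample posts are hypothetical.

```python
import re


def hashtag_share(posts, hashtag):
    """Fraction of posts that use `hashtag` (case-insensitive, '#' optional)."""
    if not posts:
        return 0.0
    tag = hashtag.lower().lstrip("#")
    hits = 0
    for post in posts:
        # Collect every hashtag in the post, without the leading '#'.
        tags_in_post = {t.lower() for t in re.findall(r"#(\w+)", post)}
        if tag in tags_in_post:
            hits += 1
    return hits / len(posts)


if __name__ == "__main__":
    # Hypothetical posts scraped from accounts that follow you.
    follower_posts = [
        "Loved the new video #KPop",
        "#kpop comeback announced",
        "Watching the awards tonight #Oscars",
    ]
    print(hashtag_share(follower_posts, "#kpop"))
```

Running the same function on a broader, non-follower sample and comparing the two shares gives a first, very approximate sense of whether interest extends beyond your own audience.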

Make use of public records

Government databases and other public records are essential sources of data for journalists. While many people know there is a huge amount of public data available, the sheer volume of information can be hard for most people to grasp. By using an HTML scraper to collect data from public records, sports statistics, or other aggregation sites, you can use free sources to inspire and solidify your projects.
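Before pointing a scraper at a public-records site, it is good practice to check which paths the site's robots.txt file allows. This minimal sketch uses Python's standard urllib.robotparser module and parses robots.txt text you already have; against a live site you would instead call set_url() and read(), which download the file. The allowed_urls helper, the site, and the paths are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def allowed_urls(robots_txt, urls, user_agent="*"):
    """Keep only the URLs that the given robots.txt text permits for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if parser.can_fetch(user_agent, url)]


if __name__ == "__main__":
    # Hypothetical robots.txt and URLs for a public-records site.
    robots = "User-agent: *\nDisallow: /private/\n"
    candidates = [
        "https://records.example.org/public/permits-2020.csv",
        "https://records.example.org/private/staff.csv",
    ]
    print(allowed_urls(robots, candidates))
```

robots.txt is a convention rather than a complete statement of what a site permits, so it complements, and does not replace, reading the site's terms of use.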

Discover places

By scraping Google, you can collect the top places and page results for a given keyword or location. While this data might not be the center of a story, understanding the results a certain keyword yields can help you find more sources and grasp how search algorithms treat certain words.

Conclusion

Journalists are essential to the public’s understanding of political issues, shifts in society, and culture. While big data and journalism might seem at odds, this could not be further from the truth. Big data journalism helps contextualize large data sets for citizens and brings credibility to articles. Web scraping makes it easier for journalists, especially those without a tech background, to do thorough research while still leaving them time to focus on writing and communicating their ideas. Data driven journalism helps us better understand our current situation and where the world might be heading.

The information contained within this article, including information posted by official staff, guest-submitted material, message board postings, or other third-party material is presented solely for the purposes of education and furtherance of the knowledge of the reader. All trademarks used in this publication are hereby acknowledged as the property of their respective owners.