How to Scrape Data from a Website into Google Sheets
Manually copying and pasting data from a website into a spreadsheet is a tedious and soul-crushing task. Luckily, you can automate this entire process by scraping data directly from a website into Google Sheets. This article will show you several ways to do just that, from simple built-in formulas to more advanced techniques for tackling tricky websites.
What is Web Scraping? A Quick Refresher
Web scraping is simply the process of automatically extracting information from websites. Instead of you copying data line by line, a script or tool does it for you, saving an incredible amount of time and effort. Businesses use it for all sorts of reasons, including:
Monitoring prices: E-commerce stores can track competitor pricing to stay competitive.
Generating leads: Sales teams can collect contact information from online directories or social media platforms.
Market research: Analysts can aggregate product reviews, news articles, or social sentiment to understand market trends.
Content aggregation: You could build a dashboard that pulls the latest headlines from your favorite news sites every morning.
Once the data is in Google Sheets, you can sort, filter, analyze, and visualize it however you like. Let's look at how to get it there.
Method 1: The Easiest Way with IMPORTHTML
If the data you need is in a simple table or list on a webpage, Google Sheets has a magical built-in function called IMPORTHTML that can grab it for you in seconds. It’s perfect for scraping data from structured pages like Wikipedia articles or simple financial websites.
The syntax for the formula is:
=IMPORTHTML(url, query, index)
url: The full URL of the webpage you want to scrape, including "http://" or "https://".
query: This tells Google what kind of element you're looking for. Your options are either "table" or "list".
index: This is a number indicating which table or list on the page you want to pull from. The first table or list is 1, the second is 2, and so on.
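For instance, if the data you wanted lived in the second bulleted list on a page rather than in a table, the formula would look like this (the URL here is just a hypothetical placeholder):
=IMPORTHTML("https://example.com/stats", "list", 2)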
Step-by-Step Example using IMPORTHTML
Let's say we want to scrape a list of the most populous countries from Wikipedia. The data is nicely organized in a table, making it a perfect candidate for this function.
Find the URL of the page. In this case, it's:
https://en.wikipedia.org/wiki/List_of_countries_and_dependencies_by_population
Open a new Google Sheet.
Click into cell A1 and type the following formula:
=IMPORTHTML("https://en.wikipedia.org/wiki/List_of_countries_and_dependencies_by_population", "table", 1)
Press Enter, and in a few moments, the entire table from the Wikipedia page will populate your spreadsheet. It's that easy.
Heads up: The biggest limitation of IMPORTHTML is its simplicity. It only works on basic HTML tables and lists and fails on websites that dynamically load content with JavaScript or require you to log in.
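One partial workaround worth knowing: you can wrap IMPORTHTML in other functions to trim the result. As a sketch, this formula (reusing the Wikipedia URL from the example above) keeps only the second column of the imported table — in Google Sheets, passing 0 as the row argument to INDEX returns the entire column:
=INDEX(IMPORTHTML("https://en.wikipedia.org/wiki/List_of_countries_and_dependencies_by_population", "table", 1), 0, 2)
Which column number you need depends on the table's current layout, so adjust the last argument accordingly.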
Method 2: A More Powerful Option with IMPORTXML
When IMPORTHTML isn't flexible enough, IMPORTXML steps in. This function is more versatile because it uses something called "XPath" to pinpoint exactly which data you want from a page, even if it's not in a neatly formatted table. Think of XPath as a specific address for any piece of information on a webpage.
The syntax is:
=IMPORTXML(url, xpath_query)
url: The URL of the webpage.
xpath_query: A query written in XPath language to select specific data from the page's HTML or XML source.
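As a quick taste of the syntax before we go hunting for a real XPath: this query pulls the destination URL of every link on a page by selecting the href attribute of every "a" tag (the results depend on the page's markup at the time you run it):
=IMPORTXML("https://en.wikipedia.org/wiki/Web_scraping", "//a/@href")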
How to Find the XPath Query
Finding the XPath sounds technical, but it’s surprisingly straightforward using your browser's developer tools.
Navigate to the webpage you want to scrape.
Right-click on the data element you want to extract (e.g., a specific headline, a price, or a product title).
Select "Inspect" or "Inspect Element." This will open up the developer tools window and highlight the corresponding HTML code.
Right-click on the highlighted HTML element in the developer tools.
Navigate to "Copy" -> "Copy XPath."
Step-by-Step Example using IMPORTXML
Let's try to scrape the titles of the featured posts from Google's official blog.
Go to https://blog.google/.
Right-click on one of the main article headlines, click "Inspect," right-click the highlighted code, and copy the XPath. Your XPath query might look something like this:
"//h2"
(This simple query will grab all "h2" header tags on the page.)
Open a Google Sheet and enter the formula in cell A1:
=IMPORTXML("https://blog.google/", "//h2")
Press Enter, and Google Sheets will pull all the text from every h2 heading on that page and list them down the column.
Heads up: IMPORTXML shares some limitations with IMPORTHTML. It can't handle content loaded via JavaScript. Also, if the website's structure changes, your XPath query might break, and you'll have to find the new one.
Method 3: For Complex Jobs, Use Google Apps Script
When the built-in formulas don't cut it, it's time to bring in the heavy machinery: Google Apps Script. This is Google's cloud-based scripting platform, built on JavaScript, that lets you extend the capabilities of Google Sheets (and other Google Workspace apps).
This method requires some light coding but grants you tremendous power and flexibility. You can scrape dynamic websites, interact with APIs, handle login states, and even set your script to run on a schedule to fetch fresh data automatically.
Simple Apps Script Example
Here’s how to create a basic script to pull headlines from a website.
In your Google Sheet, go to Extensions > Apps Script. This will open a new tab with the script editor.
Delete any placeholder code and paste in the following script. This script fetches the entire HTML content of a page and uses a regular expression to pull out all the text content within <h2> tags.
```javascript
function scrapeHeadlines() {
  const url = 'https://blog.google/';
  const response = UrlFetchApp.fetch(url, {muteHttpExceptions: true});
  const htmlContent = response.getContentText();

  // This regular expression finds everything between <h2> and </h2> tags.
  // [\s\S]*? matches lazily, including across line breaks.
  const headlineRegex = /<h2[^>]*>([\s\S]*?)<\/h2>/g;
  let matches;
  const headlines = [];

  while ((matches = headlineRegex.exec(htmlContent)) !== null) {
    // This is necessary to avoid infinite loops with zero-width matches.
    if (matches.index === headlineRegex.lastIndex) {
      headlineRegex.lastIndex++;
    }
    // matches[1] contains the raw markup inside the h2 tags.
    // Strip any nested HTML tags and trim the whitespace.
    const headlineText = matches[1].replace(/<[^>]+>/g, '').trim();
    headlines.push([headlineText]); // Each row is a one-element array for setValues().
  }

  if (headlines.length > 0) {
    const sheet = SpreadsheetApp.getActiveSpreadsheet().getActiveSheet();
    // Clear previous content (guard against an empty sheet, where getLastRow() is 0).
    if (sheet.getLastRow() > 0) {
      sheet.getRange(1, 1, sheet.getLastRow(), 1).clearContent();
    }
    // Write the new headlines starting at cell A1.
    sheet.getRange(1, 1, headlines.length, 1).setValues(headlines);
  } else {
    Logger.log('No headlines found.');
  }
}
```
Click the "Save project" icon (the floppy disk).
To run your script, select the function name (scrapeHeadlines) from the dropdown menu at the top and click "Run".
The first time you run it, Google will ask you to authorize the script. Follow the prompts to grant permission.
When the script finishes, go back to your spreadsheet. You'll find all the scraped headlines populated in column A. You can even set up "Triggers" (from the clock icon in the Apps Script editor) to run this script automatically every day or every hour.
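If you'd rather set up that trigger in code than through the editor's UI, here's a minimal sketch — run it once and it schedules scrapeHeadlines to run daily (Apps Script picks an exact time within the hour you specify):

```javascript
function createDailyTrigger() {
  // Schedule scrapeHeadlines to run once a day, around 8am in the script's time zone.
  ScriptApp.newTrigger('scrapeHeadlines')
      .timeBased()
      .everyDays(1)
      .atHour(8)
      .create();
}
```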
Method 4: Use Third-Party Tools & Browser Extensions
If coding isn't your thing, you can use specialized tools and extensions designed for web scraping. These often provide a visual, point-and-click interface that does the hard work for you.
Browser Extensions: Tools like Web Scraper or Data Miner are popular Chrome extensions. You install them, navigate to the target website, and use a simple interface to select the data you want. Once you build your "recipe" or "sitemap," the tool can scrape the data and then let you download it as a CSV file, which you can easily import into Google Sheets.
Automation Platforms: Services like Zapier and Make.com also have web scraping components. You can build a workflow that scrapes a website on a schedule and then automatically adds the new data as a row in a specific Google Sheet. This is great for an entirely hands-off, automated process.
Final Thoughts
As you can see, there's more than one way to get data from a website into Google Sheets. Starting with simple formulas like IMPORTHTML is great for basic tables, while IMPORTXML and Apps Script provide the power you need for more complex pages. The right method really just depends on your data source and how comfortable you are with different tools.
Of course, getting data into a spreadsheet is only the beginning. The real goal is to find actionable insights that help you make better decisions. Sometimes that data lives in Google Sheets, but often it needs to be joined with data from other platforms like Google Analytics, Facebook Ads, or your CRM. At Graphed, we help you connect all your data sources - including Google Sheets - in one place. Instead of jumping between tabs and wrestling with formulas to see the full picture, you can use plain English to ask questions, build dashboards, and get instant answers pulled from all your connected accounts in real time.